Orges Leka: 2016

Dienstag, 29. November 2016

Tabelle zum Wertverlust pro Jahr bei Gebrauchtwagen

In diesem Post wollen wir den Wertverlust bei Gebrauchtwagen ermitteln. Als Datensatz zur Beantwortung dieser Frage dient uns folgender Datensatz welcher von Ebay Kleinanzeigen stammt und Informationen zu über 370000 Gebrauchtwagen enthält. Um den durchschnittlichen Preis pro Gebrauchtwagen in Abhängigkeit vom Erstzulassungsjahr, Kilometerstand und Leistung in PS zu ermitteln, benutzen wir folgende mathematische Formel (Lineare Regression, Adjusted R-squared: 0.9047) \[ \text{ preis } = \exp( a\cdot \text{ jahr } + b\cdot \text{ km } + c \cdot \text{ ps } + d + \epsilon ) \] wobei die Werte a,b,c,d aus der Regression stammen. Ein durschnittliches Auto fährt (nach dem Datensatz oben) ca. 12000 km pro Jahr. Der Wertverlust nach einem Jahr lässt sich berechnen als \[ w = 1- \frac{p'}{p} = 1 - \frac{ \exp(12000\cdot b)}{\exp(a)} \] wobei p der aktuelle Preis und p' der Preis in € nach einem Jahr ist. Setzen wir die Werte für a und b aus der Regression ein, erhalten wir w = 0.119739, d.h. der Werteverlust eines durchschnittlichen Gebrauchtwagens beträgt ca. 12% pro Jahr. Aus dieser Angabe können wir eine Tabelle erstellen, die uns anzeigt wie sich der Preis eines durchschnittlichen Gebrauchtwagens ungefähr entwickeln wird. Alle Angaben in der Tabelle sind in € und ohne Gewähr!

jetzt	nach 1 Jahr	nach 2 Jahren	nach 3 Jahren	nach 4 Jahren	nach 5 Jahren
1000	880	775	682	600	529
2000	1761	1550	1364	1201	1057
3000	2641	2325	2046	1801	1586
4000	3521	3099	2728	2402	2114
5000	4401	3874	3410	3002	2643
6000	5282	4649	4092	3602	3171
7000	6162	5424	4775	4203	3700
8000	7042	6199	5457	4803	4228
9000	7922	6974	6139	5404	4757
10000	8803	7749	6821	6004	5285
11000	9683	8523	7503	6604	5814
12000	10563	9298	8185	7205	6342
13000	11443	10073	8867	7805	6871
14000	12324	10848	9549	8406	7399
15000	13204	11623	10231	9006	7928

Dienstag, 8. November 2016

Mixed Integer Programming to solve a retail category price optimization problem

In this post we are going to implement the procedure given in "A fractional programming approach for retail category price optimization" by S. Subramanian and H. D. Sherali. We are going to use the SCIP optimization suite with Python bindings. I have constructed a small example with n = 2 products and m = 3 prices per product. Here is the python code:

  """
Solves the retail category price optimization in "A fractional programming approach for retail category
price optimization" from S. Subramanian and H. D. Sherali
Copyright (c) by Orges Leka (2016)
"""
from pyscipopt import Model, quicksum
import math

"""

"""
def retail(n,m,P, d, p0, l, u, phi0, phiMax, R0, elast, mu, lmbda, alpha, beta, K, a,b,c,yMin,yMax):
  model = Model("retail price")
  v, r, g, x, z = {}, {}, {}, {},{}
  y = model.addVar(vtype="CONTINUOUS",name="y")
  phi = model.addVar(vtype="CONTINUOUS",name="phi")
  w = model.addVar(vtype="CONTINUOUS",name="w")
  for i in range(1,n+1):
    for j in range(1,m+1):
      v[i,j] = math.exp(mu[i]+lmbda[i]*P[i,j])
      r[i,j] = v[i,j] * P[i,j]
      g[i,j] = v[i,j] * (P[i,j] - d[i])
      x[i,j] = model.addVar(vtype="CONTINUOUS", name="x(%s,%s)"%(i,j))
      z[i,j] = model.addVar(vtype="BINARY",name="z(%s,%s)"%(i,j))

  model.addCons(quicksum(v[i,j]*x[i,j] for (i,j) in x) == phi) # 3b
  model.addCons( w == math.log(phi0) + elast/(1.0*n) * quicksum( math.log(P[i,j]/p0[i])*z[i,j] for (i,j) in z ) ) # 3d
  model.addCons( alpha*phi0 <= phi ) #3e
  model.addCons( phi <= phiMax ) # 3e
  model.addCons( math.log(alpha*phi0) <= w ) # 3e
  model.addCons( w <= math.log(phiMax) )  # 3e
  model.addCons( quicksum( r[i,j]*x[i,j] for (i,j) in x ) >= beta * R0 ) #3f
  for k in K.keys():
    model.addCons( a[k] * quicksum( P[k,l]*z[k,l] for l in range(1,m+1) ) <= b[k] * quicksum( P[k,l] * z[k,l] for l in range(1,m+1) ) + c[k] ) # 3g
  for k in K.keys():
    model.addCons( a[k] * quicksum( P[k,l]*x[k,l] for l in range(1,m+1) ) <= b[k] * quicksum( P[k,l] * x[k,l] for l in range(1,m+1) ) + c[k]*y ) # 3h
  for i in range(1,n+1):
    model.addCons( quicksum( z[i,j] for j in range(1,m+1)) == 1) # 3i
  for i in range(1,n+1):
    model.addCons( quicksum( x[i,j] for j in range(1,m+1)) == y) # 3j
  for i in range(1,n+1):
    for j in range(1,m+1):
      model.addCons( x[i,j] <= yMax * z[i,j] ) # 3k
      model.addCons( x[i,j] >= yMin * z[i,j] ) # 3l
  model.addCons( yMin <= y ) # 3m
  model.addCons( y <= yMax ) # 3m
  model.addCons( phi >= alpha*phi0*(1+w-math.log(alpha*phi0)) ) # 4b. T=2
  model.addCons( phi >= phiMax*(1+w-math.log(phiMax)) ) # 4b .  T = 2
  model.addCons( phi <= (phiMax-alpha*phi0)/(math.log(phiMax)-math.log(alpha*phi0))*(w-math.log(alpha*phi0))+alpha*phi0 )# 4a, S= 1
  model.setObjective(quicksum(g[i,j]*x[i,j] for (i,j) in x), "maximize") # 3a
  print g
  return model

def generateExample():
  n = 2
  m = 3
  P = { (1,1):2.49, (1,2):2.59, (1,3) : 2.69,
        (2,1):8.99, (2,2):9.09, (2,3) : 9.19 }
  d = { 1 : 1.49, 2 : 7.49 }
  p0 ={ 1 : 2.50, 2 : 9.05 }
  l = { 1 : 2.00, 2 : 8.49 }
  u = { 1 : 3.00, 2 : 9.99 }
  phi0 = 500
  phiMax = 700
  R0 = 50
  elast = -2.0
  mu = { 1 : 0.1, 2: 0.2}
  lmbda = { 1: 0.02, 2: 0.01 }
  alpha = 0.8
  beta = 0.9
  K = { 1 : 1}
  a = { 1 : 0.5}
  b = { 1 : 1.5}
  c = { 1 : 0.5}
  yMin = 0.0
  yMax = 1000.0
  return n,m,P, d, p0, l, u, phi0, phiMax, R0, elast, mu, lmbda, alpha, beta, K, a,b,c, yMin, yMax


if __name__ == "__main__":

    n,m,P, d, p0, l, u, phi0, phiMax, R0, elast, mu, lmbda, alpha, beta, K, a,b,c,yMin,yMax = generateExample()
    model = retail( n,m,P, d, p0, l, u, phi0, phiMax, R0, elast, mu, lmbda, alpha, beta, K, a,b,c,yMin,yMax  )

    model.hideOutput() # silent mode
    model.optimize()
    cost = model.getObjVal()
    print()
    print("Optimized:")
    print("Optimal value:", cost)
    #model.printAttr("X")
    for v in model.getVars():
      print(v.name, "=", model.getVal(v))

Please refer to the article for the meaning of the variables.

Freitag, 21. Oktober 2016

Zusammenhang zwischen Rang und Tore im Fussball

Im folgenden Post wollen wir die Daten aus Kaggle analysieren. In dem Datensatz sind über 25000 Spiele mit 300 Mannschaften vorhanden. Ich habe die Mannschaften absteigend sortiert nach der Gesamtanzahl der Tore, die sie geschossen haben. Diese Sortierung wollen wir Rang nennen. Eine Mannschaft mit vielen Toren steht vor einer anderen Mannschaft mit wenigen Toren, d.h. die erste Mannschaft hat einen niedrigeren Rang als die zweite. Zwischen dem Rang und der Gesamtanzahl an Toren besteht ein mathematischer Zusammenhang. Dazu wollen wir uns zuerst eine Grafik anschauen.

Wie man sieht werden die Punkte (Rang/Gesamtanzahl an Toren) gut beschrieben durch die grüne Funktion. Der Zusammenhang ist wie folgt: \[ T = a_0 + a_1 \cdot \log(R) + a_2 \cdot \log(R)^2 + \epsilon\] \[ T = 706.037 + 24.20481 \cdot \log(R) - 25.32589 \cdot \log(R)^2 +\epsilon \] wobei T die Gesamtanzahl der Tore und R der Rang der Mannschaft ist und epsilon eine normalverteilte Zufallsvariable mit Mittelwert 0 und Standardabweichung 21.49643 ist. Die Berechnungen und die Grafik wurden mit R und MySQL gemacht:

#!/usr/bin/Rscript
source("connectToDB.R")

query <- "select sum(tg) as stg from 
 (
   select home_team_api_id as team_api_id, sum(home_team_goal) tg from game where home_team_api_id is not null and away_team_api_id is not null group by home_team_api_id 
   union
   select away_team_api_id as team_api_id, sum(away_team_goal) tg from game where home_team_api_id is not null and away_team_api_id is not null group by away_team_api_id 
 ) as tmp group by team_api_id order by stg desc;";
 x <- dbGetQuery(con, query);
 Y <- x$stg
 X <- seq(1:length(x$stg))
 d <- data.frame(log(X),Y)
 names(d) <- c("x","y")
 m <- lm(d$y ~ poly(d$x,2,raw=TRUE))
 png("rang-tore.png", width = 16, height = 8, units = 'in', res = 600)
 plot(d$y,type="p",xlab="Rang der Mannschaft",ylab="Gesamtanzahl an Toren",main="Fussball: Rang und Tore",cex=0.3)
 lines(predict(m,d),col="green")
 dev.off()

Wir testen, mit dem Kolmogorov-Smirnov-Test in R, ob der Fehler \[\epsilon\] normalverteilt ist:

  r <- m$residuals;
  ks.test(r,"pnorm",mean=mean(r),sd=sd(r))

Der Test ergibt:

  > ks.test(r,"pnorm",mean=mean(r),sd=sd(r))

 One-sample Kolmogorov-Smirnov test

data:  r
D = 0.0721, p-value = 0.08732
alternative hypothesis: two-sided

Da der p-Wert >= 0.05 ist, kann man davon ausgehen, dass epsilon normalverteilt mit Mittelwert 0 = mean(r) und Standardabweichung = sd(r) = 21.49643 ist.

Donnerstag, 20. Oktober 2016

Gebrauchtwagenpreise auf Ebay-Kleinanzeigen

Ich habe mit Scrapy Daten zu Gebrauchtwagenpreisen auf Ebay-Kleinanzeigen gesammelt. Hier ist eine grafische Auswertung nach Erstzulassungsjahr und Preis in Euro:

Die Grafik wurde mit R erzeugt:

#!/usr/bin/Rscript
x <- read.csv("jahr_preise.csv")
ll <- lm(log(x$preis) ~ poly(x$jahr, 93, raw=TRUE))
png("jahr-preis.png", width = 16, height = 8, units = 'in', res = 600)
plot(x$jahr,x$preis,type="p",xlab="Erstzulassungsjahr",ylab="Preis in EURO",main="Gebrauchtwagenpreise auf Ebay-Kleinanzeigen",cex=0.3)
lines(x$jahr, exp(ll$fitted.values))
dev.off()

Um die Kurve in der Grafik darstellen zu können wurde folgende Regression verwendet: \[ \text{preis} = \exp(\sum_{n=0}^{n=93}{a_n\cdot\text{jahr}^n}) \]

Mittwoch, 13. Juli 2016

Breadth First Search in Python and Java

Breadth-First-Search-Algorithm:

Here is the python code: You can call it like this:

./bfs.py graph.txt v

The file graph.txt:

v:r
r:s,v
s:r,w
w:t,x,s
t:w,x,u
x:w,t,u,y
u:t,x,y
y:x,u

#!/usr/bin/python
import sys

def read_graph(filename):
  f = open(filename,'r')
  graph = {}
  for line in f:
    line = line.replace('\n','')
    fromVertex, toList = line.split(':')
    toList = toList.split(',')
    graph[fromVertex] = toList
  return graph


def bfs(graph,s):
  color = {}
  dist = {}
  pi = {}
  for u in graph.keys():
    if u != s:
      color[u] = 'white'
      dist[u] = None
      pi[u] = None

  color[s] = 'gray'
  dist[s] = 0
  pi[s] = None
  Q = []
  Q.append(s)
  while len(Q) > 0:
    u = Q.pop(0)
    for v in graph[u]:
      if color[v] == 'white':
        color[v] = 'gray'
        dist[v] = dist[u]+1
        pi[v] = u
        Q.append(v)
    color[u] = 'black'
    for x in graph.keys():
      print x,color[x],dist[x],pi[x]
    print "---------------"

  return color,dist,pi

def print_path(graph,s,v,pi):
  if v == s:
    print s
  elif pi[v] is None:
    print "no path from ", s , " to ", v, " exists"
  else:
    print_path(graph, s, pi[v], pi)
    print v

if __name__ == '__main__':
  graph = read_graph(sys.argv[1])
  s = sys.argv[2]
  color,dist,pi = bfs(graph,s)

Here is the java code:

  import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;


public class BFS {

 private static HashMap color;
 private static HashMap dist;
 private static HashMap pi;
 
 public static void bfs(HashMap graph, String s)
 {

  color = new HashMap();
  dist = new HashMap();
  pi = new HashMap();
  for(String u : graph.keySet())
  {
    if( !u.equals(s)){
   color.put(u,  "white");
   dist.put(u, null);
   pi.put(u, null);
    }
  }

  color.put(s, "gray");
  dist.put(s, 0);
  pi.put(s, null);
  ArrayList Q = new ArrayList();
  Q.add(s);
  while( Q.size() > 0)
  {
     String u = Q.get(0);
     Q.remove(0);
     for(String v : graph.get(u))
     {
       if( color.get(v).equals("white") ) 
       {
         color.put(v, "gray");
         dist.put(v, dist.get(u)+1);
         pi.put(v, u);
         Q.add(v);
       }
     }
     color.put(u,"black");
     for( String x : graph.keySet())
     {
      System.out.println( x + " , " + color.get(x) + " , " + dist.get(x) + " , "+pi.get(x));
     }
     System.out.println("---------------");
  }
 }

 
 
 public static HashMap read_graph(String filename)
 {
  HashMap graph = new HashMap();
  try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
      String line;
      while ((line = br.readLine()) != null) {
         String[] splitted1 = line.split(":");
         String fromVertex = splitted1[0];
         String[] toVertexList = splitted1[1].split(",");
         graph.put(fromVertex, toVertexList);
      }
  } catch (FileNotFoundException e) {
   System.out.println("file not found");
   e.printStackTrace();
   System.exit(1);
  } catch (IOException e) {
   e.printStackTrace();
   System.exit(1);
  }
  return graph;
 }
 
 /**
  * @param args
  */
 public static void main(String[] args) {
  // TODO Auto-generated method stub
        HashMap graph = read_graph(args[0]);
        bfs(graph,args[1]);
 }

}

You can call the java program like this:

java BFS graph.txt v

After calling the python / java version, you should get something similar to this:

r , gray , 1 , v
s , white , null , null
t , white , null , null
u , white , null , null
v , black , 0 , null
w , white , null , null
x , white , null , null
y , white , null , null
---------------
r , black , 1 , v
s , gray , 2 , r
t , white , null , null
u , white , null , null
v , black , 0 , null
w , white , null , null
x , white , null , null
y , white , null , null
---------------
r , black , 1 , v
s , black , 2 , r
t , white , null , null
u , white , null , null
v , black , 0 , null
w , gray , 3 , s
x , white , null , null
y , white , null , null
---------------
r , black , 1 , v
s , black , 2 , r
t , gray , 4 , w
u , white , null , null
v , black , 0 , null
w , black , 3 , s
x , gray , 4 , w
y , white , null , null
---------------
r , black , 1 , v
s , black , 2 , r
t , black , 4 , w
u , gray , 5 , t
v , black , 0 , null
w , black , 3 , s
x , gray , 4 , w
y , white , null , null
---------------
r , black , 1 , v
s , black , 2 , r
t , black , 4 , w
u , gray , 5 , t
v , black , 0 , null
w , black , 3 , s
x , black , 4 , w
y , gray , 5 , x
---------------
r , black , 1 , v
s , black , 2 , r
t , black , 4 , w
u , black , 5 , t
v , black , 0 , null
w , black , 3 , s
x , black , 4 , w
y , gray , 5 , x
---------------
r , black , 1 , v
s , black , 2 , r
t , black , 4 , w
u , black , 5 , t
v , black , 0 , null
w , black , 3 , s
x , black , 4 , w
y , black , 5 , x
---------------

Orges Leka