Discovering and analyzing income determinants using decision trees

Discovering and analyzing income determinants using decision trees. Krzysztof Karpio, Piotr Łukasiewicz, Arkadiusz Orłowski, Tomasz Ząbkowski. Data: household incomes, Poland, years 2000-2010, „Budżety Gospodarstw Domowych” (GUS), about 36 000 households in each year.

Discovering and analyzing income determinants using decision trees.

Krzysztof Karpio, Piotr Łukasiewicz, Arkadiusz Orłowski, Tomasz Ząbkowski
Warsaw University of Life Sciences - SGGW

Data
Household incomes, Poland
Years: 2000-2010
Source: Budżety Gospodarstw Domowych (GUS)
About 36 000 households in each year
Decision attribute: household income / number of earners
Real income (based on prices in 2008)

Conditional attributes
Sex of a family head
Education of a family head
Age of a family head
Economic group of a household
Family type
Number of persons in a household
Number of children
Number of earners
Class of place of residence
Voivodeship
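The decision attribute on the Data slide can be sketched in a few lines. This is a minimal illustration, assuming the slash on the slide means a plain division of household income by the number of earners, and that "real income" means rescaling by a price index to 2008 prices. The function name, its parameters, and the numbers in the example are hypothetical and do not come from the GUS data.

```python
def real_income_per_earner(household_income, earners, price_index, base_price_index):
    """Income per earner expressed in base-year prices.

    Assumption: the nominal value is rescaled by the ratio of the base-year
    price index (2008 in the slides) to the price index of the survey year.
    """
    if earners <= 0:
        raise ValueError("the ratio is undefined for a household without earners")
    nominal_per_earner = household_income / earners
    return nominal_per_earner * base_price_index / price_index


# Hypothetical household: 40 kPLN nominal income, 2 earners, surveyed in a
# year whose price index is 80 when the base year is set to 100.
example = real_income_per_earner(40.0, 2, price_index=80.0, base_price_index=100.0)
print(example)
```

How the authors handled households without earners is not stated in the slides; the sketch simply rejects them.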

Incomes 2008 (mean income, kPLN)
Sex of a family head: female 17.3, male 20.4
Class of place of residence: village 16.6, city 26.3
Voivodeship: Podkarpackie 14.7, Mazowieckie 23.6

Income classes (boundaries at 8 kPLN, 16 kPLN and 45 kPLN)
LOW 7%
AVERAGE 40%
MODERATE 48%
HIGH 5%

Method
Decision tree
Entropy
Gain
Rudolf Clausius (1822-1888)
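The formulas on the Method slide did not survive extraction. As a minimal sketch, assuming the standard Shannon entropy and information gain used by C4.5-style decision trees (the slides do not confirm the exact variant, for example whether gain ratio was used), the two quantities can be computed as follows. The function names are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def entropy_from_shares(shares):
    """Entropy of a class distribution given as shares (e.g. percentages)."""
    total = sum(shares)
    return -sum((s / total) * log2(s / total) for s in shares if s > 0)


def information_gain(labels, attribute_values):
    """Entropy reduction obtained by splitting `labels` on `attribute_values`."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, attribute_values):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets created by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Class shares (Low, Average, Moderate, High) of the whole sample.
root_entropy = entropy_from_shares([7, 40, 48, 5])
print(root_entropy)
```

An attribute that separates the classes well leaves subsets with low entropy and therefore yields a high gain; an attribute unrelated to the income class yields a gain close to zero.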

Attributes tree 2008
[Figure: decision tree for 2008; recoverable branch labels: "at least a secondary", "married couple", "pensioners"]

Tree nodes and leaves

Attributes 2000 - 2010
Education
Family type
Economic group
Number of earners
Class of place of residence

Nodes & Leaves   Low   Average   Moderate   High
0                 7%     40%       48%       5%
1                 3%     26%       63%       8%
1.1               1%     12%       71%      16%
1.2               4%     33%       59%       4%
1.2.1             2%     24%       67%       6%
1.2.2             6%     41%       51%       2%
2                12%     52%       34%       2%
2.1               7%     45%       46%       3%
2.1.1             6%     33%       56%       5%
2.1.2             7%     68%       25%       0%
2.2              15%     58%       26%       1%
2.2.1            32%     38%       27%       4%
2.2.2            12%     62%       25%       0%

Root and leaves: 0, 1.1, 1.2.1, 1.2.2, 2.1.1, 2.1.2, 2.2.1, 2.2.2

Not relevant 2000 - 2010
Sex of a family head
Age
Number of persons
Number of children
Voivodeship

Information Gain

[Figure: GAIN plot; surviving axis value: 0.01]

2-classes (high income)

2-classes (low income)
[Figure: tree; surviving node label: "ECONOGROUP: 10"]

Efficiency of trees
High income
Low income
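The two-class trees can be read as one income class set against all the others, in the spirit of the one-against-all approach named in the references. The sketch below assumes that reading, which the slides do not confirm. It first assigns one of the four income classes using the 8, 16 and 45 kPLN boundaries and then relabels the data for a two-class problem. Treating a value that falls exactly on a boundary as belonging to the upper class is an assumption, and all names are illustrative.

```python
from bisect import bisect_right

BOUNDS_KPLN = (8, 16, 45)  # class boundaries shown on the slides
CLASSES = ("LOW", "AVERAGE", "MODERATE", "HIGH")


def income_class(income_kpln):
    """Map an income in kPLN to one of the four classes.

    Assumption: a value exactly on a boundary goes to the upper class.
    """
    return CLASSES[bisect_right(BOUNDS_KPLN, income_kpln)]


def one_vs_rest(labels, target):
    """Relabel a multi-class problem as `target` versus everything else."""
    return [label if label == target else "OTHER" for label in labels]


# Mean incomes quoted on the "Incomes 2008" slide, used here only as inputs.
incomes = [17.3, 20.4, 16.6, 26.3, 14.7, 23.6]
labels = [income_class(x) for x in incomes]
print(labels)
print(one_vs_rest(labels, "HIGH"))
```

A separate tree can then be grown on each relabelled target, one for HIGH versus the rest and one for LOW versus the rest.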

Summary
The most important attribute: Education
Higher education (BA & MA) preferred
Important attributes: Education, Family Type (marriage), Economic Group (pensioners), Residence (big cities), Number of Earners (1 or 2)

Evolution of attributes (2000-2010)
Education: stable, the most important
Number of Earners: decreasing importance
Economic Group: increasing importance
Family Type: the weakest but noticeable importance
Lack of relevance of: Sex, Age, Voivodeship

To be continued...

Thank You

REFERENCES
Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos (1993).
Kemal Polat, Salih Gunes, A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems, Expert Systems with Applications 36 (2009) 1587.