Discovering and analyzing income determinants using decision trees


DESCRIPTION

Discovering and analyzing income determinants using decision trees. Krzysztof Karpio, Piotr Łukasiewicz, Arkadiusz Orłowski, Tomasz Ząbkowski. Data: household incomes, Poland, years 2000–2010, „Budżety Gospodarstw Domowych” (GUS). About 36 000 households in each year.

Text of Discovering and analyzing income determinants using decision trees

Discovering and analyzing income determinants using decision trees.

Krzysztof Karpio, Piotr Łukasiewicz, Arkadiusz Orłowski, Tomasz Ząbkowski
Warsaw University of Life Sciences - SGGW

Data
  • Household incomes, Poland
  • Years: 2000–2010
  • Source: „Budżety Gospodarstw Domowych” (GUS)
  • About 36 000 households in each year
  • Decision attribute: household income / number of earners
  • Real income (based on prices in 2008)

Conditional attributes
  • Sex of the family head
  • Education of the family head
  • Age of the family head
  • Economic group of the household
  • Family type
  • Number of persons in the household
  • Number of children
  • Number of earners
  • Class of place of residence
  • Voivodeship

Incomes 2008 (mean income, kPLN)
  • Female: 17.3, Male: 20.4
  • Village: 16.6, City: 26.3
  • Podkarpackie: 14.7, Mazowieckie: 23.6

Method

Income classes (thresholds at 8, 16 and 45 kPLN)
  • LOW: 7%
  • AVERAGE: 40%
  • MODERATE: 48%
  • HIGH: 5%
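The thresholds and class names on this slide suggest a simple binning of income into four classes. The following is a minimal sketch of that binning, not the authors' code. It assumes that the classes run LOW, AVERAGE, MODERATE, HIGH in order of increasing income and that an income exactly on a threshold falls into the upper class; the slide states neither. The names `THRESHOLDS_KPLN`, `CLASSES` and `income_class` are hypothetical.

```python
from bisect import bisect_right

# Thresholds in kPLN as shown on the slide.
THRESHOLDS_KPLN = [8, 16, 45]
# Assumed order of classes from lowest to highest income.
CLASSES = ["LOW", "AVERAGE", "MODERATE", "HIGH"]


def income_class(income_kpln):
    """Map an income in kPLN to one of the four classes.

    Assumption: a value equal to a threshold belongs to the upper class.
    """
    return CLASSES[bisect_right(THRESHOLDS_KPLN, income_kpln)]


if __name__ == "__main__":
    for value in (5.0, 14.7, 17.3, 50.0):
        print(value, income_class(value))
```

Using `bisect_right` keeps the thresholds in one list, so changing a boundary does not require touching the classification logic.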

Decision tree
  • Entropy (Rudolf Clausius, 1822–1888)
  • Gain
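The formulas for entropy and gain did not survive in the extracted text. As a hedged illustration, the sketch below uses the standard definitions from the decision-tree literature (Shannon entropy in bits, and gain as the parent entropy minus the weighted entropy of the child nodes); whether the slides used exactly this form is an assumption. The root class shares come from the income classes above, while the two-way split is a toy example and not taken from the data. The function names are hypothetical.

```python
import math


def entropy(shares):
    """Shannon entropy in bits; shares are normalized, zero shares skipped."""
    total = sum(shares)
    probs = [s / total for s in shares if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def information_gain(parent, children):
    """Parent entropy minus the weighted entropy of the children.

    children is a list of (weight, shares) pairs whose weights sum to 1.
    """
    return entropy(parent) - sum(w * entropy(s) for w, s in children)


# Class shares LOW, AVERAGE, MODERATE, HIGH from the slide.
root = [0.07, 0.40, 0.48, 0.05]
root_entropy = entropy(root)  # about 1.52 bits

# Toy split (NOT from the data): a perfectly separating attribute.
toy_parent = [0.5, 0.5]
toy_children = [(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])]
toy_gain = information_gain(toy_parent, toy_children)  # 1 bit

if __name__ == "__main__":
    print(round(root_entropy, 2), toy_gain)
```

An attribute with a larger gain separates the income classes better, which is the sense in which the later slides rank attributes by importance.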

Attributes tree 2008
  [Figure: decision tree for 2008. Recoverable branch labels: "at least a secondary", "married couple", "pensioners".]

Tree nodes and leaves

  Nodes & Leaves   Low   Average   Moderate   High
  0                 7%     40%       48%       5%
  1                 3%     26%       63%       8%
  1.1               1%     12%       71%      16%
  1.2               4%     33%       59%       4%
  1.2.1             2%     24%       67%       6%
  1.2.2             6%     41%       51%       2%
  2                12%     52%       34%       2%
  2.1               7%     45%       46%       3%
  2.1.1             6%     33%       56%       5%
  2.1.2             7%     68%       25%       0%
  2.2              15%     58%       26%       1%
  2.2.1            32%     38%       27%       4%
  2.2.2            12%     62%       25%       0%

  Leaves: 1.1, 1.2.1, 1.2.2, 2.1.1, 2.1.2, 2.2.1, 2.2.2

Attributes 2000–2010
  • Education
  • Family type
  • Economic group
  • Number of earners
  • Class of place of residence

Not relevant 2000–2010
  • Sex of the family head
  • Age
  • Number of persons
  • Number of children
  • Voivodeship

Information Gain
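To make the node table and its link to information gain more concrete, here is a minimal sketch that stores the class shares per node, derives the leaves from the node numbering, and reports the majority class and entropy of each leaf. It assumes that the dotted node ids encode the tree hierarchy (node 1.2.1 is a child of 1.2, and 1 and 2 are children of the root 0); the slides do not say so explicitly. Shares are normalized because the rounded percentages sum to 99 to 101. All names are hypothetical.

```python
import math

CLASSES = ("Low", "Average", "Moderate", "High")

# Class shares (%) per node, copied from the table: Low, Average, Moderate, High.
NODES = {
    "0": (7, 40, 48, 5),
    "1": (3, 26, 63, 8),
    "1.1": (1, 12, 71, 16),
    "1.2": (4, 33, 59, 4),
    "1.2.1": (2, 24, 67, 6),
    "1.2.2": (6, 41, 51, 2),
    "2": (12, 52, 34, 2),
    "2.1": (7, 45, 46, 3),
    "2.1.1": (6, 33, 56, 5),
    "2.1.2": (7, 68, 25, 0),
    "2.2": (15, 58, 26, 1),
    "2.2.1": (32, 38, 27, 4),
    "2.2.2": (12, 62, 25, 0),
}


def children(node_id):
    """Direct children of a node; the root "0" owns the top-level ids."""
    if node_id == "0":
        return [n for n in NODES if n != "0" and "." not in n]
    prefix = node_id + "."
    return [n for n in NODES
            if n.startswith(prefix) and "." not in n[len(prefix):]]


def majority_class(node_id):
    """Class with the largest share in the node."""
    shares = NODES[node_id]
    return CLASSES[shares.index(max(shares))]


def node_entropy(node_id):
    """Shannon entropy in bits of the normalized class shares."""
    shares = NODES[node_id]
    total = sum(shares)
    return -sum(s / total * math.log2(s / total) for s in shares if s > 0)


leaves = [n for n in NODES if not children(n)]

if __name__ == "__main__":
    print("root entropy:", round(node_entropy("0"), 2))
    for leaf in leaves:
        print(leaf, majority_class(leaf), round(node_entropy(leaf), 2))
```

The derived leaves match the leaf rows listed in the table, which gives some support to the numbering assumption.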

  [Chart residue: axis label "GAIN", tick value 0.01]

2-classes (high income)

2-classes (low income)
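Only the titles of the two 2-class slides survive. One common way to obtain such two-class problems from the four income classes is one-against-all relabeling, the approach named in the title of the Polat and Gunes reference; that the authors built their high-income and low-income trees exactly this way is an assumption. The sample labels below are made up for illustration, and `one_vs_rest` is a hypothetical helper.

```python
CLASSES = ("LOW", "AVERAGE", "MODERATE", "HIGH")


def one_vs_rest(labels, positive):
    """Relabel a multi-class target as 1 (the chosen class) or 0 (any other)."""
    if positive not in CLASSES:
        raise ValueError(f"unknown class: {positive!r}")
    return [1 if label == positive else 0 for label in labels]


# Hypothetical labels, NOT taken from the survey data.
sample = ["LOW", "MODERATE", "HIGH", "AVERAGE", "MODERATE"]

high_vs_rest = one_vs_rest(sample, "HIGH")
low_vs_rest = one_vs_rest(sample, "LOW")

if __name__ == "__main__":
    print(high_vs_rest)
    print(low_vs_rest)
```

Each relabeled target can then be fed to a separate binary decision tree, one for high income and one for low income.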

  [Figure residue: label "ECONOGROUP: 10"]

Efficiency of trees
  • High income
  • Low income

Summary
  • The most important attribute: Education
      Higher education (BA & MA) preferred
  • Important attributes: Education, Family Type (marriage), Economic Group (pensioners), Residence (big cities), Number of Earners (1 or 2)
  • Evolution of attributes (2000–2010)
      Education: stable, the most important
      Number of Earners: decreasing importance
      Economic Group: increasing importance
      Family Type: the weakest but noticeable importance
  • Lack of relevance of: Sex, Age, Voivodeship
  • To be continued...

Thank You

REFERENCES
  • Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos (1993).
  • Polat, K., Gunes, S., A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems, Expert Systems with Applications 36 (2009) 1587.