Classification: Basic Concepts, Decision Trees, and Model Evaluation
School of Computer Science and Technology, USTC
http://staff.ustc.edu.cn/~qiliuql/DM2013.html


Page 1

Classification: Basic Concepts, Decision Trees, and Model Evaluation

刘淇 (Qi Liu)
School of Computer Science and Technology, USTC
http://staff.ustc.edu.cn/~qiliuql/DM2013.html

Page 2

Classification: Definition

• Given a collection of data records (training set)
  Each record is characterized by a tuple (x, y), where x is the attribute set and y is the class label
    x: attribute, predictor, independent variable, input
    y: class, response, dependent variable, output
• Task: learn a model that maps each attribute set x into one of the predefined class labels y

Training data example:

ID  Home Owner  Marital Status  Annual Income  Defaulted Borrower
1   Yes         Single          125K           No
2   No          Married         100K           No
3   No          Single          70K            No
4   Yes         Married         120K           No
5   No          Divorced        95K            Yes
6   No          Married         60K            No
7   Yes         Divorced        220K           No
8   No          Single          85K            Yes
9   No          Married         75K            No
10  No          Single          90K            Yes

Page 3

Examples of Classification Task

Task                                      Attribute set, x                        Class label, y
Decide to mail a catalog or not           Demographic information for households  Purchase or no purchase
Customer churn prediction                 Usage data for phone users              Churn or non-churn
Decide to issue a credit card or not      Application data                        Good credit or bad credit
Categorizing email messages or web pages  Words in the document                   Spam or non-spam

Page 4

General Approach for Building Classification Model

Model Training: a learning algorithm is applied to the training set (Learn Model) to induce a model.

Training set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Page 5

General Approach for Building Classification Model

Model Testing: the model learned from the training set (Tid 1–10, as on Page 4) is then applied (Apply Model) to a test set whose class labels are unknown.

Test set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Page 6

Performance Evaluation

Confusion matrix:

                        PREDICTED CLASS
                        Class=Yes   Class=No
ACTUAL   Class=Yes      a (TP)      b (FN)
CLASS    Class=No       c (FP)      d (TN)

Most widely-used metric:

$Accuracy = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN}$
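To make the metric concrete, here is a minimal Python sketch (not from the slides; the encoding of the positive class as "Yes" is an assumption) that accumulates the four cells of the confusion matrix and computes accuracy:

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Accumulate the confusion-matrix cells (TP, FN, FP, TN)."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            tp, fn = tp + (p == positive), fn + (p != positive)
        else:
            fp, tn = fp + (p == positive), tn + (p != positive)
    return tp, fn, fp, tn

def accuracy(actual, predicted):
    tp, fn, fp, tn = confusion_counts(actual, predicted)
    return (tp + tn) / (tp + fn + fp + tn)

print(accuracy(["Yes", "No", "No", "Yes"], ["Yes", "No", "Yes", "Yes"]))  # 0.75
```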

Page 7

Classification Techniques

• Base Classifiers
  Decision Tree based Methods
  Rule-based Methods
  Nearest-neighbor
  Neural Networks
  Naïve Bayes and Bayesian Belief Networks
  Support Vector Machines

• Ensemble Classifiers
  Boosting, Bagging, Random Forests

Page 8

Decision Trees

• Examples and Introduction
• Usage of Decision Tree
• Decision Tree Induction
• ……

Page 9

Example of a Decision Tree Built

Training Data: the loan table from Page 2 (ID, Home Owner, Marital Status, Annual Income, Defaulted Borrower).

Model: Decision Tree (splitting attributes shown at the internal nodes):

Home Owner?
  Yes: NO
  No: MarSt?
    Married: NO
    Single, Divorced: Income?
      < 80K: NO
      ≥ 80K: YES

Page 10

Concepts of The Tree Structure

Home Owner?            ← root node
  Yes: NO
  No: MarSt?           ← internal node
    Married: NO
    Single, Divorced: Income?
      < 80K: NO        ← leaf nodes
      ≥ 80K: YES

The root and internal nodes hold attribute test conditions; each leaf node is assigned a class label.

Page 11

Another Example of Decision Tree Built

MarSt?
  Married: NO
  Single, Divorced: Home Owner?
    Yes: NO
    No: Income?
      < 80K: NO
      ≥ 80K: YES

Training Data: the same loan table from Page 2.

There can be more than one tree that fits the same data!

Page 12

Using Decision Tree for Classification

Learn Model: induce a decision tree from the training set (Tid 1–10, as on Page 4).

Apply Model: use the decision tree to assign class labels to the test set (Tid 11–15, as on Page 5).

Page 13

Applying Model to Test Data

Start from the root of the tree.

Test Data:

Home Owner  Marital Status  Annual Income  Defaulted Borrower
No          Married         80K            ?

Home Owner?
  Yes: NO
  No: MarSt?
    Married: NO
    Single, Divorced: Income?
      < 80K: NO
      ≥ 80K: YES

Page 14

Apply Model to Test Data

(Same test record and tree as on Page 13.) Apply the root test: the record has Home Owner = No, so follow the "No" branch.

Page 15

Apply Model to Test Data

(Same test record and tree.) The traversal arrives at the MarSt node.

Page 16

Apply Model to Test Data

(Same test record and tree.) Apply the MarSt test: the record has Marital Status = Married, so follow the "Married" branch.

Page 17

Apply Model to Test Data

(Same test record and tree.) The "Married" branch leads to a leaf labeled NO.

Page 18

Apply Model to Test Data

(Same test record and tree.) The leaf is labeled NO, so assign Defaulted to "No".

Page 19

How to Build a Decision Tree?

The same Learn Model / Apply Model picture as before: the decision tree must be induced from the training set (Tid 1–10) before it can be applied to the test set (Tid 11–15). The following slides describe how the tree is learned.

Page 20

Decision Tree Induction

• How to Build a Decision Tree from a Data Table
• Famous Algorithms
  Hunt's Algorithm (one of the earliest)
  CART
  ID3, C4.5
  SLIQ, SPRINT

Page 21

General Structure of Hunt’s Algorithm

Let D_t be the set of training records that reach a node t.

• If D_t contains records that all belong to the same class y_t, then t is a leaf node labeled as y_t.
• If D_t contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.

(The loan training table from Page 2 serves as the running example; a sketch of the recursion appears below.)
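A minimal Python sketch of this recursion (not the slides' code; the attribute-selection step is a placeholder, since the impurity measures that drive it are introduced on the following pages):

```python
from collections import Counter

def hunt(records, labels, attributes):
    """Skeleton of Hunt's algorithm.

    records:    list of dicts, attribute name -> value
    labels:     class labels, parallel to records
    attributes: attribute names still available for testing
    Returns a class label (leaf) or a nested dict (subtree).
    """
    # Case 1: D_t is pure -> leaf labeled with the single class y_t
    if len(set(labels)) == 1:
        return labels[0]
    # Degenerate case: no attributes left -> majority-class leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Case 2: choose an attribute test and split into smaller subsets.
    # The "best" choice is driven by impurity measures (Gini, entropy, ...);
    # here we simply take the first available attribute.
    attr = attributes[0]
    remaining = [a for a in attributes if a != attr]
    tree = {attr: {}}
    for value in set(r[attr] for r in records):
        subset = [(r, y) for r, y in zip(records, labels) if r[attr] == value]
        tree[attr][value] = hunt([r for r, _ in subset],
                                 [y for _, y in subset],
                                 remaining)
    return tree

records = [{"HomeOwner": "Yes", "MarSt": "Single"},
           {"HomeOwner": "No",  "MarSt": "Married"},
           {"HomeOwner": "No",  "MarSt": "Single"}]
print(hunt(records, ["No", "No", "Yes"], ["HomeOwner", "MarSt"]))
```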

Page 22

Hunt’s Algorithm

Worked example on the loan training data from Page 2. (Figure: the tree is grown step by step by repeatedly splitting the impure subsets.)

Page 23

Design Issues of Decision Tree Induction

• How should training records be split?
  Method for specifying the test condition, depending on attribute types
  Measure for evaluating the goodness of a test condition
• How should the splitting procedure stop?
  Stop splitting if all the records belong to the same class or all the records have identical attribute values
  Early termination

Page 24

Methods for Expressing Test Conditions

• Depends on attribute types
  Binary (二元)
  Nominal (标称)
  Ordinal (有序)
  Continuous (连续)
• Depends on number of ways to split
  2-way split
  Multi-way split

Page 25

Test Condition for Nominal Attributes

• Multi-way split: use as many partitions as distinct values.
• Binary split: divides values into two subsets at a time; need to find the optimal partitioning.

Page 26

Test Condition for Ordinal Attributes

• Multi-way split: use as many partitions as distinct values.
• Binary split: divides values into two subsets; need to find the optimal partitioning, and the grouping must preserve the order property among attribute values (e.g., for sizes, a grouping that puts Small and Large together while separating out Medium violates the order property).

Page 27

Test Condition for Continuous Attributes


Page 28

Splitting Based on Continuous Attributes

• Different ways of handling:
  Discretization to form an ordinal categorical attribute
    Static – discretize once at the beginning
    Dynamic – ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering
  Binary decision: (A < v) or (A ≥ v)
    consider all possible splits and find the best cut
    can be more compute-intensive

Page 29

How to determine the Best Split

Before splitting: 10 records of class 0, 10 records of class 1.

(Figure: several candidate attribute tests and the class distributions of the resulting child nodes.)

Which test condition is the best?

Page 30

How to determine the Best Split

• Greedy approach: nodes with a purer class distribution are preferred
• Need a measure of node impurity:
  e.g., a node with class distribution (C0: 5, C1: 5) has a high degree of impurity, while (C0: 9, C1: 1) has a low degree of impurity

Page 31

Measures of Node Impurity

• Gini Index: $GINI(t) = 1 - \sum_j [p(j \mid t)]^2$

• Entropy: $Entropy(t) = -\sum_j p(j \mid t) \log p(j \mid t)$

• Misclassification error: $Error(t) = 1 - \max_i P(i \mid t)$

Page 32

Finding the Best Split

1. Compute the impurity measure (P) before splitting
2. Compute the impurity measure (M) after splitting
   Compute the impurity measure of each child node
   Compute the (weighted) average impurity of the children (M)
3. Choose the attribute test condition that produces the highest gain

   Gain = P – M

   or equivalently, the lowest impurity measure after splitting (M)

Page 33

Finding the Best Split

Before splitting: class counts (C0: N00, C1: N01), impurity P.

Candidate split A? (Yes/No): child N1 has counts (C0: N10, C1: N11) with impurity M11; child N2 has (C0: N20, C1: N21) with impurity M12; their weighted combination is M1.

Candidate split B? (Yes/No): child N3 has counts (C0: N30, C1: N31) with impurity M21; child N4 has (C0: N40, C1: N41) with impurity M22; their weighted combination is M2.

Compare: Gain = P – M1 vs P – M2.

Page 34

Measure of Impurity: GINI

• Gini Index for a given node t:

  $GINI(t) = 1 - \sum_j [p(j \mid t)]^2$

  (NOTE: p(j|t) is the relative frequency of class j at node t.)

  Maximum (1 − 1/n_c) when records are equally distributed among all classes, implying least interesting information
  Minimum (0.0) when all records belong to one class, implying most interesting information

Examples:

  C1: 0, C2: 6 → Gini = 0.000
  C1: 1, C2: 5 → Gini = 0.278
  C1: 2, C2: 4 → Gini = 0.444
  C1: 3, C2: 3 → Gini = 0.500

Page 35

Computing Gini Index of a Single Node

$GINI(t) = 1 - \sum_j [p(j \mid t)]^2$

C1: 0, C2: 6.  P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
Gini = 1 – P(C1)² – P(C2)² = 1 – 0 – 1 = 0

C1: 1, C2: 5.  P(C1) = 1/6, P(C2) = 5/6
Gini = 1 – (1/6)² – (5/6)² = 0.278

C1: 2, C2: 4.  P(C1) = 2/6, P(C2) = 4/6
Gini = 1 – (2/6)² – (4/6)² = 0.444
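This per-node computation is a few lines of Python (a sketch; the node is assumed to be given as its per-class record counts):

```python
def gini(counts):
    """Gini index of a node, given per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

print(gini([0, 6]), gini([1, 5]), gini([2, 4]))  # 0.0  0.277...  0.444...
```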

Page 36

Computing Gini Index for a Collection of Nodes

• When a node p is split into k partitions (children):

  $GINI_{split} = \sum_{i=1}^{k} \frac{n_i}{n}\, GINI(i)$

  where n_i = number of records at child i, and n = number of records at the parent node p.

• Choose the attribute that minimizes the weighted average Gini index of the children
• The Gini index is used in decision tree algorithms such as CART, SLIQ, SPRINT
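A sketch of the weighted average, reusing the gini helper from the previous page (children_counts is assumed to hold one per-class count list per child node):

```python
def gini_split(children_counts):
    """Weighted average Gini index over the children of a split."""
    n = sum(sum(c) for c in children_counts)
    return sum(sum(c) / n * gini(c) for c in children_counts)
```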

Page 37

Binary Attributes: Computing GINI Index

• Splits into two partitions
• Effect of weighing partitions: larger and purer partitions are sought.

Parent: C1 = 6, C2 = 6; Gini = 0.500

Split B? → Node N1 (Yes), Node N2 (No):

      N1   N2
C1    5    2
C2    1    4

Gini(N1) = 1 – (5/6)² – (1/6)² = 0.278
Gini(N2) = 1 – (2/6)² – (4/6)² = 0.444
Gini(Children) = 6/12 × 0.278 + 6/12 × 0.444 = 0.361

Page 38

Categorical Attributes: Computing Gini Index

• For each distinct value, gather counts for each class in the dataset
• Use the count matrix to make decisions

Multi-way split (CarType):

      Family  Sports  Luxury
C1      1       8       1
C2      3       0       7
Gini = 0.163

Two-way split (find the best partition of values):

      {Sports, Luxury}  {Family}
C1           9              1
C2           7              3
Gini = 0.468

      {Sports}  {Family, Luxury}
C1       8             2
C2       0            10
Gini = 0.167
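The three CarType splits can be checked with the gini_split sketch from Page 36:

```python
print(gini_split([[1, 3], [8, 0], [1, 7]]))  # multi-way Family/Sports/Luxury -> 0.1625 (≈0.163)
print(gini_split([[9, 7], [1, 3]]))          # {Sports, Luxury} vs {Family}   -> 0.46875 (≈0.468)
print(gini_split([[8, 0], [2, 10]]))         # {Sports} vs {Family, Luxury}   -> 0.1666... (≈0.167)
```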

Page 39

Continuous Attributes: Computing Gini Index

• Use binary decisions based on one value
• Several choices for the splitting value v
  Number of possible splitting values = number of distinct values
• Each splitting value has a count matrix associated with it
  Class counts in each of the partitions, A < v and A ≥ v
• Simple method to choose the best v:
  For each v, scan the database to gather the count matrix and compute its Gini index
  Computationally inefficient! Repetition of work.

(Example attribute: Annual Income in the loan table from Page 2.)

Page 40

Continuous Attributes: Computing Gini Index…

• For efficient computation: for each attribute,
  Sort the attribute on values
  Linearly scan these values, each time updating the count matrix and computing the Gini index
  Choose the split position that has the least Gini index

Example (class Cheat vs. sorted Annual Income):

Sorted values:   60   70   75   85   90   95   100  120  125  220
Cheat:           No   No   No   Yes  Yes  Yes  No   No   No   No
Split positions: 55   65   72   80   87   92   97   110  122  172  230
Gini:          0.420 0.400 0.375 0.343 0.417 0.400 0.300 0.343 0.375 0.400 0.420

The best split is at position 97, with Gini = 0.300.
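A Python sketch of this sort-then-scan procedure (assumptions: midpoints between consecutive distinct values serve as candidate cuts, and the gini helper from Page 35 is in scope):

```python
def best_numeric_split(values, labels):
    """Sort once, then linearly scan the cut points, updating the class
    counts incrementally instead of rescanning the database per cut."""
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    left = {c: 0 for c in classes}
    right = {c: labels.count(c) for c in classes}
    n = len(pairs)
    best_cut, best_g = None, float("inf")
    for i in range(n - 1):
        _, y = pairs[i]
        left[y] += 1
        right[y] -= 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue                      # no valid cut between equal values
        cut = (pairs[i][0] + pairs[i + 1][0]) / 2
        g = ((i + 1) / n) * gini(list(left.values())) + \
            ((n - i - 1) / n) * gini(list(right.values()))
        if g < best_g:
            best_cut, best_g = cut, g
    return best_cut, best_g

income = [60, 70, 75, 85, 90, 95, 100, 120, 125, 220]
cheat  = ["No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
print(best_numeric_split(income, cheat))  # -> (97.5, 0.3), matching the slide
```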

Page 41

Measure of Impurity: Entropy

• Entropy at a given node t:

  $Entropy(t) = -\sum_j p(j \mid t) \log p(j \mid t)$

  (NOTE: p(j|t) is the relative frequency of class j at node t.)

  Maximum (log n_c) when records are equally distributed among all classes, implying least information
  Minimum (0.0) when all records belong to one class, implying most information

• Entropy-based computations are quite similar to the GINI index computations

Page 42

Computing Entropy of a Single Node

$Entropy(t) = -\sum_j p(j \mid t) \log_2 p(j \mid t)$

C1: 0, C2: 6.  P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
Entropy = – 0 log 0 – 1 log 1 = – 0 – 0 = 0

C1: 1, C2: 5.  P(C1) = 1/6, P(C2) = 5/6
Entropy = – (1/6) log₂(1/6) – (5/6) log₂(5/6) = 0.65

C1: 2, C2: 4.  P(C1) = 2/6, P(C2) = 4/6
Entropy = – (2/6) log₂(2/6) – (4/6) log₂(4/6) = 0.92
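A sketch mirroring the gini helper (log base 2, with the 0 log 0 = 0 convention):

```python
import math

def entropy(counts):
    """Entropy of a node, given per-class record counts (log base 2)."""
    n = sum(counts)
    h = 0.0
    for c in counts:
        if c:                       # 0 log 0 is taken to be 0
            h -= (c / n) * math.log2(c / n)
    return h

print(entropy([0, 6]), entropy([1, 5]), entropy([2, 4]))  # 0.0  0.650...  0.918...
```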

Page 43

Computing Information Gain After Splitting

• Information Gain:

  $GAIN_{split} = Entropy(p) - \sum_{i=1}^{k} \frac{n_i}{n}\, Entropy(i)$

  The parent node p is split into k partitions; n_i is the number of records in partition i.

• Choose the split that achieves the most reduction (maximizes GAIN)
• Used in the ID3 and C4.5 decision tree algorithms
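A sketch on top of the entropy helper from the previous page:

```python
def information_gain(parent_counts, children_counts):
    """Entropy reduction achieved by a split."""
    n = sum(parent_counts)
    after = sum(sum(c) / n * entropy(c) for c in children_counts)
    return entropy(parent_counts) - after
```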

Page 44

Problems with Information Gain

• Information Gain tends to prefer splits that result in a large number of partitions, each being small but pure
  Customer ID has the highest information gain because the entropy for all of its children is zero

Page 45

Gain Ratio

• Gain Ratio:

  $GainRATIO_{split} = \frac{GAIN_{split}}{SplitINFO}$, where $SplitINFO = -\sum_{i=1}^{k} \frac{n_i}{n} \log \frac{n_i}{n}$

  The parent node p is split into k partitions; n_i is the number of records in partition i.

  Adjusts the information gain by the entropy of the partitioning (SplitINFO): a higher-entropy partitioning (a large number of small partitions) is penalized!

  Used in the C4.5 algorithm; designed to overcome the disadvantage of information gain.
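A sketch of the correction; note that SplitINFO is simply the entropy of the child-node sizes, so the entropy helper can be reused:

```python
def gain_ratio(parent_counts, children_counts):
    """Information gain normalized by the entropy of the partitioning."""
    split_info = entropy([sum(c) for c in children_counts])
    if split_info == 0.0:          # degenerate split: everything in one child
        return 0.0
    return information_gain(parent_counts, children_counts) / split_info
```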

Page 46

Measure of Impurity: Classification Error

• Classification error at a node t:

  $Error(t) = 1 - \max_i P(i \mid t)$

  Maximum (1 − 1/n_c) when records are equally distributed among all classes, implying least interesting information
  Minimum (0) when all records belong to one class, implying most interesting information

Page 47

Computing Error of a Single Node

$Error(t) = 1 - \max_i P(i \mid t)$

C1: 0, C2: 6.  P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
Error = 1 – max(0, 1) = 1 – 1 = 0

C1: 1, C2: 5.  P(C1) = 1/6, P(C2) = 5/6
Error = 1 – max(1/6, 5/6) = 1 – 5/6 = 1/6

C1: 2, C2: 4.  P(C1) = 2/6, P(C2) = 4/6
Error = 1 – max(2/6, 4/6) = 1 – 4/6 = 1/3
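The corresponding one-line sketch, in the same per-class-counts convention as gini and entropy:

```python
def classification_error(counts):
    """Misclassification error of a node, given per-class record counts."""
    n = sum(counts)
    return (1.0 - max(counts) / n) if n else 0.0

print(classification_error([1, 5]))  # 1/6
```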

Page 48

Comparison among Impurity Measures

For a 2-class problem: (figure: Gini, entropy, and misclassification error plotted against p, the fraction of records that belong to one of the classes)

The different impurity measures are consistent with one another. However, the attribute chosen as the test condition can still differ depending on which impurity measure is used.

Page 49

Misclassification Error vs Gini Index

Parent: C1 = 7, C2 = 3; Gini = 0.42

Split A? → Node N1 (Yes), Node N2 (No):

      N1   N2
C1    3    4
C2    0    3

Gini(N1) = 1 – (3/3)² – (0/3)² = 0
Gini(N2) = 1 – (4/7)² – (3/7)² = 0.489
Gini(Children) = 3/10 × 0 + 7/10 × 0.489 = 0.342

Gini improves, but the error remains the same! (Parent error = 1 – 7/10 = 0.3; weighted children error = 3/10 × 0 + 7/10 × (3/7) = 0.3.)

Page 50

Tree Induction

• Greedy strategy
  Split the records based on an attribute test that optimizes a certain criterion
• Issues
  Determine how to split the records
    How to specify the attribute test condition?
    How to determine the best split?
  Determine when to stop splitting

Page 51

Stopping Criteria for Tree Induction

• Stop expanding a node when all the records belong to the same class

• Stop expanding a node when all the records have similar attribute values

• Early termination (to be discussed later)

Page 52

Decision Tree Based Classification

• Advantages:
  Inexpensive to construct
  Extremely fast at classifying unknown records
  Easy to interpret for small-sized trees
  Accuracy comparable to other classification techniques for many simple data sets

Page 53

Example: C4.5

• Simple depth-first construction
• Uses Information Gain
• Sorts continuous attributes at each node
• Needs the entire data set to fit in memory
• Unsuitable for large data sets: needs out-of-core sorting
• You can download the software from:
  http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz

Page 54

Classification: Model Overfitting and Classifier Evaluation

Page 55

Classification Errors

• Training errors (apparent errors)
  Errors committed on the training set
• Test errors
  Errors committed on the test set
• Generalization errors
  Expected error of a model over a random selection of records from the same distribution (i.e., the expected error on unseen records)

Page 56

Example Data Set

Two-class problem: +, o

3000 data points (30% for training, 70% for testing)

Data for the + class is generated from a uniform distribution.

Data for the o class is generated from a mixture of 3 Gaussian distributions, centered at (5,15), (10,5), and (15,15).

Page 57

Decision Trees

(Figure: a decision tree with 11 leaf nodes vs. a decision tree with 24 leaf nodes, and their decision boundaries.)

Which tree is better?

Page 58

Model Overfitting

Underfitting: when the model is too simple, both the training and test errors are large.

Overfitting: when the model is too complex, the training error is small but the test error is large.

Page 59

Overfitting due to Noise

The decision boundary is distorted by a noise point.

Page 60

Overfitting due to Insufficient Examples

A lack of data points in the lower half of the diagram makes it difficult to predict the class labels of that region correctly: an insufficient number of training records in the region causes the decision tree to predict the test examples using other training records that are irrelevant to the classification task.

Page 61

Mammal Classification Problem

(Figure: training set of animal records and the decision tree model induced from it.)

Training error = 0%

Page 62

Effect of Noise (Data is wrong)

Example: mammal classification problem.

Model M1:

Body Temperature?
  Warm-blooded: Give Birth?
    Yes: Mammals
    No: Non-mammals
  Cold-blooded: Non-mammals

On a training set containing mislabeled (noisy) records: training error = 0%, test error = 30%.

Model M2 (a simpler tree): training error = 20%, test error = 10%.

Page 63

Lack of Representative Samples

(Figure: training set, test set, and Model M3.)

Model M3: training error = 0%, test error = 30%.

A lack of training records at the leaf nodes prevents a reliable classification.

Page 64

Effect of Multiple Comparison Procedure

• Consider the task of predicting whether the stock market will rise or fall in the next 10 trading days:

  Day 1: Up, Day 2: Down, Day 3: Down, Day 4: Up, Day 5: Down,
  Day 6: Down, Day 7: Up, Day 8: Up, Day 9: Up, Day 10: Down

• Random guessing: P(correct) = 0.5

• Make 10 random guesses in a row:

  $P(\#correct \ge 8) = \frac{\binom{10}{8} + \binom{10}{9} + \binom{10}{10}}{2^{10}} = 0.0547$

Page 65

Effect of Multiple Comparison Procedure

• Approach:
  Get 50 analysts
  Each analyst makes 10 random guesses
  Choose the analyst that makes the most correct predictions

• Probability that at least one analyst makes at least 8 correct predictions:

  $P(\#correct \ge 8) = 1 - (1 - 0.0547)^{50} = 0.9399$
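Both probabilities can be verified with a few lines of Python (math.comb requires Python 3.8+):

```python
import math

# P(at least 8 correct out of 10 fair-coin guesses)
p_one = sum(math.comb(10, k) for k in range(8, 11)) / 2 ** 10
print(p_one)                    # 0.0546875

# P(at least one of 50 independent analysts achieves this)
print(1 - (1 - p_one) ** 50)    # ~0.9399
```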

Page 66

Effect of Multiple Comparison Procedure

• Many algorithms employ the following greedy strategy:
  Initial model: M
  Alternative model: M′ = M ∪ γ, where γ is a component to be added to the model (e.g., a test condition of a decision tree)
  Keep M′ if the improvement Δ(M, M′) > α

• Often, γ is chosen from a set of alternative components Γ = {γ₁, γ₂, …, γₖ}

• If many alternatives are available, one may inadvertently add irrelevant components to the model, resulting in model overfitting

Page 67

Notes on Overfitting

• Overfitting results in decision trees that are more complex than necessary

• Training error no longer provides a good estimate of how well the tree will perform on previously unseen records

• Need new ways for estimating generalization errors

Page 68

Incorporating Model Complexity

• Rationale: Occam's Razor
  Given two models with similar generalization errors, one should prefer the simpler model over the more complex model
  A complex model has a greater chance of being fitted accidentally by errors in the data
  Therefore, one should include model complexity when evaluating a model

Page 69

Minimum Description Length (MDL)

(Figure: a sender who knows the class labels y for records X₁ … Xₙ wants to transmit them to a receiver who holds the same records without labels; instead of sending the raw labels, the sender can send an encoded decision tree plus its misclassifications.)

• Cost(Model, Data) = Cost(Data|Model) + Cost(Model)
  Cost is the number of bits needed for encoding.
  Search for the least costly model.
• Cost(Data|Model) encodes the misclassification errors.
• Cost(Model) uses node encoding (number of children) plus splitting-condition encoding.

Page 70

Estimating a Statistical Upper Bound (估计统计上界)

• Estimate the generalization error by applying a statistical correction to the training error. Since the generalization error is usually larger, the statistical correction is typically computed as an upper bound on the training error.

• Notation:
  e: error rate at the node, e = k/N
  x: true error rate
  N: total number of training samples
  k: number of misclassified samples
  α: confidence level
  z_{α/2}: standardized value of the standard normal distribution

Page 71

Estimating a Statistical Upper Bound

• For N samples, the probability that k samples are misclassified follows a binomial distribution:

  $P(k, N) = \binom{N}{k} x^k (1-x)^{N-k}$

• The binomial distribution can be approximated by a normal distribution:

  $k \sim N(\mu, \sigma^2)$ with $\mu = Nx$, $\sigma^2 = Nx(1-x)$

Page 72

Estimating a Statistical Upper Bound

• Using the normal approximation $k \sim N(Nx, Nx(1-x))$:

  $\frac{k - Nx}{\sqrt{Nx(1-x)}} \sim N(0, 1)$

  and substituting k = Ne gives the confidence constraint

  $\frac{Ne - Nx}{\sqrt{Nx(1-x)}} \le z_{\alpha/2}$

Page 73

Estimating a Statistical Upper Bound

• Squaring the constraint $\frac{Ne - Nx}{\sqrt{Nx(1-x)}} \le z_{\alpha/2}$ gives the quadratic inequality

  $(N + z_{\alpha/2}^2)\,x^2 - (2Ne + z_{\alpha/2}^2)\,x + Ne^2 \le 0$

• Solving for x yields the upper bound:

  $e_{upper}(N, e, \alpha) = \dfrac{e + \dfrac{z_{\alpha/2}^2}{2N} + z_{\alpha/2}\sqrt{\dfrac{e(1-e)}{N} + \dfrac{z_{\alpha/2}^2}{4N^2}}}{1 + \dfrac{z_{\alpha/2}^2}{N}}$
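A sketch of the bound just derived (assumption: the normal quantile z is supplied directly, e.g. z_{α/2} ≈ 1.96 for a 95% two-sided confidence level):

```python
import math

def e_upper(n, e, z):
    """Statistical upper bound on the true error rate x, given n training
    records, observed error rate e = k/n, and normal quantile z = z_{alpha/2}."""
    num = e + z * z / (2 * n) + z * math.sqrt(e * (1 - e) / n + z * z / (4 * n * n))
    return num / (1 + z * z / n)

print(e_upper(30, 10 / 30, 1.96))  # upper bound on a 10/30 observed error
```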

Page 74

Using a Validation Set (确认集)

• Divide the training data into two parts:
  Training set: use for model building
  Validation set: use for estimating the generalization error
  Note: the validation set is not the same as the test set

• Drawback: less data available for training

Page 75

Handling Overfitting in Decision Tree

• Pre-Pruning (Early Stopping Rule)
  Stop the algorithm before it becomes a fully-grown tree
  Typical stopping conditions for a node:
    Stop if all instances belong to the same class
    Stop if all the attribute values are the same
  More restrictive conditions:
    Stop if the number of instances is less than some user-specified threshold
    Stop if the class distribution of the instances is independent of the available features (e.g., using the χ² test)
    Stop if expanding the current node does not improve impurity measures (e.g., Gini or information gain)
    Stop if the estimated generalization error falls below a certain threshold

Page 76

Handling Overfitting in Decision Tree

• Post-pruning
  Grow the decision tree to its entirety
  Subtree replacement:
    Trim the nodes of the decision tree in a bottom-up fashion
    If the generalization error improves after trimming, replace the subtree with a leaf node
    The class label of the leaf node is determined from the majority class of the instances in the subtree
  Subtree raising:
    Replace a subtree with its most frequently used branch

Page 77

Example of Post-Pruning

Before splitting: Class = Yes: 20, Class = No: 10; Error = 10/30

Training error (before splitting) = 10/30
Pessimistic error = (10 + 0.5)/30 = 10.5/30

Split on A into A1–A4:

A1: Class = Yes: 8, Class = No: 4
A2: Class = Yes: 3, Class = No: 4
A3: Class = Yes: 4, Class = No: 1
A4: Class = Yes: 5, Class = No: 1

Training error (after splitting) = 9/30
Pessimistic error (after splitting) = (9 + 4 × 0.5)/30 = 11/30

Since 11/30 > 10.5/30: PRUNE!
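The pruning decision above, as a short sketch (the 0.5 penalty per leaf is the slide's convention):

```python
def pessimistic_error(train_errors, num_leaves, n, penalty=0.5):
    """Pessimistic error estimate: (training errors + penalty per leaf) / n."""
    return (train_errors + penalty * num_leaves) / n

before = pessimistic_error(10, 1, 30)   # 10.5/30 = 0.35
after  = pessimistic_error(9, 4, 30)    # 11/30  ≈ 0.367
print(after > before)                   # True -> prune the split on A
```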

Page 78

Examples of Post-pruning

Page 79

Handling Missing Attribute Values (缺失值)

• Missing values affect decision tree construction in three different ways:
  They affect how impurity measures are computed
  They affect how to distribute an instance with a missing value to the child nodes
  They affect how a test instance with a missing value is classified

Page 80

Computing the Impurity Measure (the record with the missing value is ignored — 忽略)

Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   ?       Single          90K             Yes

Before splitting: Entropy(Parent) = −0.3 log(0.3) − 0.7 log(0.7) = 0.8813

Counts (Refund = ? is the missing value):

              Class = Yes   Class = No
Refund=Yes        0             3
Refund=No         2             4
Refund=?          1             0

Split on Refund:
Entropy(Refund=Yes) = 0
Entropy(Refund=No) = −(2/6) log(2/6) − (4/6) log(4/6) = 0.9183
Entropy(Children) = 0.3 × 0 + 0.6 × 0.9183 = 0.551

Gain = 0.9 × (0.8813 − 0.551) ≈ 0.297, where 0.9 is the fraction of records whose Refund value is known.
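Checking these numbers with the entropy helper from Page 42:

```python
parent = entropy([3, 7])                                  # 0.8813 (3 Yes, 7 No)
children = 0.3 * entropy([0, 3]) + 0.6 * entropy([2, 4])  # 0.551
print(parent, children, 0.9 * (parent - children))        # gain ~0.297
```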

Page 81

Distribute Instances (predict according to the distribution — 根据分布预测)

Record with a missing value:

Tid  Refund  Marital Status  Taxable Income  Class
10   ?       Single          90K             Yes

Splitting the other nine records on Refund:

Refund = Yes:  Class=Yes: 0, Class=No: 3
Refund = No:   Class=Yes: 2, Class=No: 4

The probability that Refund = Yes is 3/9; the probability that Refund = No is 6/9.

Assign record 10 to the left child with weight 3/9 and to the right child with weight 6/9, giving:

Refund = Yes:  Class=Yes: 0 + 3/9, Class=No: 3
Refund = No:   Class=Yes: 2 + 6/9, Class=No: 4

Page 82

Classify Instances

New record:

Tid  Refund  Marital Status  Taxable Income  Class
11   No      ?               85K             ?

Count matrix at the MarSt node (with the fractional counts from the weighted record):

            Married  Single  Divorced  Total
Class=No       3       1        0        4
Class=Yes     6/9      1        1       2.67
Total        3.67      2        1       6.67

The probability that Marital Status = Married is 3.67/6.67; the probability that Marital Status = {Single, Divorced} is 3/6.67.

(Tree: Refund → Yes: NO; No: MarSt → Married: NO; Single, Divorced: TaxInc → < 80K: NO; ≥ 80K: YES.)

Page 83

Other Issues

• Data Fragmentation (数据碎片)

• Search Strategy (搜索策略)

• Expressiveness (可表达性)

• Tree Replication (重复)

Page 84

Data Fragmentation

• The number of instances gets smaller as you traverse down the tree

• The number of instances at the leaf nodes could be too small to make any statistically significant decision

Page 85

Search Strategy

• Finding an optimal decision tree is NP-hard

• The algorithm presented so far uses a greedy, top-down, recursive partitioning strategy to induce a reasonable solution

• Other strategies?
  Bottom-up
  Bi-directional

Page 86

Expressiveness

• Decision trees provide an expressive representation for learning discrete-valued functions
  But they do not generalize well to certain types of Boolean functions
  Example: the parity function (奇偶函数):
    Class = 1 if there is an even number of Boolean attributes with truth value = True
    Class = 0 if there is an odd number of Boolean attributes with truth value = True
  For accurate modeling, a complete tree is required

• Not expressive enough for modeling continuous variables
  Particularly when the test condition involves only a single attribute at a time

Page 87

Decision Boundary

• The border line between two neighboring regions of different classes is known as the decision boundary

• The decision boundary is parallel to the axes because each test condition involves a single attribute at a time

Page 88

Oblique (斜) Decision Trees

(Figure: an oblique decision boundary defined by the test x + y < 1, separating Class = + from Class = −.)

• The test condition may involve multiple attributes
• More expressive representation
• Finding the optimal test condition is computationally expensive

Page 89

Tree Replication

(Figure: a tree rooted at P with subtrees Q and R, where the same subtree rooted at S appears under both branches.)

• The same subtree can appear in multiple branches