
DecisionTreeFirstDraft
CSC 8520, Spring 2013
Paula Matuszek


Decision Tree Induction
• Very common machine learning and data mining technique.
• Given:
  – Examples
  – Attributes
  – Goal (classification, typically)
• Pick an “important” attribute: one which divides the set cleanly.
• Recur with the subsets not yet classified.


A Training Set


Expressiveness
• Decision trees can express any function of the input attributes.
• E.g., for Boolean functions, each truth table row corresponds to a path to a leaf (see the sketch below).
• Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples.
• Prefer to find more compact decision trees.
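To make the truth-table-row-to-path idea concrete, here is a minimal Python sketch (an illustration added here, not part of the original slides) that writes the Boolean function A XOR B as a decision tree, using nested dicts of the form {attribute: {value: subtree-or-leaf}}. The same encoding is reused in the sketches on later slides.

    # A decision tree for f(A, B) = A XOR B: test A at the root, then B,
    # with one leaf per truth-table row (four leaves in all).
    xor_tree = {"A": {
        True:  {"B": {True: False, False: True}},
        False: {"B": {True: True,  False: False}},
    }}

    # Following the path A=True, B=False reaches the leaf True,
    # matching the truth-table row (True, False) -> True.
    assert xor_tree["A"][True]["B"][False] is True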


ID3
• A greedy algorithm for decision tree construction, originally developed by Ross Quinlan, 1987.
• Top-down construction of the decision tree by recursively selecting the “best attribute” to use at the current node in the tree (a sketch of the recursion follows):
  – Once the attribute is selected, generate child nodes, one for each possible value of the selected attribute.
  – Partition the examples using the possible values of the attribute, and assign the subsets of examples to the appropriate child node.
  – Repeat for each child node until all examples associated with a node are either all positive or all negative.
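As an illustration (not Quinlan's original code), a compact Python sketch of this recursion might look as follows. It builds trees in the nested-dict encoding shown earlier and takes the attribute-scoring function as a parameter, e.g. the information gain discussed on the next slide.

    from collections import Counter

    def id3(examples, attributes, target, gain):
        """Top-down, greedy construction of a decision tree.

        examples:   list of dicts mapping attribute names (and `target`) to values
        attributes: attribute names still available for splitting at this node
        target:     name of the class-label field
        gain:       scoring function gain(examples, attribute, target)
        """
        labels = [ex[target] for ex in examples]
        if len(set(labels)) == 1:           # all positive or all negative: make a leaf
            return labels[0]
        if not attributes:                  # no attributes left: majority-label leaf
            return Counter(labels).most_common(1)[0][0]
        # Greedily select the "best attribute" for the current node.
        best = max(attributes, key=lambda a: gain(examples, a, target))
        branches = {}
        for value in {ex[best] for ex in examples}:            # one child per value
            subset = [ex for ex in examples if ex[best] == value]
            rest = [a for a in attributes if a != best]
            branches[value] = id3(subset, rest, target, gain)  # recur on the partition
        return {best: branches}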


Best Attribute
• What’s the best attribute to choose? The one with the best information gain.
  – If we choose Bar, we have
    • no: 3-, 3+   yes: 3-, 3+
  – If we choose Hungry, we have
    • no: 4-, 1+   yes: 1-, 5+
  – Hungry has given us more information about the correct classification (the sketch below works through the arithmetic).
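The arithmetic behind this comparison can be sketched in a few lines of Python (added here for illustration; the counts are the ones on the slide, written as (positive, negative) per branch). The final function is an adapter with the gain(examples, attribute, target) signature expected by the id3 sketch above.

    from math import log2

    def entropy(pos, neg):
        """Entropy, in bits, of a node holding `pos` positive and `neg` negative examples."""
        h, total = 0.0, pos + neg
        for count in (pos, neg):
            if count:
                p = count / total
                h -= p * log2(p)
        return h

    def gain_from_counts(branches):
        """Information gain of a split, given (pos, neg) counts for each branch."""
        pos = sum(p for p, _ in branches)
        neg = sum(n for _, n in branches)
        total = pos + neg
        remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
        return entropy(pos, neg) - remainder

    print(gain_from_counts([(3, 3), (3, 3)]))   # Bar:    0.0 bits -- the split tells us nothing
    print(gain_from_counts([(1, 4), (5, 1)]))   # Hungry: ~0.31 bits -- noticeably more informative

    def information_gain(examples, attribute, target):
        """Gain of splitting `examples` on `attribute`; assumes a Boolean target (True = positive)."""
        branches = []
        for value in {ex[attribute] for ex in examples}:
            labels = [ex[target] for ex in examples if ex[attribute] == value]
            positives = sum(1 for label in labels if label)
            branches.append((positives, len(labels) - positives))
        return gain_from_counts(branches)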


Textbook restaurant domain
• Develop a decision tree to model the decision a patron makes when deciding whether or not to wait for a table at a restaurant.
• Two classes: wait, leave.
• Ten attributes: Alternative available? Bar in restaurant? Is it Friday? Are we hungry? How full is the restaurant? How expensive? Is it raining? Do we have a reservation? What type of restaurant is it? What’s the purported waiting time?
• Training set of 12 examples; ~7000 possible cases (one possible encoding is sketched below).
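One way to encode such examples for the id3 sketch above is a dict per example. The attribute names and values below are a hypothetical illustration, not the slide's actual training table.

    # A single made-up example; the real training set would contain twelve of these.
    example = {
        "Alternative": True, "Bar": False, "Friday": False, "Hungry": True,
        "Patrons": "Some", "Price": "$$$", "Raining": False, "Reservation": True,
        "Type": "French", "WaitEstimate": "0-10",
        "WillWait": True,                 # the class: wait (True) or leave (False)
    }

    attributes = [name for name in example if name != "WillWait"]
    # With a list `training_set` of twelve such dicts, a tree is grown by:
    #   tree = id3(training_set, attributes, target="WillWait", gain=information_gain)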


Thinking About It
• What might you expect a decision tree to have as the first question? The second?


A Decision Tree from Introspection


Choosing an attribute
• Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
• Patrons? is a better choice.


Learned Tree


How well does it work?
• Many case studies have shown that decision trees are at least as accurate as human experts.
  – A study for diagnosing breast cancer had humans correctly classifying the examples 65% of the time; the decision tree classified 72% correctly.
  – British Petroleum designed a decision tree for gas-oil separation for offshore oil platforms that replaced an earlier rule-based expert system.
  – Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example.


Pruning
• With enough levels of a decision tree we can always get the leaves to be 100% positive or negative.
• But if we are down to one or two cases in each leaf, we are probably overfitting.
• Useful to prune leaves; stop when (see the sketch below)
  – we reach a certain level
  – we reach a small enough leaf size
  – our information gain is improving too slowly
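For comparison, scikit-learn's tree learner (which implements CART rather than ID3) exposes stopping rules of exactly this kind as constructor parameters. The mapping below is a rough analogy to the slide's bullets, and the particular threshold values are arbitrary illustrations.

    from sklearn.tree import DecisionTreeClassifier

    clf = DecisionTreeClassifier(
        max_depth=4,                  # "stop when we reach a certain level"
        min_samples_leaf=5,           # "stop at a small enough leaf": avoid 1-2 case leaves
        min_impurity_decrease=0.01,   # "stop when the gain from splitting is too small"
    )
    # clf.fit(X, y)   # X, y: numerically encoded examples and labels (not shown here)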


Strengths of Decision Trees
• Strengths include:
  – Fast to learn and to use
  – Simple to implement
  – Can look at the tree and see what is going on -- relatively “white box”
  – Empirically validated in many commercial products
  – Handles noisy data (with pruning)
• C4.5 and C5.0 are extensions of ID3 that account for unavailable values, continuous attribute value ranges, pruning of decision trees, and rule derivation.


Weaknesses and Issues
• Weaknesses include:
  – Univariate splits/partitioning (one attribute at a time) limits types of possible trees
  – Large decision trees may be hard to understand
  – Requires fixed-length feature vectors
  – Non-incremental (i.e., batch method)
  – Overfitting


Decision Tree Architecture
• Knowledge Base: the decision tree itself.
• Performer: tree walker (a sketch follows).
• Critic: actual outcome in the training case.
• Learner: ID3 or its variants.
  – This is an example of a large class of learners that need all of the examples at once in order to learn: batch, not incremental.
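A performer for this architecture can be very small. Here is an illustrative tree walker (added here, not from the slides) for the nested-dict trees produced by the earlier id3 sketch.

    def tree_walk(tree, example):
        """Follow the tree's tests for one example until a leaf label is reached."""
        while isinstance(tree, dict):
            (attribute, branches), = tree.items()   # each internal node holds exactly one test
            tree = branches[example[attribute]]     # take the branch for this example's value
        return tree                                 # a leaf: the predicted class label

    # e.g. tree_walk(xor_tree, {"A": True, "B": False}) returns True for the earlier XOR tree.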


Summary: Decision tree learning
• One of the most widely used learning methods in practice.
• Can out-perform human experts in many problems.