
Page 1: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Data Mining Classification: Alternative Techniques

Chapter 5: Classification Techniques

Page 2: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Bayes Classifier

1. Bayes theorem
2. Classification with Bayes theorem
3. Naïve Bayes classifier
4. Bayesian belief network (BBN)

Page 3: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Conditional Probability & the Multiplication Law

The probability that an event occurs is often affected by whether other related events have occurred; this is called conditional probability, written P(A | B).

Conditional probability:

P(A | B) = P(A ∩ B) / P(B)    or    P(B | A) = P(A ∩ B) / P(A)

The multiplication law is used to compute the probability of the intersection of two events:

P(A ∩ B) = P(B) P(A | B)    and    P(A ∩ B) = P(A) P(B | A)

Page 4: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

General Form of Bayes' Theorem

If the events B1, B2, …, Bk form a partition of the sample space S with P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S with P(A) ≠ 0,

P(Bi | A) = P(Bi) P(A | Bi) / [ P(B1) P(A | B1) + P(B2) P(A | B2) + … + P(Bk) P(A | Bk) ]

Page 5: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Tabular Analysis of Bayes' Theorem (worked example of the general form)

Event Bi   Prior P(Bi)   Conditional P(A|Bi)   Joint P(Bi ∩ A)   Posterior P(Bi|A)
B1         0.2           0.05                  0.010             0.010/0.032 = 0.3125
B2         0.3           0.04                  0.012             0.012/0.032 = 0.3750
B3         0.5           0.02                  0.010             0.010/0.032 = 0.3125
Total      1.00                                P(A) = 0.032      1.000

Page 6: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Independent Events / Conditionally Independent Events

If the probability that event A occurs is not affected by event B, then A and B are called independent events.

If two events A and B are independent, then

P(A | B) = P(A)    and    P(A ∩ B) = P(A) P(B)

Conditional independence: given that event M has occurred, the probabilities of events A, B, and C are not affected by one another:

P(A ∩ B ∩ C | M) = P(A | M) P(B | M) P(C | M)

Page 7: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Bayes Theorem

Bayes' theorem is a method for combining knowledge about the classes with evidence from the data.

A probabilistic framework for solving classification problems

Conditional probability:

P(C | A) = P(A, C) / P(A)    and    P(A | C) = P(A, C) / P(C)

Bayes theorem:

P(C | A) = P(A | C) P(C) / P(A)

Page 8: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Example of Bayes Theorem

Given:
– A doctor knows that meningitis causes a stiff neck 50% of the time
– Prior probability: the probability that any patient has meningitis is P(M) = 1/50000
– Prior probability: the probability that any patient has a stiff neck is P(S) = 1/20

Posterior probability: if a patient has a stiff neck, what is the probability that the patient has meningitis, P(M | S)?

P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
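In Python, this one-line application of Bayes' theorem looks as follows (a minimal sketch; the variable names are mine, not from the slides):

```python
# Hedged sketch: Bayes' theorem for the meningitis example above.
p_s_given_m = 0.5        # P(S|M): stiff neck given meningitis
p_m = 1 / 50000          # P(M): prior probability of meningitis
p_s = 1 / 20             # P(S): prior probability of a stiff neck

p_m_given_s = p_s_given_m * p_m / p_s   # Bayes' theorem
print(p_m_given_s)       # 0.0002
```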

Page 9: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Bayesian Classifiers

Consider each attribute and class label as random variables

Given a record with attributes (A1, A2,…,An)

– Goal is to predict class C

– Specifically, we want to find the value of C that maximizes P(C| A1, A2,…,An )

Can we estimate P(C| A1, A2,…,An ) directly from data?

Page 10: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Bayesian Classifiers

Approach:

– compute the posterior probability P(C | A1, A2, …, An) for all values of C using the Bayes theorem

– Choose value of C that maximizes P(C | A1, A2, …, An)

– Equivalent to choosing value of C that maximizes P(A1, A2, …, An|C) P(C)

How to estimate P(A1, A2, …, An | C )?

P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)

Page 11: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Naïve Bayes Classifier

Assume independence among attributes Ai when the class is given:

– P(A1, A2, …, An |Cj) = P(A1| Cj) P(A2| Cj)… P(An| Cj)

– Can estimate P(Ai| Cj) for all Ai and Cj.

– New point is classified to Cj if P(Cj) ∏i P(Ai | Cj) is maximal.

Page 12: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

How to Estimate Probabilities from Data?

Class prior: P(C) = Nc / N
– e.g., P(No) = 7/10, P(Yes) = 3/10

For discrete attributes:

P(Ai | Ck) = |Aik| / Nck

– where |Aik| is the number of instances that have attribute value Ai and belong to class Ck

– Examples:
P(Status = Married | No) = 4/7
P(Refund = Yes | Yes) = 0

Page 13: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Example

P(O) = 3/10   P(No|O) = 1     P(Div|O) = 1/3   P(Low|O) = 1/3
P(X) = 7/10   P(No|X) = 4/7   P(Div|X) = 1/7   P(Low|X) = 3/7

W = {No, Div, Low}   Class(W) = O or X ?

Tid   Refund   Marital Status   Taxable Income   Class
1     Yes      Single           Mid              X
2     No       Married          Mid              X
3     No       Single           Low              X
4     Yes      Married          Mid              X
5     No       Divorced         Mid              O
6     No       Married          Low              X
7     Yes      Divorced         High             X
8     No       Single           Low              O
9     No       Married          Low              X
10    No       Single           Mid              O

Page 14: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Example

P(O) = 3/10   P(No|O) = 1     P(Div|O) = 1/3   P(Low|O) = 1/3
P(X) = 7/10   P(No|X) = 4/7   P(Div|X) = 1/7   P(Low|X) = 3/7

Tid   Refund   Marital Status   Taxable Income   Class
1     Yes      Single           Mid              X
2     No       Married          Mid              X
3     No       Single           Low              X
4     Yes      Married          Mid              X
5     No       Divorced         Mid              O
6     No       Married          Low              X
7     Yes      Divorced         High             X
8     No       Single           Low              O
9     No       Married          Low              X
10    No       Single           Mid              O

W = {No, Div, Low}
P(O|W) = P(No|O) P(Div|O) P(Low|O) P(O) / P(W) = (1/30) / P(W)
P(X|W) = P(No|X) P(Div|X) P(Low|X) P(X) / P(W) = (6/245) / P(W)
∵ P(O|W) > P(X|W)   ∴ Class(W) = O

P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)

P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)
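The same computation can be scripted. The sketch below uses my own encoding of the table and an illustrative function name; it reproduces the unnormalized scores 1/30 and 6/245 for W:

```python
# Training records from the table: (Refund, Marital Status, Taxable Income, Class)
data = [
    ("Yes", "Single",   "Mid",  "X"), ("No",  "Married", "Mid",  "X"),
    ("No",  "Single",   "Low",  "X"), ("Yes", "Married", "Mid",  "X"),
    ("No",  "Divorced", "Mid",  "O"), ("No",  "Married", "Low",  "X"),
    ("Yes", "Divorced", "High", "X"), ("No",  "Single",  "Low",  "O"),
    ("No",  "Married",  "Low",  "X"), ("No",  "Single",  "Mid",  "O"),
]

def naive_bayes_score(record, cls):
    """Unnormalized P(cls) * prod_i P(A_i | cls) with raw frequency estimates."""
    rows = [r for r in data if r[-1] == cls]
    score = len(rows) / len(data)                  # prior P(cls)
    for i, value in enumerate(record):             # conditional P(A_i = value | cls)
        score *= sum(r[i] == value for r in rows) / len(rows)
    return score

w = ("No", "Divorced", "Low")
print(naive_bayes_score(w, "O"))   # 1/30  ~= 0.0333
print(naive_bayes_score(w, "X"))   # 6/245 ~= 0.0245  -> classify W as O
```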

Page 15: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Example of Naïve Bayes Classifier

Give Birth      mammals   non-mammals   Total
no              1         12            13
yes             6         1             7
Total           7         13            20

Lay Eggs        mammals   non-mammals   Total
no              6         1             7
yes             1         12            13
Total           7         13            20

Can Fly         mammals   non-mammals   Total
no              6         10            16
yes             1         3             4
Total           7         13            20

Live in Water   mammals   non-mammals   Total
no              5         6             11
sometimes       0         4             4
yes             2         3             5
Total           7         13            20

Have Legs       mammals   non-mammals   Total
no              2         4             6
yes             5         9             14
Total           7         13            20

Name            Give Birth   Can Fly   Live in Water   Have Legs   Class
human           yes          no        no              yes         mammals
python          no           no        no              no          non-mammals
salmon          no           no        yes             no          non-mammals
whale           yes          no        yes             no          mammals
frog            no           no        sometimes       yes         non-mammals
komodo          no           no        no              yes         non-mammals
bat             yes          yes       no              yes         mammals
pigeon          no           yes       no              yes         non-mammals
cat             yes          no        no              yes         mammals
leopard shark   yes          no        yes             no          non-mammals
turtle          no           no        sometimes       yes         non-mammals
penguin         no           no        sometimes       yes         non-mammals
porcupine       yes          no        no              yes         mammals
eel             no           no        yes             no          non-mammals
salamander      no           no        sometimes       yes         non-mammals
gila monster    no           no        no              yes         non-mammals
platypus        no           no        no              yes         mammals
owl             no           yes       no              yes         non-mammals
dolphin         yes          no        yes             no          mammals
eagle           no           yes       no              yes         non-mammals

Give Birth   Can Fly   Live in Water   Have Legs   Class
yes          no        yes             no          ?

Page 16: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Example of Naïve Bayes Classifier

Name            Give Birth   Can Fly   Live in Water   Have Legs   Class
human           yes          no        no              yes         mammals
python          no           no        no              no          non-mammals
salmon          no           no        yes             no          non-mammals
whale           yes          no        yes             no          mammals
frog            no           no        sometimes       yes         non-mammals
komodo          no           no        no              yes         non-mammals
bat             yes          yes       no              yes         mammals
pigeon          no           yes       no              yes         non-mammals
cat             yes          no        no              yes         mammals
leopard shark   yes          no        yes             no          non-mammals
turtle          no           no        sometimes       yes         non-mammals
penguin         no           no        sometimes       yes         non-mammals
porcupine       yes          no        no              yes         mammals
eel             no           no        yes             no          non-mammals
salamander      no           no        sometimes       yes         non-mammals
gila monster    no           no        no              yes         non-mammals
platypus        no           no        no              yes         mammals
owl             no           yes       no              yes         non-mammals
dolphin         yes          no        yes             no          mammals
eagle           no           yes       no              yes         non-mammals

Give Birth   Can Fly   Live in Water   Have Legs   Class
yes          no        yes             no          ?

A: attributes,  M: mammals,  N: non-mammals

P(A|M) = (6/7) × (6/7) × (2/7) × (2/7) = 0.06
P(A|N) = (1/13) × (10/13) × (3/13) × (4/13) = 0.0042

P(A|M) P(M) = 0.06 × (7/20) = 0.021
P(A|N) P(N) = 0.0042 × (13/20) = 0.0027

P(A|M) P(M) > P(A|N) P(N)  =>  Mammals

Page 17: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

How to Estimate Probabilities from Data?

Normal distribution:

P(Ai | cj) = 1 / sqrt(2π σij²) · exp( −(Ai − μij)² / (2σij²) )

– One for each (Ai, cj) pair

For (Income, Class = No):
– If Class = No: μ = 110, σ² = 2975

P(Income = 120 | No) = 1 / sqrt(2π × 2975) · exp( −(120 − 110)² / (2 × 2975) ) = 0.0072
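A quick way to check the 0.0072 value is to evaluate the normal density directly (a minimal sketch; `gaussian_density` is an illustrative helper, not from the slides):

```python
import math

def gaussian_density(x, mu, var):
    """Normal density N(x; mu, var), used for continuous attributes in naive Bayes."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# P(Income = 120 | Class = No) with mu = 110 and sigma^2 = 2975
print(gaussian_density(120, 110, 2975))   # ~0.0072
```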

Page 18: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Example

P(O) = 4/10   P(Yes|O) = 1/4   P(Single|O) = 1/4   P(Mid|O) = 1/4
P(X) = 6/10   P(Yes|X) = 5/6   P(Single|X) = 5/6   P(Mid|X) = 0

Tid   Refund   Marital Status   Taxable Income   Class
1     Yes      Single           High             X
2     Yes      Single           High             X
3     Yes      Single           High             X
4     Yes      Single           High             X
5     Yes      Single           High             X
6     No       Married          High             X
7     Yes      Married          Low              O
8     No       Single           Low              O
9     No       Married          Low              O
10    No       Married          Mid              O

W = {Yes, Single, Mid}
P(O|W) = (4/10)(1/4)³ / P(W) = (1/160) / P(W)
P(X|W) = (6/10)(5/6)² × 0 / P(W) = 0
∵ P(O|W) > P(X|W)   ∴ Class(W) = O

P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)

P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)

Page 19: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Naïve Bayes Classifier

If one of the conditional probabilities is zero, the entire product becomes zero.

Probability estimation:

Original:    P(Ai | C) = Nic / Nc
Laplace:     P(Ai | C) = (Nic + 1) / (Nc + c)
m-estimate:  P(Ai | C) = (Nic + m·p) / (Nc + m)

c: number of distinct values of attribute Ai
p: prior probability
m: parameter

Page 20: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Example

P(O) = 4/10   P(Yes|O) = 1/4   P(Single|O) = 1/4   P(Mid|O) = 1/4
P(X) = 6/10   P(Yes|X) = 5/6   P(Single|X) = 5/6   P(Mid|X) = 0

Tid   Refund   Marital Status   Taxable Income   Class
1     Yes      Single           High             X
2     Yes      Single           High             X
3     Yes      Single           High             X
4     Yes      Single           High             X
5     Yes      Single           High             X
6     No       Married          High             X
7     Yes      Married          Low              O
8     No       Single           Low              O
9     No       Married          Low              O
10    No       Married          Mid              O

Estimating P(Mid | X), which is zero under the original estimate:

Original:    P(Mid | X) = 0/6
Laplace:     P(Mid | X) = (0 + 1) / (6 + 3) = 1/9
m-estimate:  P(Mid | X) = (0 + m·p) / (6 + m)
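The three estimators are easy to compare in code. In the sketch below the function names are mine, and the choice p = 1/3, m = 3 for the m-estimate is only an illustrative assumption (the slides leave p and m unspecified):

```python
def original_estimate(n_ic, n_c):
    return n_ic / n_c

def laplace_estimate(n_ic, n_c, n_values):
    # add-one smoothing over the n_values possible values of the attribute
    return (n_ic + 1) / (n_c + n_values)

def m_estimate(n_ic, n_c, p, m):
    # p: prior estimate of the probability, m: equivalent sample size
    return (n_ic + m * p) / (n_c + m)

# P(Income = Mid | Class = X) from the table: 0 of 6 records
print(original_estimate(0, 6))         # 0.0  -> wipes out the whole product
print(laplace_estimate(0, 6, 3))       # 1/9
print(m_estimate(0, 6, p=1/3, m=3))    # 1/9 (illustrative choice of p and m)
```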

Page 21: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Laplace Probability Estimate

pC = (nC + 1) / (N + k)

where k is the number of classes.

Problems with Laplace:
– Assumes all classes are a priori equally likely
– Degree of pruning depends on the number of classes

Page 22: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

m-Estimate of Probability

pC = (nC + pCa · m) / (N + m)

where:
– pCa is the prior probability of class C
– m is a non-negative parameter tuned by an expert

Page 23: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

m-Estimate

Important points:
– Takes prior probabilities into account
– Pruning is not sensitive to the number of classes
– Varying m gives a series of differently pruned trees
– The choice of m depends on confidence in the data

Page 24: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

m-Estimate in Pruning

Choice of m:
– Low noise → low m → little pruning
– High noise → high m → much pruning

Note: Using the m-estimate treats the examples at a node as if they were a random sample, which they are not; suitably adjusting m compensates for this.

Page 25: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Bayesian Belief Network (BBN)

Page 26: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Bayesian Belief Network, Example 2: Given High Blood Pressure

P(BP = High) = P(BP = High | HD = Yes) P(HD = Yes) + P(BP = High | HD = No) P(HD = No)
             = 0.85 × 0.49 + 0.2 × 0.51 = 0.5185

P(HD = Yes | BP = High) = P(BP = High | HD = Yes) P(HD = Yes) / P(BP = High)
                        = 0.85 × 0.49 / 0.5185 = 0.8033

P(HD = No | BP = High) = 1 − 0.8033 = 0.1967

Page 27: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Bayesian Belief Network, Example 3: High Blood Pressure, Healthy Diet, Regular Exercise

P(BP = High | D = Healthy, E = Yes)
  = Σ_HD P(BP = High | HD) P(HD | D = Healthy, E = Yes)

P(HD = Yes | BP = High, D = Healthy, E = Yes)
  = P(BP = High | HD = Yes) P(HD = Yes | D = Healthy, E = Yes) / P(BP = High | D = Healthy, E = Yes)
  = (0.85 × 0.25) / (0.85 × 0.25 + 0.2 × 0.75) = 0.5862

P(HD = No | BP = High, D = Healthy, E = Yes) = 1 − 0.5862 = 0.4138
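Both posteriors can be verified by enumeration. The sketch below hard-codes only the probabilities that appear on these two slides; it is not a general BBN inference engine:

```python
# Conditional probabilities taken from the slides
p_bp_high_given_hd = {"Yes": 0.85, "No": 0.20}   # P(BP=High | HD)

# Example 2: prior P(HD=Yes) = 0.49
p_hd = {"Yes": 0.49, "No": 0.51}
p_bp_high = sum(p_bp_high_given_hd[hd] * p_hd[hd] for hd in p_hd)       # 0.5185
p_hd_yes_given_bp = p_bp_high_given_hd["Yes"] * p_hd["Yes"] / p_bp_high
print(p_hd_yes_given_bp, 1 - p_hd_yes_given_bp)   # ~0.8033, ~0.1967

# Example 3: condition also on Diet=Healthy, Exercise=Yes,
# where P(HD=Yes | D=Healthy, E=Yes) = 0.25
p_hd_given_de = {"Yes": 0.25, "No": 0.75}
num = p_bp_high_given_hd["Yes"] * p_hd_given_de["Yes"]
den = sum(p_bp_high_given_hd[hd] * p_hd_given_de[hd] for hd in p_hd_given_de)
print(num / den, 1 - num / den)                   # ~0.5862, ~0.4138
```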

Page 28: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

30

Artificial Neural Networks (ANN)

Page 29: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Artificial Neural Networks (ANN)

X1   X2   X3   Y
1    0    0    0
1    0    1    1
1    1    0    1
1    1    1    1
0    0    1    0
0    1    0    0
0    1    1    1
0    0    0    0

[Diagram: inputs X1, X2, X3 feed a black box that produces output Y]

Output Y is 1 if at least two of the three inputs are equal to 1.

Page 30: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Artificial Neural Networks (ANN)

X1   X2   X3   Y
1    0    0    0
1    0    1    1
1    1    0    1
1    1    1    1
0    0    1    0
0    1    0    0
0    1    1    1
0    0    0    0

[Diagram: input nodes X1, X2, X3 with weights 0.3, 0.3, 0.3 feeding an output node with threshold t = 0.4]

Y = I(0.3 X1 + 0.3 X2 + 0.3 X3 − 0.4 > 0), where I(z) = 1 if z is true and 0 otherwise.
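A direct transcription of this small unit into Python (an illustrative sketch, not part of the slides) reproduces the truth table:

```python
def perceptron_output(x1, x2, x3):
    """Output of the node with weights 0.3, 0.3, 0.3 and threshold t = 0.4."""
    return 1 if 0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4 > 0 else 0

# Y = 1 iff at least two of the three inputs are 1
for x1, x2, x3 in [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1),
                   (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0)]:
    print(x1, x2, x3, perceptron_output(x1, x2, x3))
```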

Page 31: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Artificial Neural Networks (ANN)

Model is an assembly of inter-connected nodes and weighted links.

Output node sums up each of its input values according to the weights of its links.

Compare the output node against some threshold t.

[Diagram: input nodes X1, X2, X3 with weights w1, w2, w3 feeding an output node Y with threshold t]

Perceptron model:

Y = I( Σi wi Xi − t > 0 )    or    Y = sign( Σi wi Xi − t )

Page 32: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

General Structure of ANN

[Diagram: neuron i receives inputs I1, I2, I3 with weights wi1, wi2, wi3, forms the weighted sum Si, and applies an activation function g(Si) with threshold t to produce output Oi]

[Diagram: a feed-forward network with an input layer (x1 … x5), a hidden layer, and an output layer producing y]

Training an ANN means learning the weights of the neurons.

Page 33: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Algorithm for Learning ANN

Initialize the weights (w0, w1, …, wk)

Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples

– Objective function:

E = Σi [ Yi − f(wi, Xi) ]²

– Find the weights wi that minimize the above objective function, e.g., the backpropagation algorithm (see lecture notes)
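As one hedged illustration of minimizing this squared-error objective, the sketch below runs stochastic gradient descent for a single linear unit on the earlier truth-table data; it is not the multi-layer backpropagation algorithm referred to in the lecture notes:

```python
import random

# Toy data: target Y = 1 if at least two of three binary inputs are 1, else 0
X = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0)]
Y = [0, 1, 1, 1, 0, 0, 1, 0]

w = [random.uniform(-0.5, 0.5) for _ in range(3)]
b = 0.0
lr = 0.1

for _ in range(1000):
    for x, y in zip(X, Y):
        out = sum(wi * xi for wi, xi in zip(w, x)) + b    # linear output f(w, x)
        err = y - out                                     # residual of E = sum (Y - f)^2
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]  # gradient step on each weight
        b += lr * err

print([round(wi, 2) for wi in w], round(b, 2))
```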

Page 34: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

Find a linear hyperplane (decision boundary) that will separate the data

Page 35: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

One Possible Solution

B1

Page 36: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

Another possible solution

B2

Page 37: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

Other possible solutions

B2

Page 38: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

Which one is better? B1 or B2? How do you define better?

B1

B2

Page 39: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

Find the hyperplane that maximizes the margin => B1 is better than B2

[Figure: decision boundaries B1 and B2 with margin hyperplanes b11, b12 and b21, b22; the margin of B1 is larger]

Page 40: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

[Figure: decision boundary B1 with margin hyperplanes b11 and b12]

w · x + b = 0
w · x + b = +1
w · x + b = −1

f(x) = 1 if w · x + b ≥ 1;  −1 if w · x + b ≤ −1

Margin = 2 / ||w||²

Page 41: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

We want to maximize:

Margin = 2 / ||w||²

– Which is equivalent to minimizing:

L(w) = ||w||² / 2

– But subject to the following constraints:

f(xi) = 1 if w · xi + b ≥ 1;  −1 if w · xi + b ≤ −1

This is a constrained optimization problem
– Numerical approaches to solve it (e.g., quadratic programming)

Page 42: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

What if the problem is not linearly separable?

Page 43: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Support Vector Machines

What if the problem is not linearly separable?

– Introduce slack variables ξi

Need to minimize:

L(w) = ||w||² / 2 + C Σi=1..N ξi^k

Subject to:

f(xi) = 1 if w · xi + b ≥ 1 − ξi;  −1 if w · xi + b ≤ −1 + ξi

Page 44: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Nonlinear Support Vector Machines

What if the decision boundary is not linear?

Page 45: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Nonlinear Support Vector Machines

Transform data into higher dimensional space
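As an illustration of the kernel idea (this assumes scikit-learn is available and is not part of the slides), an RBF-kernel SVM implicitly works in such a transformed space, whereas a linear kernel cannot separate XOR-like data:

```python
from sklearn.svm import SVC

# XOR-like data: not linearly separable in the original 2-D space
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

linear_svm = SVC(kernel="linear", C=10.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)

print("linear:", linear_svm.predict(X).tolist())  # cannot fit XOR exactly
print("rbf:   ", rbf_svm.predict(X).tolist())     # separates it via the kernel trick
```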

Page 46: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Ensemble Methods

Construct a set of classifiers from the training data

Predict class label of previously unseen records by aggregating predictions made by multiple classifiers

Page 47: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

General Idea

Step 1: Create multiple data sets D1, D2, …, Dt-1, Dt from the original training data D
Step 2: Build multiple classifiers C1, C2, …, Ct-1, Ct
Step 3: Combine the classifiers into C*

Page 48: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Why does it work?

Suppose there are 25 base classifiers

– Each classifier has error rate ε = 0.35

– Assume the classifiers are independent

– Probability that the ensemble classifier makes a wrong prediction (a majority, i.e. at least 13 of the 25, must be wrong):

Σi=13..25 C(25, i) ε^i (1 − ε)^(25−i) = 0.06
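The 0.06 figure is a binomial tail probability and is easy to verify (a quick check, not from the slides):

```python
from math import comb

eps = 0.35   # error rate of each of the 25 independent base classifiers
ensemble_error = sum(comb(25, i) * eps**i * (1 - eps)**(25 - i)
                     for i in range(13, 26))   # at least 13 of 25 wrong
print(round(ensemble_error, 3))                # ~0.06
```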

Page 49: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Examples of Ensemble Methods

How to generate an ensemble of classifiers?

– Bagging

– Boosting

Page 50: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Bagging

Sampling with replacement

Build a classifier on each bootstrap sample

Each training example has probability 1 − (1 − 1/n)^n of being selected in a given bootstrap sample (≈ 0.632 for large n)

Original Data       1  2  3  4  5  6  7  8  9  10
Bagging (Round 1)   7  8  10 8  2  5  10 10 5  9
Bagging (Round 2)   1  4  9  1  2  3  2  7  3  2
Bagging (Round 3)   1  8  5  10 5  5  9  6  3  7
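A bootstrap round like those in the table can be drawn by sampling with replacement; the sketch below also evaluates the selection probability mentioned above (illustrative code, not from the slides):

```python
import random

original = list(range(1, 11))                              # records 1..10
bootstrap = [random.choice(original) for _ in original]    # sample with replacement
print(bootstrap)

# Probability that a given record appears at least once in a bootstrap sample:
n = len(original)
print(1 - (1 - 1 / n) ** n)    # ~0.651 for n = 10; tends to 1 - 1/e ~ 0.632 as n grows
```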

Page 51: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Boosting

An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records

– Initially, all N records are assigned equal weights

– Unlike bagging, weights may change at the end of each boosting round

Page 52: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Boosting

Records that are wrongly classified will have their weights increased

Records that are classified correctly will have their weights decreased

Original Data        1  2  3  4  5  6  7  8  9  10
Boosting (Round 1)   7  3  2  8  7  9  4  10 6  3
Boosting (Round 2)   5  4  9  4  2  5  1  7  4  2
Boosting (Round 3)   4  4  8  10 4  5  4  6  3  4

• Example 4 is hard to classify
• Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds

Page 53: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Example: AdaBoost

Base classifiers: C1, C2, …, CT

Error rate:

εi = (1/N) Σj=1..N wj δ( Ci(xj) ≠ yj )

Importance of a classifier:

αi = (1/2) ln( (1 − εi) / εi )

Page 54: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Example: AdaBoost

Weight update:

wi(j+1) = ( wi(j) / Zj ) × exp(−αj)  if Cj(xi) = yi
wi(j+1) = ( wi(j) / Zj ) × exp(+αj)  if Cj(xi) ≠ yi

where Zj is the normalization factor

If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated

Classification:

C*(x) = argmax_y Σj=1..T αj δ( Cj(x) = y )
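A compact sketch of one round of this bookkeeping (weighted error, importance α, weight update); the label and prediction vectors are made-up illustrations, not data from the slides:

```python
import math

y_true = [1, 1, -1, -1, 1]            # true labels (illustrative)
y_pred = [1, -1, -1, -1, 1]           # base classifier's predictions (illustrative)
w = [1 / len(y_true)] * len(y_true)   # current example weights

# Weighted error rate and importance of this classifier
eps = sum(wi for wi, yt, yp in zip(w, y_true, y_pred) if yt != yp)
alpha = 0.5 * math.log((1 - eps) / eps)

# Weight update: increase misclassified weights, decrease correct ones, then normalize
w = [wi * math.exp(alpha if yt != yp else -alpha)
     for wi, yt, yp in zip(w, y_true, y_pred)]
z = sum(w)
w = [wi / z for wi in w]
print(round(eps, 2), round(alpha, 3), [round(wi, 3) for wi in w])
```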

Page 55: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

57

Illustrating AdaBoost

Data points for training, with initial weights for each data point:

Original Data       + + + - - - - - + +      initial weight 0.1 for every point
Boosting Round 1    + + + - - - - - - -      weights shown: 0.0094, 0.0094, 0.4623;  B1: α = 1.9459

Page 56: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

Illustrating AdaBoost

Boosting Round 1    + + + - - - - - - -      weights shown: 0.0094, 0.0094, 0.4623;  B1: α = 1.9459
Boosting Round 2    - - - - - - - - + +      weights shown: 0.3037, 0.0009, 0.0422;  B2: α = 2.9323
Boosting Round 3    + + + + + + + + + +      weights shown: 0.0276, 0.1819, 0.0038;  B3: α = 3.8744

Overall             + + + - - - - - + +

Page 57: Data Mining Classification: Alternative Techniques 第 5 章 分類技術

59

Rule-Based Classifier