


Machine Learning II

Artificial Intelligence Laboratory, Department of Electronic, Electrical and Computer Engineering, Pusan National University

Minho Kim ([email protected])


Bayes’ Rule

• Please answer the following question on probability.
• Suppose one is interested in a rare syntactic construction, perhaps parasitic gaps, which occurs on average once in 100,000 sentences. Joe Linguist has developed a complicated pattern matcher that attempts to identify sentences with parasitic gaps. It's pretty good, but it's not perfect: if a sentence has a parasitic gap, it will say so with probability 0.95; if it doesn't, it will wrongly say it does with probability 0.005. Suppose the test says that a sentence contains a parasitic gap. What is the probability that this is true?

• Solution:
• G: the event of the sentence having a parasitic gap
• T: the event of the test being positive

P(G | T) = P(T | G) P(G) / [ P(T | G) P(G) + P(T | ¬G) P(¬G) ]
         = (0.95 × 0.00001) / (0.95 × 0.00001 + 0.005 × 0.99999)
         ≈ 0.002
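For readers who want to check the arithmetic, here is a minimal Python sketch of the same calculation (the variable names are ours, not from the slide):

```python
# Bayes' rule for the parasitic-gap example, using the numbers on the slide.
p_gap = 0.00001             # P(G): one sentence in 100,000 has a parasitic gap
p_pos_given_gap = 0.95      # P(T|G): matcher fires when a gap is present
p_pos_given_no_gap = 0.005  # P(T|~G): matcher fires although no gap is present

numerator = p_pos_given_gap * p_gap
denominator = numerator + p_pos_given_no_gap * (1 - p_gap)
posterior = numerator / denominator            # P(G|T)

print(f"P(G|T) = {posterior:.4f}")             # about 0.0019, i.e. roughly 0.002
```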


Naïve Bayes – Introduction

• Simple probabilistic classifiers based on applying Bayes' theorem
• Strong (naive) independence assumptions between the features

classify(f_1, …, f_n) = argmax_c p(C = c | f_1, …, f_n)
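The decision rule is easy to write in code. The sketch below is our own generic illustration (not code from the lecture); it assumes the class priors and per-feature likelihoods are already available as dictionaries, and it works in log space so that a long product of small probabilities does not underflow:

```python
import math

def classify(priors, likelihoods, features):
    """Naive Bayes decision rule:
    argmax_c p(C=c) * prod_i p(F_i = f_i | C = c).

    priors:      dict  class -> p(C=c)
    likelihoods: dict  (class, feature index, value) -> p(F_i = f_i | C = c)
    features:    observed feature values f_1 .. f_n
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # Sum of logs instead of a product of probabilities (same argmax).
        # A zero likelihood would break the log here -- exactly the problem
        # the Smoothing slide addresses later.
        score = math.log(prior) + sum(
            math.log(likelihoods[(c, i, f)]) for i, f in enumerate(features))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```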


Naïve Bayes – Train & Test (Classification)

[Figure: example table, split into training rows (train) and the test instance X_11 (test)]

classify(X_11) = argmax_c p(C = c) ∏_{i=1}^{n} p(F_i = f_i | C = c)

c = T:  p(C=T) × p(Alt=F | C=T) × p(Bar=F | C=T) × ⋯ × p(Est=0-10 | C=T)
c = F:  p(C=F) × p(Alt=F | C=F) × p(Bar=F | C=F) × ⋯ × p(Est=0-10 | C=F)
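Training amounts to counting. The sketch below is our illustration (the example table from the slide is not reproduced here); it estimates the prior p(C=c) and the likelihoods p(F_i = f_i | C = c) by relative frequency, in the format expected by the classify() sketch above:

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate p(C=c) and p(F_i = f_i | C=c) by relative frequencies
    (no smoothing yet; see the Smoothing slide)."""
    class_counts = Counter(labels)
    priors = {c: cnt / len(rows) for c, cnt in class_counts.items()}

    # (class, feature index) -> Counter of observed feature values
    value_counts = defaultdict(Counter)
    for row, c in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(c, i)][value] += 1

    likelihoods = {
        (c, i, value): cnt / class_counts[c]
        for (c, i), counter in value_counts.items()
        for value, cnt in counter.items()
    }
    return priors, likelihoods
```

Classifying the held-out instance then reduces to the two products shown above: train on the training rows and call classify(priors, likelihoods, x11) with the feature values of X_11.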


Naïve Bayes Examples

p(C=T) × p(Alt=F | C=T) × p(Bar=F | C=T) × ⋯ × p(Est=0-10 | C=T)
p(C=F) × p(Alt=F | C=F) × p(Bar=F | C=F) × ⋯ × p(Est=0-10 | C=F)

The second product is the larger of the two, so

∴ classify(X_11) = argmax_c p(C = c) ∏_{i=1}^{n} p(F_i = f_i | C = c) = F


Naïve Bayes Examples

p(C=T) × p(Alt=T | C=T) × p(Bar=T | C=T) × ⋯ × p(Est=30-60 | C=T)
p(C=F) × p(Alt=T | C=F) × p(Bar=T | C=F) × ⋯ × p(Est=30-60 | C=F)

∴ classify(X_12) = argmax_c p(C = c) ∏_{i=1}^{n} p(F_i = f_i | C = c) = F


Smoothing

• A zero probability for any single feature value makes the product for the whole example zero
• So… how do we estimate the likelihood of unseen data?
• Laplace smoothing: add 1 to every type count to get an adjusted count c*

P(A) = C(A) / N
P*(A) = (C(A) + 1) / (N + B)

P(A | B) = C(A, B) / C(B)
P*(A | B) = (C(A, B) + 1) / (C(B) + B)

Here N is the total count and B in the denominators denotes the number of distinct types, so the smoothed estimates still sum to 1.
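As a small illustration (ours, not from the slide), the adjusted estimate can be written as a one-line helper, where num_types plays the role of B:

```python
def laplace_estimate(count, total, num_types):
    """P*(A) = (C(A) + 1) / (N + B): add 1 to the count and add the number
    of distinct types B to the denominator so the estimates sum to 1."""
    return (count + 1) / (total + num_types)
```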


Laplace Smoothing Examples

• Add 1 to every type count to get an adjusted count c*

Counts before smoothing:

Pat \ Wait    True    False
Some          4       0
Full          1       4
None          0       1

Adjusted counts c*:

Pat \ Wait    True         False
Some          4 + 1 = 5    0 + 1 = 1
Full          1 + 1 = 2    4 + 1 = 5
None          0 + 1 = 1    1 + 1 = 2
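Applying the helper from the previous slide to the Wait = True column of this table (3 types of Pat, 5 observations in total) gives:

```python
# Smoothed estimates of p(Pat = v | Wait = True) from the table above.
counts_true = {"Some": 4, "Full": 1, "None": 0}   # C(Pat = v, Wait = True)
total_true = sum(counts_true.values())            # C(Wait = True) = 5
num_types = len(counts_true)                      # B = 3 values of Pat

for value, count in counts_true.items():
    print(value, laplace_estimate(count, total_true, num_types))
# Some 0.625, Full 0.25, None 0.125 -- no value is left with zero probability
```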


Decision Tree

• Flowchart-like structure
• Internal node represents a test on an attribute
• Branch represents an outcome of the test
• Leaf node represents a class label
• Path from root to leaf represents a classification rule
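A minimal sketch of this structure in code (a hypothetical toy tree of our own, built over attribute names that appear in the lecture's example): internal nodes test an attribute, branches are outcomes, and prediction follows the path from the root to a leaf.

```python
class Leaf:
    def __init__(self, label):
        self.label = label          # class label at the leaf

class Node:
    def __init__(self, attribute, branches):
        self.attribute = attribute  # attribute tested at this internal node
        self.branches = branches    # dict: test outcome -> subtree

def predict(tree, example):
    """Follow the path from the root to a leaf; that path is the rule."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label

# Hypothetical toy tree, purely for illustration.
tree = Node("Patrons", {
    "None": Leaf(False),
    "Some": Leaf(True),
    "Full": Node("Est", {"0-10": Leaf(True), "30-60": Leaf(False)}),
})
print(predict(tree, {"Patrons": "Full", "Est": "0-10"}))  # True
```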


Information Gain

• H(C): entropy of the class distribution at a particular node
• H(C | A): conditional entropy = average entropy of the conditional class distribution, after we have partitioned the data according to the values of A
• Information gain: IG(A) = H(C) − H(C | A)
• Simple rule in decision tree learning: at each internal node, split on the attribute with the largest information gain IG(A) (or equivalently, with the smallest conditional entropy H(C | A))
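A short sketch of these two quantities in code (our illustration), with class distributions given as tuples of proportions:

```python
import math

def H(dist):
    """Entropy of a class distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def information_gain(parent_dist, partitions):
    """IG(A) = H(C) - H(C|A).

    parent_dist: class distribution at the node, e.g. (6/12, 6/12)
    partitions:  one (weight, class distribution) pair per value of A,
                 where the weights are the fractions of examples per value
    """
    conditional = sum(weight * H(dist) for weight, dist in partitions)
    return H(parent_dist) - conditional
```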


Root Node Example

For the training set, 6 positives, 6 negatives, H(6/12, 6/12) = 1 bit

Consider the attributes Patrons and Type:

Patrons has the highest IG of all attributes and so is chosen by the learning algorithm as the root

Information gain is then repeatedly applied at internal nodes until all leaves contain only examples from one class or the other

IG(Patrons) = 1 − [ (2/12)·H(0, 1) + (4/12)·H(1, 0) + (6/12)·H(2/6, 4/6) ] ≈ 0.541 bits

IG(Type) = 1 − [ (2/12)·H(1/2, 1/2) + (2/12)·H(1/2, 1/2) + (4/12)·H(2/4, 2/4) + (4/12)·H(2/4, 2/4) ] = 0 bits
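These numbers can be reproduced with the entropy and information-gain sketch given after the Information Gain slide:

```python
parent = (6/12, 6/12)            # 6 positives, 6 negatives -> H = 1 bit

ig_patrons = information_gain(parent, [
    (2/12, (0, 1)),              # 2/12 of the examples, all one class
    (4/12, (1, 0)),              # 4/12 of the examples, all one class
    (6/12, (2/6, 4/6)),          # 6/12 of the examples, mixed 2:4
])
ig_type = information_gain(parent, [
    (2/12, (1/2, 1/2)), (2/12, (1/2, 1/2)),
    (4/12, (2/4, 2/4)), (4/12, (2/4, 2/4)),
])
print(round(ig_patrons, 3), round(ig_type, 3))   # 0.541 0.0
```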