𝑠𝑖𝑔𝑚𝑎 𝜶
Machine Learning
2015.06.20.
Logistic Regression

Linear Regression
• Given arbitrary data, model the correlation between data features to predict a numeric target value.

          Friend 1  Friend 2  Friend 3  Friend 4  Friend 5
Height       160       165       170       170       175
Weight        50        50        55        50        60
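A minimal sketch of this idea, assuming an ordinary least-squares fit of weight against height on the five friends in the table. The function names `fit_line` and `predict_weight` and the query height 172 are illustrative, not from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data from the table: height and weight of friends 1 to 5.
heights = [160, 165, 170, 170, 175]
weights = [50, 50, 55, 50, 60]

slope, intercept = fit_line(heights, weights)


def predict_weight(height):
    """Numeric prediction from the fitted line."""
    return slope * height + intercept


print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"predicted weight at height 172 (hypothetical query): {predict_weight(172):.1f}")
```

The output is a real number, which is what distinguishes regression from the classification setting discussed next.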

Classification
• Assign each item to a specific class by considering the correlation between data features.
• Other examples:
  • Email: Spam / Not Spam
  • Tumor: Malignant / Benign
  • POS tag: Noun / Not Noun
• $y \in \{0, 1\}$
  • 0: negative class (e.g., Not Noun, Benign)
  • 1: positive class (e.g., Noun, Malignant tumor)

            Friend 1  Friend 2  Friend 3  Friend 4  Friend 5
Height         160       165       170       170       175
Weight          50        50        55        50        60
Ideal type       X         O         O         X         O

Classification
[Figure: ideal type (1 = Yes, 0 = No) plotted against the ideal-type condition. Classification by linear regression: incorrect classification.]
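
Classification
Ideal type?
Threshold the classifier output at 0.5:
If $h_\theta(x) \geq 0.5$, predict “y = 1”
If $h_\theta(x) < 0.5$, predict “y = 0”
[Figure: ideal type (1 = Yes, 0 = No) plotted against the ideal-type condition, with regions labeled Positive and Negative.]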

Classification
• Classification: y = 0 or 1
• With linear regression, $h_\theta(x)$ can be > 1 or < 0
• Thus, restrict the output to the range 0 to 1
• Logistic regression: $0 \leq h_\theta(x) \leq 1$
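To illustrate why a linear hypothesis is awkward for 0/1 labels, the sketch below fits a least-squares line to the ideal-type labels from the table (X = 0, O = 1) using height as the only feature, then evaluates it at two heights outside the data. The heights 140 and 200 and the names `fit_line`, `h`, and `classify` are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


heights = [160, 165, 170, 170, 175]
ideal = [0, 1, 1, 0, 1]  # X -> 0, O -> 1, from the table

slope, intercept = fit_line(heights, ideal)


def h(height):
    """Linear hypothesis: not bounded to [0, 1]."""
    return slope * height + intercept


def classify(height):
    """Threshold the linear output at 0.5."""
    return 1 if h(height) >= 0.5 else 0


# 140 and 200 are hypothetical inputs outside the observed range.
for height in (140, 170, 200):
    print(height, round(h(height), 3), classify(height))
```

At the two extreme heights the linear output falls below 0 and above 1, so it cannot be read as a probability, which motivates squashing the output into the range 0 to 1.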
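
Logistic Regression
• Classification problem: we want $0 \leq h_\theta(x) \leq 1$
• Initial hypothesis: $h_\theta(x) = w^T x + b$
• We need a function that transforms this output to suit the classification problem: an activation function
• Activation function: $g(z)$
• Revised hypothesis: $h_\theta(x) = g(w^T x + b)$
• Sigmoid function: $g(z) = \frac{1}{1 + e^{-z}}$
• $h_\theta(x) = \frac{1}{1 + e^{-w^T x}}$ (with the bias $b$ folded into $w$ through a constant input $x_0 = 1$)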
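A small sketch of the sigmoid activation and the resulting hypothesis. The branch on the sign of `z` is an implementation choice made here to avoid floating-point overflow for large negative inputs; the slides only give the formula. The weights and input in the last call are hypothetical.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), evaluated without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def hypothesis(w, x, b=0.0):
    """h(x) = g(w^T x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# The output always stays inside [0, 1], whatever the size of z.
for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(z, sigmoid(z))

# Hypothetical weights and input chosen so that w^T x = 0.
print(hypothesis([0.5, -0.25], [2.0, 4.0]))
```

Because $g(0) = 0.5$ and $g$ is increasing, the sign of $w^T x + b$ decides which side of 0.5 the output falls on.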

Logistic Regression
[Figure: side-by-side diagrams of Linear Regression and Logistic Regression, each with inputs $x_i$ and weights $w$.]
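
Interpretation of Hypothesis Output
• $h_\theta(x)$ = estimated probability that $y = 1$ on input $x$
• That is, classify into the class with the higher probability
• Example: if $x = [x_0, x_1]^T = [1, \text{ideal type}]^T$ and $h_\theta(x) = 0.75$, there is a 75% chance of being my ideal type
• Conditional probability, likelihood (MLE)
• Find $y$ from the input $x$ and the parameters $w$
• $h_\theta(x) = P(y = 1 \mid x; w)$, with $w$ chosen to maximize the likelihood
• $P(y = 1 \mid x; w) + P(y = 0 \mid x; w) = 1$
• $P(y = 0 \mid x; w) = 1 - P(y = 1 \mid x; w)$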
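As a worked instance of the complement rule, using the example value $h_\theta(x) = 0.75$ from this slide:

```latex
\begin{aligned}
P(y = 1 \mid x; w) &= h_\theta(x) = 0.75 \\
P(y = 0 \mid x; w) &= 1 - P(y = 1 \mid x; w) = 1 - 0.75 = 0.25
\end{aligned}
```

The two probabilities sum to 1, and the larger one (0.75, for $y = 1$) determines the predicted class.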

Logistic Regression Decision
• Hypothesis: $h_\theta(x) = g(w^T x)$
• Activation function: $g(z) = \frac{1}{1 + e^{-z}}$
• Prediction:
  $y = 1$ if $h_\theta(x) \geq 0.5$
  $y = 0$ if $h_\theta(x) < 0.5$
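The decision rule can be sketched as follows. The weight vector `w = [-3.0, 1.0]` is hypothetical and assumes the input carries a constant first component $x_0 = 1$; with it, $w^T x \geq 0$ exactly when $x_1 \geq 3$.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), evaluated without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Return 1 if h(x) = g(w^T x) >= 0.5, else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if sigmoid(z) >= 0.5 else 0


def predict_by_sign(w, x):
    """Equivalent rule: g(z) >= 0.5 exactly when z >= 0."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z >= 0 else 0


w = [-3.0, 1.0]  # hypothetical parameters: bias -3, slope 1
for x1 in (2.0, 3.0, 4.0):
    x = [1.0, x1]  # x_0 = 1 carries the bias
    print(x1, predict(w, x), predict_by_sign(w, x))
```

Thresholding the probability at 0.5 is therefore the same as checking the sign of $w^T x$, which is why the boundary between the two classes is the set of points where $w^T x = 0$.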

Decision Boundary
[Figure: decision boundary. Source: Andrew Ng]

Non-linear decision boundaries
[Figure: non-linear decision boundaries. Source: Andrew Ng]
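One way to obtain a non-linear boundary with the same hypothesis is to feed it polynomial features. The sketch below is an illustration under stated assumptions, not a reconstruction of the missing figure: the feature map `[1, x1, x2, x1^2, x2^2]` and the weights `W` are hypothetical, chosen so that the boundary $w^T \phi(x) = 0$ is the unit circle $x_1^2 + x_2^2 = 1$.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), evaluated without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(x1, x2):
    """Hypothetical polynomial feature map phi(x)."""
    return [1.0, x1, x2, x1 * x1, x2 * x2]


# Hypothetical weights: z = -1 + x1^2 + x2^2, so z >= 0 outside the unit circle.
W = [-1.0, 0.0, 0.0, 1.0, 1.0]


def predict(x1, x2):
    """Predict 1 when h(x) = g(W^T phi(x)) >= 0.5, else 0."""
    z = sum(w * f for w, f in zip(W, features(x1, x2)))
    return 1 if sigmoid(z) >= 0.5 else 0


for point in [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (1.0, 1.0)]:
    print(point, predict(*point))
```

The model is still linear in its parameters; only the features are non-linear in the original inputs.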

Cost Function
Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$ ($m$ examples)
How to choose the parameters $\theta$?
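
Cost Function
• Linear regression: $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right)^2$
• Logistic regression (negative log-likelihood):
  • The squared-error cost $\mathrm{Cost}(y^{(i)}, h_\theta(x^{(i)})) = \frac{1}{2} \left( y^{(i)} - h_\theta(x^{(i)}) \right)^2$ is replaced by the NLL (because of MLE)
[Figure: with the sigmoid function, the squared-error cost is “non-convex”; the NLL cost is “convex”.]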
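
Cost Function
$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)), & \text{if } y = 1 \\ -\log(1 - h_\theta(x)), & \text{if } y = 0 \end{cases}$
[Figure: cost curves for the cases $y = 1$ and $y = 0$.]
Negative Log Likelihood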
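A short sketch of the per-example cost, showing that the two-case definition agrees with the single expression $-[y \log h + (1 - y) \log(1 - h)]$. The function names and the probe values of `h` are chosen for illustration.

```python
import math


def cost(h, y):
    """Piecewise negative log-likelihood for one example (h strictly in (0, 1))."""
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)


def cost_combined(h, y):
    """Same cost written as one expression; y selects the active term."""
    return -(y * math.log(h) + (1 - y) * math.log(1.0 - h))


# Confident and correct -> small cost; confident and wrong -> large cost.
for h in (0.01, 0.5, 0.75, 0.99):
    print(h, round(cost(h, 1), 4), round(cost(h, 0), 4))
```

For $y = 1$ the cost shrinks toward 0 as $h_\theta(x)$ approaches 1 and grows without bound as it approaches 0; the $y = 0$ case mirrors this.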
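
Logistic regression cost function
• Per-example cost:
  • $y = 1$: $\mathrm{Cost}(y, h_\theta(x)) = -\log(h_\theta(x))$
  • $y = 0$: $\mathrm{Cost}(y, h_\theta(x)) = -\log(1 - h_\theta(x))$
• $J(\theta) = \sum_{i=1}^{m} \mathrm{Cost}(y^{(i)}, h_\theta(x^{(i)})) = -\sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
• Optimize the parameters $\theta$ (= $w$): $\min_\theta J(\theta)$
• Classify an input $x$: output $h_\theta(x) = \frac{1}{1 + e^{-w^T x}} \rightarrow P(y = \text{class} \mid x; \theta)$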
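The total cost $J(\theta)$ can be sketched as a sum of the per-example negative log-likelihoods. The data below reuse the heights and ideal-type labels from the earlier table; centering and scaling the height as `(height - 168) / 5` is an assumption made here for numerical convenience, and `theta_guess` is a hypothetical parameter vector.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), evaluated without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(theta, xs, ys):
    """J(theta) = -sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(sum(t * v for t, v in zip(theta, x)))
        total -= y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return total


heights = [160, 165, 170, 170, 175]
ys = [0, 1, 1, 0, 1]  # X -> 0, O -> 1
# Each input is [x_0, x_1] with x_0 = 1 for the bias; the scaling is an assumption.
xs = [[1.0, (height - 168) / 5] for height in heights]

theta_zero = [0.0, 0.0]
theta_guess = [0.4, 0.5]  # hypothetical parameters, not fitted

print("J(theta_zero)  =", nll(theta_zero, xs, ys))
print("J(theta_guess) =", nll(theta_guess, xs, ys))
```

With all parameters at zero every example gets $h_\theta(x) = 0.5$, so $J$ equals $m \log 2$; any parameter setting worth keeping should score below that.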
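
Gradient Descent
• $J(\theta) = -\sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
• Gradient descent: $\min_\theta J(\theta)$
  Repeat {
    $\theta_j := \theta_j - \eta \frac{\partial}{\partial \theta_j} J(\theta)$
  }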
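The partial derivative in the update can be worked out from the sigmoid's derivative. This is a sketch of the standard derivation, assuming $h_\theta(x) = g(\theta^T x)$ with $\theta = w$ and the bias carried by $x_0 = 1$:

```latex
\begin{aligned}
g'(z) &= g(z)\,\bigl(1 - g(z)\bigr) \\
\frac{\partial}{\partial \theta_j} \log h_\theta(x^{(i)})
  &= \bigl(1 - h_\theta(x^{(i)})\bigr)\, x_j^{(i)} \\
\frac{\partial}{\partial \theta_j} \log\bigl(1 - h_\theta(x^{(i)})\bigr)
  &= -h_\theta(x^{(i)})\, x_j^{(i)} \\
\frac{\partial}{\partial \theta_j} J(\theta)
  &= -\sum_{i=1}^{m} \Bigl[ y^{(i)} \bigl(1 - h_\theta(x^{(i)})\bigr)
     - \bigl(1 - y^{(i)}\bigr)\, h_\theta(x^{(i)}) \Bigr] x_j^{(i)} \\
  &= \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x_j^{(i)}
\end{aligned}
```

The gradient has the same "prediction error times input" shape as in linear regression; only the definition of $h_\theta$ differs.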
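
Gradient Descent
• $J(\theta) = -\sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
• Substituting each conditional probability and differentiating gives an update similar to gradient descent for linear regression
• Gradient descent: $\min_\theta J(\theta)$
  Repeat {
    $\theta_j := \theta_j - \eta \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
  }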
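A minimal batch gradient descent sketch for logistic regression, in which each step subtracts $\eta$ times the gradient $\sum_i (h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}$ of the negative log-likelihood. The data reuse the heights and ideal-type labels from the earlier table; the height scaling `(height - 168) / 5`, the learning rate `eta = 0.1`, and the step count are assumptions made for this example.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), evaluated without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def h(theta, x):
    """Hypothesis h(x) = g(theta^T x)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def nll(theta, xs, ys):
    """Negative log-likelihood J(theta) over the whole training set."""
    return -sum(y * math.log(h(theta, x)) + (1 - y) * math.log(1.0 - h(theta, x))
                for x, y in zip(xs, ys))


def gradient_descent(xs, ys, eta=0.1, steps=200):
    """Repeat theta_j := theta_j - eta * sum_i (h(x_i) - y_i) * x_ij."""
    theta = [0.0] * len(xs[0])
    history = [nll(theta, xs, ys)]
    for _ in range(steps):
        # Errors are computed once per step so all theta_j update simultaneously.
        errors = [h(theta, x) - y for x, y in zip(xs, ys)]
        grad = [sum(e * x[j] for e, x in zip(errors, xs)) for j in range(len(theta))]
        theta = [t - eta * g for t, g in zip(theta, grad)]
        history.append(nll(theta, xs, ys))
    return theta, history


heights = [160, 165, 170, 170, 175]
ys = [0, 1, 1, 0, 1]  # X -> 0, O -> 1
xs = [[1.0, (height - 168) / 5] for height in heights]  # x_0 = 1 for the bias

theta, history = gradient_descent(xs, ys)
print("theta:", theta)
print("J before:", history[0], "J after:", history[-1])
```

Tracking `history` is a simple way to check the learning rate: with a suitable $\eta$ the cost should go down at every step.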

Multiclass classification
• Examples
  • POS tag: Noun, Verb, Pronoun, …
  • Named entity: OUT, PS_NAME, LC_COUNTRY, …
  • Medical diagnosis: Not ill, Cold, Flu, MERS
  • Image recognition: Cat, Dog, Tiger, …

Multiclass classification
• The probability of each class $i$ ($y = i$) is obtained by training a logistic regression classifier
• For a new input $x$, evaluate each classifier with its parameters and select the class with the highest probability: $\max_i h_\theta^{(i)}(x)$
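The procedure above can be sketched as a one-vs-all scheme: train one binary logistic regression per class, then pick the class whose classifier reports the highest probability. The nine 2-D points, the class labels 0 to 2, the learning rate, and the step count are all hypothetical values chosen for this example.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), evaluated without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def h(theta, x):
    """Hypothesis h(x) = g(theta^T x)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def train_binary(xs, ys, eta=0.05, steps=2000):
    """Batch gradient descent on the negative log-likelihood for 0/1 labels."""
    theta = [0.0] * len(xs[0])
    for _ in range(steps):
        errors = [h(theta, x) - y for x, y in zip(xs, ys)]
        theta = [t - eta * sum(e * x[j] for e, x in zip(errors, xs))
                 for j, t in enumerate(theta)]
    return theta


def train_one_vs_all(xs, labels, classes):
    """One classifier per class: class c is positive, every other class negative."""
    return {c: train_binary(xs, [1 if label == c else 0 for label in labels])
            for c in classes}


def predict(models, x):
    """Select the class whose classifier gives the largest h(x)."""
    return max(models, key=lambda c: h(models[c], x))


# Hypothetical, well-separated 2-D clusters, three points per class.
points = [(-2.0, 0.0), (-2.5, 0.5), (-1.5, -0.5),
          (2.0, 0.0), (2.5, 0.5), (1.5, -0.5),
          (0.0, 3.0), (0.5, 3.5), (-0.5, 2.5)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
xs = [[1.0, x1, x2] for x1, x2 in points]  # x_0 = 1 for the bias

models = train_one_vs_all(xs, labels, classes=[0, 1, 2])
print([predict(models, x) for x in xs])
```

Each binary classifier only answers "class $i$ or not", so the argmax over their outputs is what turns them into a single multiclass decision.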

References
• https://class.coursera.org/ml-007/lecture
• http://deepcumen.com/2015/04/linear-regression-2/