Download pdf - ESM Mid term Review

온수당 머신러닝 11주차 갈무리

Mario Cho(조 만석)

[email protected]

장애물에도 굴하지 않는 한국인

Human Brain, Male, Female

Welcome to Machine Learing!

• Machine Learning 이란? • 구글에서 machine learning 이라 검색할 때, 그것과 관련된 것들을 찾아서 알려주는 것

• 페이스 북에서 친구나 회사, 모임을 추천하는 것

• 애플에서 iPhoto등으로 친구 얼굴을 인식해 주는 것.

• Siri, OK Google처럼 사람의 말을 듣고 그것에 따라 서비스를 제공하는 것

• 이메일에서 스팸이나 자동으로 분류해 주는 것

• the science of getting computers to learn, without being explicitly programmed.

Supervised vs. Un-Supervised • Supervised learning

• right answer given

• Regression: • predict continuous value output

• Ex) 면적이 주어지면 예상 되는 집값을 알려주는 서비스

• Classification: • predict discrete value output

• Ex) 유방 암의 종양의크기에 따라 악성/양성 인지 구분하는 서비스

• Logistic regression

• SVM: Support Vector Machine

• Un-Supervised Learning • Don’t have any label

• Clustering

• K-mean

Regression: Model Representation

• 주택 가격 예측

• 집의 크기에 따라 집의 가격이 결정

• 예측 값또한 real-valued output이다.

• 따라서 Supervised Learning 이며, regression 임.

• 크기가 1250 feet2 인 집의 예상 가격을 구해보자.

Regression: 가설(Hypothesis) 설정 • H: Hypothesis 학습 알고리즘에 따른 모델 정립

• 1변수 추정이라면 hθ(x) = θ0 + θ1(x)

• 다변수 추정이라면 더 복잡한 θ항이 나타난다.

X: input, size of house H: Hypothesis, Model representation Y: ouput: estimated house price

Regression: 가설(Hypothesis) 설정

Minimize Error

Hypothesis hθ(x) = θ0 + θ1(x)

1. Choose θ0, θ1

2. hθ(x) close to Y for training examples 3. ToDo: minimize θ0, θ1

Regression: Cost Function

• Cost Function J(θ0, θ1)

- Square error function

- Error function

- 실제 값과의 차이

▪ 선형 가설 모델 (Linear hypothesis regression)

-> Square error function 주로 사용

▪ 다변이나 비선형이라면 다른 오류 함수를 선택

Regression: Cost function simplified θ1

Regression: Cost function simplified θ1=1

Regression: Cost function simplified θ1=0.5

Regression: Cost function simplified θ1=1.5

Regression: Cost function calculate

• Cost Function J(θ0, θ1)

- Square error function

- Error function

- 실제 값과의 차이

Want: min J(θ0, θ1)

- θ0, θ1 을 각각 값을 변화하면서 최소값 J(θ0, θ1) 구한다. • 비용함수 (cost function)을 최소값이 오류가 가장 적은 가설이 된다.

• 모든 θ0, θ1 에 대하여 비용함수 J를 구할 수 있지만 시간이 많이 걸린다.

• 조금씩 θ 값들을 바꿔가면서 지역 최저점 (local minimum)을 구한다.

Hypothesis , Cost function

Regression: Cost function calculate

Learning rate

Regression: Gradient decent

• Gradient Decent(등고선 선도) 는 θ 값을 변화하면서 비용함수 J(cost function)을 3차원으로 그린 그림(Contour plot).

• Ex) J(θ0, θ1)

• bow shaped function 구해짐.

• gradient decent 로 θ0 추정


• gradient decent 로 θ1 추정


gradient descent

Normal equation

Gradient descent vs Normal Equation

Regression: summery

Regression

• 선형회기 방법을 사용하여 구해진 가설함수에 임의의 입력을 대입하면

• Input: 1250 -> output: 220k USD 가 예상 주택 가격이다.

• 모집단으로부터 유통기간이 같을 지라도 비타민 C 파괴량이 변한다는 것을 알 수 있다. 유통기간이 20 일 지났을 경우 비타민 C 파괴량은 15 mg 에서 35 mg 사이에 여러 값으로 변하면서 나타난다.

Ex) Regression

Classification

• Classification(군집화) • 입력 x에 대하여 이것이 어떤 곳(Y)가 될지 구분하는 방법이다.

• Example of Classification

• E-mail: Spam / Not Spam?

• 어떤 이메일이 스팸으로 구분할지? 아닌지(정상)?

• Online Trasactions: Fraudulent (Yes / No)?

• 온라인 쇼핑에서 사기가 의심되는지? (사기/정상)

• Tumor: Malignant (악성)/ Benign(음성)?

Classification : Logistic regression

Classification : Logistic regression

Classification Logistic regression

Classification: Logistic Regression

• Threshold classifier output hθ(x) at 0.5 • If hθ(x) >= 0.5, predict “y = 1, Malignant tumor”

• If hθ(x) < 0.5 , predict “y = 0, Benign tumor”

Logistic Regression: Hypothesis Representation

선형 회기 모델에 따른 가설 검증(예측 값)은

- 음성(0)과 양성(1)사이로 하였다.

• hθ(x) 는 0 =< hθ(x) =< 1 로

• 이것을 hθ(x) = θTx로 나타낼 수 있다.(normal Eq.)

Logistic Regression: Logistic function

• 가설 hθ(x) 를 logistic regression 모델을 위해서

• hθ(x) = 1

1+𝑒− 𝜃𝑇𝑥

• θTx -> z로 변환하면, g(θTx) 는 g(z) 로 변환된다.

• hθ(x) = g(z) = 1

1+𝑒−𝑧 , z <- θTx

• Function g(z): Logistic Function, Sigmoid Function

Logistic regression

Logistic Regression: Logistic function

• 가설 hθ(x) 는

입력 X에 대해서

결과 Y가 1(양성)이때 확률이라고 하면

.

• .

• hθ(x) 는 x(종양 크기, tumorSize) 일때

• 이 값이 0.7이라면 그 결과가 양성(1)인 확률이다.

• hθ(x) = 0.7 : 70%의 확률로 양성이다.

Logistic cost function

Logistic Regression: Decision Boundary

Multi-feature logistic regression

Regularization

Neural Network

X1 xnor X2

+1

x1

x2

+1

a1(2)

a2(2)

a1(3)

-30

20

20

10

-20

-10

-10

20

20

x1 x2 a1(2) a2(2) a1(3)

0 0 0 1 1

0 1 0 0 0

1 0 0 0 0

1 1 1 0 0

NN cost function

Forward propagation

backward propagation

Evaluation a Learning Algorithm

• Get more training examples • 학습할 데이터를 늘린다.

• 데이터를 늘린다고 해서 항상 도움이 되지는 않는다.

• 데이터 수를 늘릴지 아니면 다른 것으로 개선할지 결정하는 방법을 배워보자.

• Try smaller sets of features • 영향을 주는 요소를 줄여가 보자.

• 결과에 영향을 요소가 영향을 주는지 아닌지 아는 것은 어렵다.

• 어떨때 결과에 영향을 주는 핵심 요소만 뽑을 수 있는지 배워보자.

• Try getting additional features • 영향을 주는 요소가 부족할 수도 있다.

• 어떨때 결과에 영향을 주는 요소를 늘려가야만 하는지 배워보자.

• Try adding polynomial features (x1^2 , x2^2, x1*x2, ,,, etc) • 가설 함수에 고차항을 추가 하는 것이 도움이 될 수도 있다.

• 어떻 때 고차항을 추가하는 것이 도움이 될지 배워보자.

• Try decreasing LAMDA (λ) • Regularization 항을 계산하여 바꿔보자.

• Try increasing LAMDA (λ) • regularization 항을 계산하여 바꿔보자.

Evaluating a Hypothesis

• 수집한 데이터를 가지고

• 학습(training set) 과 시험(test set) 을 사용해야 할 것이다.

Model Selection: Train /Validation / Test Sets

• 어떻게 해야만 regularization parameter 고차항(polynomial) 차수를 줄일 수 있을까?

Model Selection: Train /Validation / Test Sets

Cost Functions • Cost Function: Training set error

• Cost Function: Cross Validation set error

• Cost Function: Test set error

Bias vs. Variance

High Bias Just Fix High Variance

High Bias vs. High Variance: cost function

Regularization and Bias/Variance

Regularization

Cost function

Cost function: regularization

Cost Function: regularization parameter

Learning curves

Learning curves: high bias

Learning curves: high variance

Deciding Waht to do Next Revisited • Get more training examples

• 학습할 데이터를 늘린다.

• 데이터를 늘린다고 해서 항상 도움이 되지는 않는다.

• high variance를 개선하는데 효과적이다.

• high bias에는 도움이 되지 않는다.

• Try smaller sets of features

• 영향을 주는 요소를 줄여가 보자.


• high bias에는 도움이 되지 않는다.

• Try getting additional features

• 영향을 주는 요소가 부족할 수도 있다.

• 어떨때 결과에 영향을 주는 요소를 늘려가는 것이다.

• 가설함수가 간단했기 때문에 요소를 추가하면 할수록 가설함수의 오류를 줄여 준다.

• high bias를 개선하는데 효과적이다.

• Try adding polynominal features (x1^2 , x2^2, x1*x2, ,,, etc)

• 가설 함수에 고차항을 추가 하는 것이다.


• Try decreasing LAMDA (λ)

• regularization 항의 영향을 줄이는 것이다.


• Try increasing LAMDA (λ)

• regularization 항을 영향을 늘리는 것이다.


Precision vs. Recall

Skewed Data

두가지를 잘 구분하는 경계선은 어떤것?

Support Vector Machine

y(x)=+1

y(x)=0

y(x)=-1

margin

=|𝑦 𝑥 |

|𝑤| Max로

하고 싶다!

• 두 부류를 가로지르는 경계조건 y(x)=0과 y(x) = +1, 혹은 y(x) = -1 인 경계 조건간의 간격(margin)을 최대화하는 것이 목적.

• y(x) = 0인 경계 조건에서 임의의 데이터 x 까지의 기하학적 거리(선형, 혹은

비선형)는 |𝑦 𝑥 |

|𝑤| 로 쓸 수

있다.

• 이 margin을 최대화 하는 y(x)=0인 경계 조건을 찾는 것이 SVM의 목적이다.

• 각 x에서 g(z) = QTx + b 로 쓸 수 있고.

• g(z) 를 각 x에 대하여 각각의 내적을 계산하여

• g(z) > +1 : 파란색

• g(z) < -1 : 귤색

y(x)=0 q

x1

x2

QTX2 < -1

x1

QTX > +1

Supervisor Learning vs. Non-Supervisor Learning

Un Supervisor Learning

K-means algorithm • Randomly initialize K cluster centroids u1, u2, … , uk

• Repeat {

for i = 1 to m

c(i) := index ( from 1 to K ) of cluster centroid

closest to x(i)

for k = 1 to K

uk := average (mean) of point assigned to cluster

}

Cluster Assignment

move centroid

K-means for non-separated cluster




Choosing the value of K





Ex) K-mean

[ 2.47027802 0.81094015]

[ 0.41000339 1.05879891] [ 0.48475999 -0.43215817]

Thanks you!

Q&A