Deep Learning - University of Seoul Data Mining Lab · datamining.uos.ac.kr/wp-content/uploads/2016/09/deep... · 2016-10-04

Deep Learning ch5. Machine Learning Basics (5.3~5.6)

2016. 10. 04 University of Seoul, Data Mining Lab

김소현


5.3 Hyperparameter and Validation Sets

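• hyperparameter : a setting that controls the behavior of a machine learning algorithm and is not learned by the algorithm itself

• linear regression : the polynomial degree and the weight decay coefficient $\lambda$ are examples of hyperparameters

• Some settings are treated as hyperparameters because they are difficult to optimize.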


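• However, if hyperparameters are tuned only on the training set, overfitting occurs: the procedure always favors the highest model capacity.

• A validation set is needed to solve this problem.

• validation set : used to estimate the generalization error during or after training, so that the hyperparameters can be tuned accordingly.

• Test data must be reserved for performance evaluation, so it must not be used to tune hyperparameters.

• The validation set must always be drawn from the training data.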


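• Split the training data into a training set and a validation set.

• training data -> training set : validation set = 80% : 20%

• The training set is used to fit the model parameters; the validation set is used to update the hyperparameters.

• For performance evaluation, it is preferable to use a fresh test set rather than reusing the same one every time.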
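The split and the two roles can be sketched as follows. This is a minimal illustration, not the slides' own code: the synthetic data, the one-weight model y = w·x, and the helper names `split_train_validation`, `fit_ridge_1d` and `mse` are all assumptions made for the example.

```python
import random

def split_train_validation(data, validation_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and split it into (training set, validation set)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def fit_ridge_1d(points, weight_decay):
    """Closed-form minimizer of sum (w*x - y)^2 + weight_decay * w^2."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + weight_decay)

def mse(points, w):
    """Mean squared error of the prediction w*x on `points`."""
    return sum((w * x - y) ** 2 for x, y in points) / len(points)

# Synthetic training data, y = 2x + noise (an assumption for illustration).
rng = random.Random(42)
training_data = []
for _ in range(100):
    x = rng.uniform(-1, 1)
    training_data.append((x, 2.0 * x + rng.gauss(0, 0.3)))

train_set, val_set = split_train_validation(training_data)

# The parameter w is fit on the training set only; the hyperparameter
# (weight decay) is chosen by the error on the validation set.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
val_errors = {lam: mse(val_set, fit_ridge_1d(train_set, lam)) for lam in candidates}
best_weight_decay = min(val_errors, key=val_errors.get)
print(len(train_set), len(val_set), best_weight_decay)
```

The test set never appears here: it would only be touched once, after `best_weight_decay` has been fixed.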

5.3.1 Cross-validation

• If a limited dataset is divided into a fixed training set and a fixed test set, the test set becomes too small.

• A small test set cannot give an accurate estimate of performance.

• holdout cross-validation

• Split the data into a training set, a validation set, and a test set.

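• k-fold cross-validation

• Split the data into k subsets (folds) and repeat the holdout method k times, using a different fold as the test set each time.

• Example with k=10 : in each round, 9 folds are used for training and 1 fold for testing.

• E : estimated performance (classification accuracy or error), averaged over the k rounds: $E = \frac{1}{k}\sum_{i=1}^{k} E_i$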
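A minimal sketch of k-fold cross-validation, assuming a toy "model" that simply predicts the training mean. The helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `mse_of_constant`) and the synthetic data are hypothetical and only serve the illustration.

```python
import random

def k_fold_indices(n_examples, k, seed=0):
    """Return k disjoint index lists that together cover range(n_examples)."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(data, k, fit, evaluate):
    """Hold out each fold once; return the mean held-out error and the per-fold errors."""
    fold_errors = []
    for held_out in k_fold_indices(len(data), k):
        held_out_set = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held_out_set]
        test = [data[i] for i in held_out]
        model = fit(train)
        fold_errors.append(evaluate(model, test))
    return sum(fold_errors) / k, fold_errors

def fit_mean(values):
    """Toy model: predict the mean of the training values."""
    return sum(values) / len(values)

def mse_of_constant(prediction, values):
    """Mean squared error of a constant prediction."""
    return sum((v - prediction) ** 2 for v in values) / len(values)

rng = random.Random(1)
data = [rng.gauss(5.0, 1.0) for _ in range(50)]
E, fold_errors = cross_validate(data, k=10, fit=fit_mean, evaluate=mse_of_constant)
print(E)
```

Every example is used for testing exactly once and for training k-1 times, which is what makes the estimate usable on small datasets.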

5.4 Estimators, Bias and Variance

• Statistical concepts that are useful in machine learning

• parameter estimation

• bias

• variance

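5.4.1 Point Estimation

• estimate : the value that a parameter is estimated to have

• estimator : the rule (a function of the data) used to compute the estimate

• point estimation : finding a single statistic (mean, median, ...) that best represents a parameter of a dataset

• The single value that best estimates the parameter is taken as the value of the parameter.

• $\hat{\theta}$ : point estimate of $\theta$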

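• a set of m independent and identically distributed (i.i.d.) data points $\{x^{(1)}, \dots, x^{(m)}\}$

• a point estimator is a function of the data : $\hat{\theta}_m = g(x^{(1)}, \dots, x^{(m)})$

• The closer $\hat{\theta}$ is to the true value of $\theta$, the better the estimator.

• Because the data are drawn from a random process, $\hat{\theta}$ is also a random variable.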

• function estimation

• estimation of the relationship between input and target variables

• To predict y given x, we assume that there is a function f(x) that describes the approximate relationship between x and y, e.g. $y = f(x) + \epsilon$

• $\hat{f}$ : function estimate of $f$

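5.4.2 Bias

• bias of an estimator : $\text{bias}(\hat{\theta}_m) = E(\hat{\theta}_m) - \theta$

• the difference between the expected value of the estimator and the true value

• unbiased : $\text{bias}(\hat{\theta}_m) = 0$, i.e. $E(\hat{\theta}_m) = \theta$

• asymptotically unbiased : $\lim_{m \to \infty} \text{bias}(\hat{\theta}_m) = 0$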

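• example : Bernoulli Distribution

• Bernoulli distribution with mean $\theta$ : $P(x^{(i)}; \theta) = \theta^{x^{(i)}} (1 - \theta)^{1 - x^{(i)}}$

• estimator of $\theta$ : $\hat{\theta}_m = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$

• $\text{bias}(\hat{\theta}_m) = E\left[\frac{1}{m} \sum_{i=1}^{m} x^{(i)}\right] - \theta = \frac{1}{m} \sum_{i=1}^{m} E[x^{(i)}] - \theta = \theta - \theta = 0$

• Since $\text{bias}(\hat{\theta}_m) = 0$, $\hat{\theta}_m$ is an unbiased estimator of the Bernoulli mean parameter.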

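• example : Gaussian Distribution Estimator of the Mean

• $p(x^{(i)}) = \mathcal{N}(x^{(i)}; \mu, \sigma^2)$

• $\hat{\mu}_m = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$, the sample mean

• $\text{bias}(\hat{\mu}_m) = E[\hat{\mu}_m] - \mu = \frac{1}{m} \sum_{i=1}^{m} E[x^{(i)}] - \mu = 0$

• The sample mean is an unbiased estimator of the Gaussian mean.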

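• example : Estimators of the Variance of a Gaussian Distribution

• sample variance : $\hat{\sigma}_m^2 = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \hat{\mu}_m)^2$

• $\text{bias}(\hat{\sigma}_m^2) = E[\hat{\sigma}_m^2] - \sigma^2$

• $E[\hat{\sigma}_m^2] = \frac{m-1}{m} \sigma^2$, so $\text{bias}(\hat{\sigma}_m^2) = -\frac{\sigma^2}{m}$ : the sample variance is biased

• unbiased sample variance : $\tilde{\sigma}_m^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x^{(i)} - \hat{\mu}_m)^2$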
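The bias of the 1/m estimator can be checked by simulation. The sketch below assumes a Gaussian with variance 4 and samples of size m = 5; these numbers and the helper name `sample_variance` are illustrative choices, not values from the slides.

```python
import random

def sample_variance(xs, unbiased=False):
    """Sum of squared deviations divided by m (biased) or m - 1 (unbiased)."""
    m = len(xs)
    mean = sum(xs) / m
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return squared_deviations / (m - 1) if unbiased else squared_deviations / m

rng = random.Random(0)
m, true_variance, trials = 5, 4.0, 20000

biased_total = 0.0
unbiased_total = 0.0
for _ in range(trials):
    xs = [rng.gauss(0.0, true_variance ** 0.5) for _ in range(m)]
    biased_total += sample_variance(xs)
    unbiased_total += sample_variance(xs, unbiased=True)

# Averages over many datasets approximate the expectations of the estimators.
mean_biased = biased_total / trials      # should be near (m-1)/m * sigma^2
mean_unbiased = unbiased_total / trials  # should be near sigma^2
print(mean_biased, mean_unbiased)
```

With a small m the gap between the two averages is clearly visible; it shrinks as m grows, which is why the 1/m estimator is asymptotically unbiased.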

5.4.3 Variance and Standard Error

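• variance of an estimator : $\text{Var}(\hat{\theta})$

• standard error : $\text{SE}(\hat{\theta})$, the square root of the variance

• Variance and standard error measure how much the estimate would vary if it were recomputed on a new dataset drawn from the same data-generating process.

• We would like estimators to have relatively low variance.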


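• standard error of the mean : $\text{SE}(\hat{\mu}_m) = \sqrt{\text{Var}\left[\frac{1}{m} \sum_{i=1}^{m} x^{(i)}\right]} = \frac{\sigma}{\sqrt{m}}$

• The SE can be used to compute confidence intervals.

• example : by the central limit theorem, the sample mean is approximately normally distributed with mean $\hat{\mu}_m$ and variance $\text{SE}(\hat{\mu}_m)^2$

• 95% confidence interval : $(\hat{\mu}_m - 1.96\,\text{SE}(\hat{\mu}_m),\ \hat{\mu}_m + 1.96\,\text{SE}(\hat{\mu}_m))$

• Algorithm A is better than algorithm B if the upper bound of the 95% CI for the error of A is less than the lower bound of the 95% CI for the error of B.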
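The comparison rule can be sketched as below. The per-example 0/1 errors for the two algorithms are made-up numbers, and the function names are hypothetical; the standard error is estimated from the unbiased sample variance.

```python
import math

def mean_confidence_interval(errors, z=1.96):
    """Approximate 95% confidence interval for the mean of `errors`."""
    m = len(errors)
    mean = sum(errors) / m
    variance = sum((e - mean) ** 2 for e in errors) / (m - 1)
    standard_error = math.sqrt(variance / m)
    return mean - z * standard_error, mean + z * standard_error

def a_is_better_than_b(errors_a, errors_b):
    """True if the upper bound of A's interval lies below the lower bound of B's."""
    _, upper_a = mean_confidence_interval(errors_a)
    lower_b, _ = mean_confidence_interval(errors_b)
    return upper_a < lower_b

# Hypothetical per-example test errors (1 = misclassified, 0 = correct).
errors_a = [1] * 10 + [0] * 190   # 5% error rate
errors_b = [1] * 40 + [0] * 160   # 20% error rate

print(mean_confidence_interval(errors_a))
print(mean_confidence_interval(errors_b))
print(a_is_better_than_b(errors_a, errors_b))
```

When the two intervals overlap, the rule makes no claim in either direction; more test examples shrink the intervals by a factor of $1/\sqrt{m}$.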


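• example : Bernoulli Distribution

• $\hat{\theta}_m = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$

• variance of the estimator :

• $\text{Var}(\hat{\theta}_m) = \text{Var}\left(\frac{1}{m} \sum_{i=1}^{m} x^{(i)}\right) = \frac{1}{m^2} \sum_{i=1}^{m} \text{Var}(x^{(i)}) = \frac{1}{m} \theta (1 - \theta)$

• As m (the amount of data) increases, the variance of the estimator decreases.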
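A small simulation can illustrate the 1/m behavior. The value θ = 0.3, the sample sizes and the function name `variance_of_bernoulli_mean` are assumptions chosen for the example.

```python
import random

def variance_of_bernoulli_mean(theta, m, trials, rng):
    """Empirical variance of the sample-mean estimator over many datasets of size m."""
    estimates = []
    for _ in range(trials):
        ones = sum(1 for _ in range(m) if rng.random() < theta)
        estimates.append(ones / m)
    mean = sum(estimates) / trials
    return sum((e - mean) ** 2 for e in estimates) / trials

rng = random.Random(0)
theta = 0.3
empirical = {m: variance_of_bernoulli_mean(theta, m, trials=1000, rng=rng)
             for m in (10, 100, 1000)}

for m, value in empirical.items():
    # Compare the simulated variance with the formula theta * (1 - theta) / m.
    print(m, value, theta * (1 - theta) / m)
```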

5.4.4 Trading off Bias and Variance to Minimize Mean Squared Error

• Bias and variance measure two different sources of error in an estimator.

• Bias : expected deviation from the true value of the function or parameter

• Variance : deviation from the expected estimator value that any particular sampling of the data is likely to cause

• more bias vs more variance ?



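• Use cross-validation to weigh the trade-off empirically.

• Alternatively, compare the mean squared error (MSE) of the estimates.

• $\text{MSE} = E[(\hat{\theta}_m - \theta)^2] = \text{Bias}(\hat{\theta}_m)^2 + \text{Var}(\hat{\theta}_m)$

• The MSE incorporates both the bias and the variance to measure the overall expected deviation.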
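The decomposition, and the fact that a biased estimator can have a lower MSE, can be checked numerically. The sketch below compares the sample mean with a shrunken version of it (multiplied by 0.5); the shrinkage estimator, the true values θ = 1 and σ = 2, and the function name `simulate` are assumptions for illustration.

```python
import random

def simulate(shrink, theta=1.0, sigma=2.0, m=5, trials=5000, seed=0):
    """Empirical bias, variance and MSE of the estimator shrink * sample_mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample_mean = sum(rng.gauss(theta, sigma) for _ in range(m)) / m
        estimates.append(shrink * sample_mean)
    expected = sum(estimates) / trials
    bias = expected - theta
    variance = sum((e - expected) ** 2 for e in estimates) / trials
    mse = sum((e - theta) ** 2 for e in estimates) / trials
    return bias, variance, mse

bias_plain, var_plain, mse_plain = simulate(shrink=1.0)     # unbiased, higher variance
bias_shrunk, var_shrunk, mse_shrunk = simulate(shrink=0.5)  # biased, lower variance

print(bias_plain, var_plain, mse_plain)
print(bias_shrunk, var_shrunk, mse_shrunk)
```

In this particular setting the shrunken estimator accepts some bias in exchange for a large reduction in variance, so its MSE comes out lower; with more data or less noise the ordering can reverse.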


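5.4.5 Consistency

• Consistency (weak consistency) : as the dataset grows, the estimate converges in probability to the true parameter: $\text{plim}_{m \to \infty} \hat{\theta}_m = \theta$, i.e. for any $\epsilon > 0$, $P(|\hat{\theta}_m - \theta| > \epsilon) \to 0$ as $m \to \infty$

• Strong consistency (almost sure convergence) : $P(\lim_{m \to \infty} \hat{\theta}_m = \theta) = 1$

• Consistency implies that the bias of the estimator shrinks as the amount of data grows.

• Asymptotic unbiasedness does not guarantee consistency.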

5.5 Maximum Likelihood Estimation(MLE)

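• MLE (maximum likelihood estimation) : choose as the estimate the parameter value that maximizes the likelihood of observing the dataset

• $\mathbb{X} = \{x^{(1)}, \dots, x^{(m)}\}$ : m examples drawn independently from $p_{data}(x)$; $p_{model}(x; \theta)$ : a parametric family of distributions indexed by $\theta$

• maximum likelihood estimator for $\theta$ :

• $\theta_{ML} = \arg\max_\theta p_{model}(\mathbb{X}; \theta) = \arg\max_\theta \prod_{i=1}^{m} p_{model}(x^{(i)}; \theta) = \arg\max_\theta \sum_{i=1}^{m} \log p_{model}(x^{(i)}; \theta)$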

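• KL divergence : $D_{KL}(\hat{p}_{data} \| p_{model}) = E_{x \sim \hat{p}_{data}}[\log \hat{p}_{data}(x) - \log p_{model}(x)]$

• To reduce the difference between the distribution of the training set and the distribution of the model, it suffices to minimize $-E_{x \sim \hat{p}_{data}}[\log p_{model}(x)]$, because the first term does not depend on the model.

• KL divergence (Kullback-Leibler divergence) : a function that measures the difference between two probability distributions over the same random variable

• maximizing the likelihood = minimizing the KL divergence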
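The equivalence can be illustrated on a tiny Bernoulli dataset: a grid search over θ finds the same value whether it minimizes the negative log-likelihood or the KL divergence from the empirical distribution. The data, the grid and the function names are assumptions made for this sketch.

```python
import math

def negative_log_likelihood(theta, xs):
    """-(1/m) * sum_i log p_model(x_i; theta) for a Bernoulli model."""
    return -sum(math.log(theta if x == 1 else 1.0 - theta) for x in xs) / len(xs)

def kl_bernoulli(p, q):
    """D_KL(Bernoulli(p) || Bernoulli(q)), using the convention 0 * log 0 = 0."""
    total = 0.0
    for pk, qk in ((p, q), (1.0 - p, 1.0 - q)):
        if pk > 0:
            total += pk * math.log(pk / qk)
    return total

xs = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]   # toy dataset
p_hat = sum(xs) / len(xs)             # empirical distribution of the training set

grid = [i / 100 for i in range(1, 100)]
theta_ml = min(grid, key=lambda t: negative_log_likelihood(t, xs))
theta_kl = min(grid, key=lambda t: kl_bernoulli(p_hat, t))
print(p_hat, theta_ml, theta_kl)
```

Both objectives differ only by the entropy of the empirical distribution, which is constant in θ, so they share the same minimizer.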

5.5.1 Conditional Log-Likelihood and Mean Squared Error

• X : all inputs, Y : all observed targets

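• conditional maximum likelihood estimator :

• $\theta_{ML} = \arg\max_\theta P(Y \mid X; \theta)$

• if data is independently and identically distributed,

• $\theta_{ML} = \arg\max_\theta \sum_{i=1}^{m} \log P(y^{(i)} \mid x^{(i)}; \theta)$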

• basis for most supervised learning

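• example : Linear Regression as Maximum Likelihood

• Instead of producing a single prediction $\hat{y}$, we think of the model as producing a conditional distribution $p(y \mid x)$

• $p(y \mid x) = \mathcal{N}(y; \hat{y}(x; w), \sigma^2)$

• $\sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta) = -m \log \sigma - \frac{m}{2} \log(2\pi) - \sum_{i=1}^{m} \frac{\|\hat{y}^{(i)} - y^{(i)}\|^2}{2\sigma^2}$

• $\text{MSE}_{train} = \frac{1}{m} \sum_{i=1}^{m} \|\hat{y}^{(i)} - y^{(i)}\|^2$

• Maximizing the log-likelihood with respect to w = minimizing the MSE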
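A numerical check of this equivalence, under simplifying assumptions: a one-weight model $\hat{y} = w x$, fixed σ = 1, synthetic data, and a grid search over w. The function names are hypothetical.

```python
import math
import random

def gaussian_log_likelihood(w, points, sigma=1.0):
    """sum_i log N(y_i; w * x_i, sigma^2)."""
    m = len(points)
    squared_error = sum((w * x - y) ** 2 for x, y in points)
    return (-m * math.log(sigma)
            - 0.5 * m * math.log(2 * math.pi)
            - squared_error / (2 * sigma ** 2))

def mse(w, points):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in points) / len(points)

# Synthetic data, y = 3x + Gaussian noise (an assumption for illustration).
rng = random.Random(0)
points = []
for _ in range(50):
    x = rng.uniform(-2, 2)
    points.append((x, 3.0 * x + rng.gauss(0, 1.0)))

grid = [i / 100 for i in range(0, 601)]   # candidate weights from 0.00 to 6.00
w_max_likelihood = max(grid, key=lambda w: gaussian_log_likelihood(w, points))
w_min_mse = min(grid, key=lambda w: mse(w, points))

# Closed-form least-squares solution for the one-weight model.
w_least_squares = (sum(x * y for x, y in points) /
                   sum(x * x for x, _ in points))
print(w_max_likelihood, w_min_mse, w_least_squares)
```

The log-likelihood is a constant minus a positive multiple of the MSE, so the two criteria rank every candidate w in exactly opposite order.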

5.5.2 Properties of Maximum Likelihood

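• Under the conditions below, the MLE is consistent.

• As the training data grows, the MLE converges to the true parameter value.

• 1. The true distribution $p_{data}$ must lie within the model family $p_{model}(\cdot; \theta)$.

• 2. The true distribution $p_{data}$ must correspond to exactly one value of $\theta$.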

• statistical efficiency : lower generalization error for a fixed number of examples m

• One way to measure how close the estimate is to the true parameter is the expected mean squared error.

• The Cramér-Rao lower bound shows that, for large m, no consistent estimator has a lower mean squared error than the MLE.

• Because the MLE is consistent under the stated conditions and statistically efficient, it is a good estimator.

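5.6 Bayesian Statistics

• Bayesian inference : the posterior distribution is obtained from the prior distribution and the likelihood function.

[Figure: prior distribution, data, posterior distribution]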


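• given dataset : $\{x^{(1)}, \dots, x^{(m)}\}$

• likelihood : $p(x^{(1)}, \dots, x^{(m)} \mid \theta)$

• posterior by Bayes' rule : $p(\theta \mid x^{(1)}, \dots, x^{(m)}) = \frac{p(x^{(1)}, \dots, x^{(m)} \mid \theta)\, p(\theta)}{p(x^{(1)}, \dots, x^{(m)})}$

• Bayesian methods are effective when training data is scarce, but their computational cost grows substantially as the amount of training data increases.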


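• example : Bayesian Linear Regression

• In linear regression, when a Gaussian prior over the parameters is given, the posterior distribution over the model parameters can be obtained with the Bayesian approach.

• $\hat{y} = w^\top x$

• $x \in \mathbb{R}^n$ : input vector, $\hat{y} \in \mathbb{R}$ : predicted value (scalar)

• for m training examples : $\hat{y}^{(train)} = X^{(train)} w$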


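• Assuming a Gaussian conditional distribution for y : $p(y^{(train)} \mid X^{(train)}, w) = \mathcal{N}(y^{(train)}; X^{(train)} w, I) \propto \exp\left(-\frac{1}{2} (y^{(train)} - X^{(train)} w)^\top (y^{(train)} - X^{(train)} w)\right)$, where $\mathcal{N}$ denotes the normal (Gaussian) distribution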


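• prior : $p(w) = \mathcal{N}(w; \mu_0, \Lambda_0) \propto \exp\left(-\frac{1}{2} (w - \mu_0)^\top \Lambda_0^{-1} (w - \mu_0)\right)$

• posterior : $p(w \mid X, y) \propto p(y \mid X, w)\, p(w) \propto \exp\left(-\frac{1}{2} (y - Xw)^\top (y - Xw)\right) \exp\left(-\frac{1}{2} (w - \mu_0)^\top \Lambda_0^{-1} (w - \mu_0)\right)$

• $\Lambda_m = (X^\top X + \Lambda_0^{-1})^{-1}$, $\mu_m = \Lambda_m (X^\top y + \Lambda_0^{-1} \mu_0)$

• $p(w \mid X, y) \propto \exp\left(-\frac{1}{2} (w - \mu_m)^\top \Lambda_m^{-1} (w - \mu_m)\right)$, i.e. the posterior is again Gaussian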
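For a single weight, the standard Gaussian posterior update for linear regression with unit noise variance reduces to scalar arithmetic: the posterior variance is $1 / (\sum_i x_i^2 + 1/\lambda_0)$ and the posterior mean is that variance times $(\sum_i x_i y_i + \mu_0 / \lambda_0)$, where $\mu_0$ and $\lambda_0$ are the prior mean and variance. The sketch below assumes this scalar setting, synthetic data, and a hypothetical function name `posterior_1d`.

```python
import random

def posterior_1d(points, prior_mean, prior_var):
    """Posterior mean and variance of w for y = w*x + N(0, 1) noise and a Gaussian prior."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    post_var = 1.0 / (sxx + 1.0 / prior_var)
    post_mean = post_var * (sxy + prior_mean / prior_var)
    return post_mean, post_var

# Synthetic data, y = 1.5x + unit-variance Gaussian noise (an assumption).
rng = random.Random(0)
points = []
for _ in range(20):
    x = rng.uniform(-1, 1)
    points.append((x, 1.5 * x + rng.gauss(0, 1.0)))

w_ols = sum(x * y for x, y in points) / sum(x * x for x, _ in points)

mean_tight, var_tight = posterior_1d(points, prior_mean=0.0, prior_var=0.1)
mean_broad, var_broad = posterior_1d(points, prior_mean=0.0, prior_var=1e6)
print(w_ols, mean_tight, mean_broad)
```

A tight prior pulls the posterior mean toward the prior mean, while a very broad prior leaves it essentially at the least-squares solution; in both cases the posterior variance is smaller than the prior variance.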


5.6.1 Maximum A Posteriori (MAP)

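• Find the $\theta$ that maximizes the posterior probability $p(\theta \mid x)$

• $\theta_{MAP} = \arg\max_\theta p(\theta \mid x)$

• ->> $\theta_{MAP} = \arg\max_\theta \log p(x \mid \theta) + \log p(\theta)$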


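Why take the log : the logarithm is monotonically increasing, so the maximizer is unchanged, while the product of likelihood and prior becomes a sum and numerical underflow is avoided.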
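A minimal MAP sketch for a Bernoulli parameter. The Beta(a, b) prior is an assumption introduced for this example (the slides do not specify a prior), as are the counts and the function name; the grid search is compared with the known closed-form mode of the Beta posterior.

```python
import math

def log_posterior_unnormalized(theta, k, m, a, b):
    """log p(x | theta) + log p(theta), dropping terms that do not depend on theta."""
    log_likelihood = k * math.log(theta) + (m - k) * math.log(1.0 - theta)
    log_prior = (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)
    return log_likelihood + log_prior

k, m = 7, 10       # hypothetical data: 7 ones out of 10 observations
a, b = 3.0, 3.0    # assumed Beta prior, centered at 0.5

grid = [i / 1000 for i in range(1, 1000)]
theta_map_grid = max(grid, key=lambda t: log_posterior_unnormalized(t, k, m, a, b))

theta_map_closed = (k + a - 1.0) / (m + a + b - 2.0)   # mode of the Beta posterior
theta_ml = k / m                                       # maximum likelihood estimate
print(theta_map_grid, theta_map_closed, theta_ml)
```

The MAP estimate sits between the prior's center and the maximum likelihood estimate, which is the sense in which the prior acts as a regularizer.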
