Support Vector Machine Easy Tutorial (I wish)


Page 1: Support Vector Machine Tutorial 한국어

Support Vector Machine

Easy Tutorial (I wish)

Page 2: Support Vector Machine Tutorial 한국어

(1)   y(x) = w^T φ(x) + b

Page 3: Support Vector Machine Tutorial 한국어

A few assumptions to simplify the problem:

1. After mapping into a feature space through some function, the data are linearly separable.

2. The labels take only two values, +1 and -1.

Page 4: Support Vector Machine Tutorial 한국어

Distance between the hyperplane and a point:

(2)   t_n y(x_n) / ||w|| = t_n (w^T φ(x_n) + b) / ||w||

where φ(x_n) is the feature vector of x_n, and t_n y(x_n) > 0 because we assume the data are linearly separable.

(1)   y(x) = w^T φ(x) + b

Page 5: Support Vector Machine Tutorial 한국어

Perceptron algorithm:   w' = w + t_n x_n   if   sign(y(x_n)) ≠ t_n

The normal vector of the hyperplane is adjusted using the misclassified samples; this is why the perceptron algorithm is called a mistake-driven algorithm.
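A minimal sketch of this update rule, assuming raw inputs (no feature mapping), a zero initial weight vector, and a made-up epoch cap:

```python
import numpy as np

def perceptron(X, t, epochs=100):
    """Mistake-driven updates: w <- w + t_n * x_n on each misclassified sample.
    X: (N, d) samples (append a bias column if needed), t: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x_n, t_n in zip(X, t):
            if np.sign(w @ x_n) != t_n:   # misclassified (sign(0) counts as a mistake)
                w += t_n * x_n            # adjust the normal vector toward the sample
                mistakes += 1
        if mistakes == 0:                 # converged: every sample classified correctly
            break
    return w
```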

Page 6: Support Vector Machine Tutorial 한국어

Perceptron, convergence

• Claim: the perceptron algorithm indeed converges in a finite number of updates.

• Assume that

θ^(k) : the parameter vector after the k-th update,

||x_t|| ≤ R for all t and some finite R,

there exists θ* s.t. y_t (θ*^T x_t) > 0 for all t.

Page 7: Support Vector Machine Tutorial 한국어

Convergence proof

On each mistake (y_t θ^(k-1)T x_t ≤ 0) the update is θ^(k) = θ^(k-1) + y_t x_t.

Lower bound:  θ*^T θ^(k) = θ*^T θ^(k-1) + y_t (θ*^T x_t) ≥ θ*^T θ^(k-1) + γ ≥ … ≥ k γ,  where γ = min_t y_t (θ*^T x_t) > 0.

Upper bound:  ||θ^(k)||² = ||θ^(k-1)||² + 2 y_t (θ^(k-1)T x_t) + ||x_t||² ≤ ||θ^(k-1)||² + R² ≤ … ≤ k R².

Combining the two:

1 ≥ cos(θ*, θ^(k)) = θ*^T θ^(k) / (||θ*|| ||θ^(k)||) ≥ k γ / (||θ*|| √k R)

so k ≤ R² ||θ*||² / γ² : the number of updates is finite.

Page 8: Support Vector Machine Tutorial 한국어

Definition of Support Vector and Margin

The red circles are the support vectors; the perpendicular distance between a support vector and the hyperplane is called the margin.
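A small sketch of that distance computation (w, b, and the sample points below are made up for illustration):

```python
import numpy as np

# Geometric margin: the perpendicular distance of each sample to the
# hyperplane w^T x + b = 0 is |w^T x + b| / ||w||.
w, b = np.array([2.0, 1.0]), -1.0
X = np.array([[1.0, 0.5], [0.2, 0.1], [2.0, 2.0]])

dist = np.abs(X @ w + b) / np.linalg.norm(w)
margin = dist.min()                # distance of the closest sample(s)
print(dist, margin)
```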

Page 9: Support Vector Machine Tutorial 한국어

Find the parameters that maximize the margin to the sample closest to the hyperplane:

(3)   argmax_{w,b} { (1/||w||) min_n [ t_n (w^T φ(x_n) + b) ] }

SVM's motivation?, intuition?

There are countless separating hyperplanes. Which hyperplane will have the smallest generalization error? The maximal margin classifier!!

Page 10: Support Vector Machine Tutorial 한국어

Linear Classifiers

x → f(x, w, b) → y_est

denotes +1

denotes -1

f(x, w, b) = sign(w·x - b)

How would you classify this data?

Page 11: Support Vector Machine Tutorial 한국어

Linear Classifiers

x → f(x, w, b) → y_est

denotes +1

denotes -1

f(x, w, b) = sign(w·x - b)

How would you classify this data?

Page 12: Support Vector Machine Tutorial 한국어

Linear Classifiers

x → f(x, w, b) → y_est

denotes +1

denotes -1

f(x, w, b) = sign(w·x - b)

How would you classify this data?

Page 13: Support Vector Machine Tutorial 한국어

Linear Classifiers

x → f(x, w, b) → y_est

denotes +1

denotes -1

f(x, w, b) = sign(w·x - b)

Any of these would be fine..

..but which is best?

Page 14: Support Vector Machine Tutorial 한국어
Page 15: Support Vector Machine Tutorial 한국어
Page 16: Support Vector Machine Tutorial 한국어
Page 17: Support Vector Machine Tutorial 한국어

Preprocessing to formulate the SVM idea

Distance between the hyperplane and a point (with t_n y(x_n) > 0, since we assume the data are linearly separable):

(2)   t_n y(x_n) / ||w|| = t_n (w^T φ(x_n) + b) / ||w||

Rescaling w → κw and b → κb leaves the margin t_n (κ w^T φ(x_n) + κb) / ||κw|| unchanged, so we may fix the scale freely.

(4)   Adjust w and b so that the point closest to the surface satisfies  t_n (w^T φ(x_n) + b) = 1.

(5)   Then every point satisfies  t_n (w^T φ(x_n) + b) ≥ 1.

Maximizing the margin 1/||w|| is the same as minimizing ||w||, so the parameters are found by minimizing the inverse of the margin:

(6)   argmin_{w,b} (1/2) ||w||²   (the factor 1/2 is for convenience when differentiating later)

Page 18: Support Vector Machine Tutorial 한국어

Formulating the SVM idea

(6)   argmin_{w,b} (1/2) ||w||²   subject to   t_n (w^T φ(x_n) + b) ≥ 1

This completes the SVM formulation!!!!!! Now all that is left is to solve this formula. The optimization problem is similar to the linear programming we saw in high school; it is called quadratic programming (QP). To solve this quadratic programming problem by converting it into its dual form, we introduce the concept of Lagrange multipliers.

Why convert to the dual form?

-> For the kernel trick.

Note also that the formulation above is a convex optimization problem.

Convex vs. non-convex, briefly:

-> The nice thing about convexity is that the global minimum can be found with the "differentiate and set to zero" technique.

Page 19: Support Vector Machine Tutorial 한국어

Lagrange multiplier

Solving a constrained problem another way. Assume all points of interest lie on the constraint surface g(x) = 0.

Taylor series: for a point x + ε on the surface, g(x + ε) ≈ g(x) + ε^T ∇g(x). Since g(x + ε) = g(x) = 0, we get ε^T ∇g(x) ≈ 0 in the limit ||ε|| → 0, i.e. ∇g is perpendicular to the constraint surface.

At a constrained maximum of f, ∇f must also be perpendicular to the surface (otherwise we could move along the surface and increase f). Because ∇f is parallel to ∇g, there is a λ ≠ 0 such that

∇f + λ ∇g = 0   (the KKT conditions generalize this to inequality constraints).

Define the Lagrangian  L(x, λ) = f(x) + λ g(x);  setting ∂L/∂x = 0 and ∂L/∂λ = 0 recovers exactly the condition above plus the constraint.

Example: maximize f(x1, x2) = 1 - x1² - x2² subject to g(x1, x2) = x1 + x2 - 1 = 0.

L(x, λ) = 1 - x1² - x2² + λ (x1 + x2 - 1)

∂L/∂x1 = -2 x1 + λ = 0,   ∂L/∂x2 = -2 x2 + λ = 0,   ∂L/∂λ = x1 + x2 - 1 = 0

⇒ (x1*, x2*) = (1/2, 1/2), λ = 1.
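A quick numeric check of the worked example (the function names are ours):

```python
import numpy as np

# At (1/2, 1/2) with lambda = 1, the gradient of the Lagrangian vanishes
# and the constraint holds.
def f(x): return 1 - x[0]**2 - x[1]**2
def grad_f(x): return np.array([-2*x[0], -2*x[1]])
grad_g = np.array([1.0, 1.0])           # gradient of g(x) = x1 + x2 - 1

x_star, lam = np.array([0.5, 0.5]), 1.0
print(grad_f(x_star) + lam * grad_g)    # [0. 0.]  -> stationarity of L
print(x_star.sum() - 1.0)               # 0.0      -> constraint satisfied
print(f(x_star))                        # 0.5      -> the constrained maximum
```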

Page 20: Support Vector Machine Tutorial 한국어

Converting the SVM formulation to its dual form (for the SVM magic)

Lagrangian of the optimization problem:

(7)   L(w, b, a) = (1/2) ||w||² - Σ_{n=1}^N a_n { t_n (w^T φ(x_n) + b) - 1 },   a_n ≥ 0

Equations obtained by differentiating (7) and rearranging:

(8)   ∂L/∂w = 0   ⇒   w = Σ_{n=1}^N a_n t_n φ(x_n)

(9)   ∂L/∂b = 0   ⇒   Σ_{n=1}^N a_n t_n = 0

Substituting (8) and (9) into (7) and rearranging gives the dual:

(10)   L̃(a) = Σ_{n=1}^N a_n - (1/2) Σ_{n=1}^N Σ_{m=1}^N a_n a_m t_n t_m k(x_n, x_m),   where k(x_n, x_m) = φ(x_n)^T φ(x_m)

(11)   a_n ≥ 0   (constraint from (7))

(12)   Σ_{n=1}^N a_n t_n = 0   (constraint from (9))

Take this part slowly, to understand it.

Page 21: Support Vector Machine Tutorial 한국어

(6)   argmin_{w,b} (1/2) ||w||²   subject to   t_n (w^T φ(x_n) + b) ≥ 1

Lagrangian optimization formula:

(7)   L(w, b, a) = (1/2) ||w||² - Σ_{n=1}^N a_n { t_n (w^T φ(x_n) + b) - 1 }

Equations obtained by differentiating (7) and rearranging:

(8)   ∂L/∂w = 0   ⇒   w = Σ_{n=1}^N a_n t_n φ(x_n)

(9)   ∂L/∂b = 0   ⇒   Σ_{n=1}^N a_n t_n = 0

Page 22: Support Vector Machine Tutorial 한국어

Lagrangian optimization formula:

(7)   L(w, b, a) = (1/2) ||w||² - Σ_{n=1}^N a_n { t_n (w^T φ(x_n) + b) - 1 }

Substituting (8)  w = Σ_{n=1}^N a_n t_n φ(x_n)  and (9)  Σ_{n=1}^N a_n t_n = 0  into (7) and rearranging:

(10)   L̃(a) = Σ_{n=1}^N a_n - (1/2) Σ_{n=1}^N Σ_{m=1}^N a_n a_m t_n t_m k(x_n, x_m)

(11)   a_n ≥ 0   (constraint from (7))

(12)   Σ_{n=1}^N a_n t_n = 0   (constraint from (9))

Page 23: Support Vector Machine Tutorial 한국어

The magic of SVM

Solving the primal optimization problem (7) has a computational cost of about O(M³), where M is the dimension of the feature space φ(·).

Solving the dual problem (10), obtained using (8) and (9), costs about O(N³), where N is the number of samples.

Usually M < N, so the conversion to the dual form looks like a bad idea.

But if M ≫ N, i.e. we map into a very high-dimensional (even infinite-dimensional) feature space, the kernel function k(x, x') = φ(x)^T φ(x') does the same work with a small computation in the original space: the dimension M disappears from the problem. This is called the kernel trick.

Take this part slowly; read it once more.
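A small sketch of why the dimension disappears: the dual (10) touches the data only through the N × N Gram matrix, shown here with a Gaussian kernel whose implicit feature space is infinite-dimensional (the toy data and σ are made up):

```python
import numpy as np

# The dual needs only K[n, m] = k(x_n, x_m), so its cost depends on N,
# not on the dimension of phi(x).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                 # N = 5 samples in 2-D input space
sigma = 1.0

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * sigma**2))      # 5 x 5, independent of dim(phi)
print(K.shape)
```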

Page 24: Support Vector Machine Tutorial 한국어

SVM's predictor

(1)   y(x) = w^T φ(x) + b;  substituting (8)  w = Σ_{n=1}^N a_n t_n φ(x_n):

(13)   y(x) = Σ_{n=1}^N a_n t_n k(x, x_n) + b

So SVM learning ultimately means learning a_n and b.

The a_n are obtained by solving the dual with its constraints:

maximize   L̃(a) = Σ_{n=1}^N a_n - (1/2) Σ_{n=1}^N Σ_{m=1}^N a_n a_m t_n t_m k(x_n, x_m)

subject to   a_n ≥ 0,   Σ_{n=1}^N a_n t_n = 0.

Solving this QP is the computationally heaviest part of SVM training, so there is research on optimizing it; one result is SMO ("sequential minimal optimization"). MATLAB provides the "quadprog" function.
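As a sketch, the same dual can be handed to a generic QP solver; here using the cvxopt package (an assumption on our part, not mentioned in the slides), which minimizes (1/2)aᵀPa + qᵀa subject to Ga ≤ h and Aa = b:

```python
import numpy as np
from cvxopt import matrix, solvers   # assumes cvxopt is installed

def solve_dual(K, t):
    """Solve the hard-margin dual (10)-(12) with a generic QP solver.
    K: (N, N) kernel matrix, t: labels in {-1, +1}. Returns the a_n."""
    N = len(t)
    P = matrix(np.outer(t, t) * K)                   # a_n a_m t_n t_m k(x_n, x_m)
    q = matrix(-np.ones(N))                          # maximize sum(a) == minimize -sum(a)
    G = matrix(-np.eye(N)); h = matrix(np.zeros(N))  # -a_n <= 0, i.e. a_n >= 0
    A = matrix(t.reshape(1, -1).astype(float))       # sum_n a_n t_n = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol['x'])
```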

Page 25: Support Vector Machine Tutorial 한국어

The Lagrangian optimization (7) satisfies the following KKT conditions:

(14)   a_n ≥ 0

(15)   t_n y(x_n) - 1 ≥ 0

(16)   a_n ( t_n y(x_n) - 1 ) = 0

The meaning of condition (16) is that for every sample either a_n = 0 or t_n y(x_n) = 1.

The samples with a_n > 0 are the support vectors (SV); i.e. once we solve the QP for a, we know the SV set.

The prediction (13) can then classify using only the SV set.

Page 26: Support Vector Machine Tutorial 한국어

Computing b

Any support vector satisfies t_n y(x_n) = 1. Substituting (13) into this, with S the set of support vectors (and using t_n² = 1):

(17)   t_n ( Σ_{m∈S} a_m t_m k(x_n, x_m) + b ) = 1

Averaging over all N_S support vectors:

(18)   b = (1/N_S) Σ_{n∈S} ( t_n - Σ_{m∈S} a_m t_m k(x_n, x_m) )

The classifying hyperplane is built from the support vectors alone.
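A small sketch of (18); the names are ours, and a, t, and the kernel matrix K are assumed to come from the trained model:

```python
import numpy as np

def intercept(a, t, K, eps=1e-8):
    """Average b over the support vectors, as in (18).
    a: multipliers, t: labels in {-1, +1}, K: precomputed kernel matrix."""
    S = np.where(a > eps)[0]                        # support vector indices
    b_vals = [t[n] - np.sum(a[S] * t[S] * K[n, S]) for n in S]
    return np.mean(b_vals)
```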

Page 27: Support Vector Machine Tutorial 한국어

What if the data are not linearly separable?

So far we have developed the theory assuming the samples can be made linearly separable through a high-dimensional mapping function.

In real problems, however, that is rarely the case.

So the optimization formulation needs to change in order to handle data that are not linearly separable.

Page 28: Support Vector Machine Tutorial 한국어

(19)   def: slack  ξ_n = |t_n - y(x_n)|, the difference between the actual target value and the prediction:
ξ_n = 0 for samples correctly classified on or outside the margin boundary; 0 < ξ_n ≤ 1 for samples inside the margin but still correctly classified; ξ_n > 1 for misclassified samples.

(20)   The constraint changes to   t_n y(x_n) ≥ 1 - ξ_n,   ξ_n ≥ 0,   n = 1, …, N.

Penalizing wrongly classified samples while still minimizing the inverse of the margin:

(21)   argmin_{w,b}  C Σ_{n=1}^N ξ_n + (1/2) ||w||²   (as C → ∞ this becomes identical to the separable case (6))

(22)   L(w, b, ξ, a, μ) = (1/2) ||w||² + C Σ_{n=1}^N ξ_n - Σ_{n=1}^N a_n { t_n y(x_n) - 1 + ξ_n } - Σ_{n=1}^N μ_n ξ_n

with the KKT conditions

(23)   a_n ≥ 0

(24)   t_n y(x_n) - 1 + ξ_n ≥ 0

(25)   a_n ( t_n y(x_n) - 1 + ξ_n ) = 0

(26)   μ_n ≥ 0

(27)   ξ_n ≥ 0

(28)   μ_n ξ_n = 0
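A small sketch of the slack variables (w, b, C, and the points below are made up): ξ_n measures how far sample n falls on the wrong side of its margin boundary, and (21) trades the total slack against the margin.

```python
import numpy as np

w, b, C = np.array([1.0, -1.0]), 0.0, 10.0
X = np.array([[2.0, 0.0], [0.5, 0.2], [-1.0, 1.0]])
t = np.array([1, 1, 1])

y = X @ w + b
xi = np.maximum(0.0, 1.0 - t * y)        # 0: outside margin, (0,1]: inside, >1: misclassified
objective = C * xi.sum() + 0.5 * w @ w   # the soft-margin objective (21)
print(xi, objective)
```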

Page 29: Support Vector Machine Tutorial 한국어

What if the data are not linearly separable?

KKT conditions and the derivative of L with respect to each parameter:

(29)   ∂L/∂w = 0   ⇒   w = Σ_{n=1}^N a_n t_n φ(x_n)

(30)   ∂L/∂b = 0   ⇒   Σ_{n=1}^N a_n t_n = 0

(31)   ∂L/∂ξ_n = 0   ⇒   a_n = C - μ_n

(32)   Substituting (29)-(31) into (22):

L̃(a) = Σ_{n=1}^N a_n - (1/2) Σ_{n=1}^N Σ_{m=1}^N a_n a_m t_n t_m k(x_n, x_m)

(33)   From (23)  a_n ≥ 0  and (26)  μ_n ≥ 0  together with (31):   0 ≤ a_n ≤ C

(34)   From (30):   Σ_{n=1}^N a_n t_n = 0

The dual objective is the same as before; only the box constraint on a_n is new.

Page 30: Support Vector Machine Tutorial 한국어
Page 31: Support Vector Machine Tutorial 한국어
Page 32: Support Vector Machine Tutorial 한국어

Example.

Polynomial kernel:

k(x, z) = (1 + x^T z)²
        = (1 + x₁z₁ + x₂z₂)²
        = 1 + 2x₁z₁ + 2x₂z₂ + x₁²z₁² + 2x₁x₂z₁z₂ + x₂²z₂²
        = (1, √2·x₁, √2·x₂, x₁², √2·x₁x₂, x₂²)^T (1, √2·z₁, √2·z₂, z₁², √2·z₁z₂, z₂²)
        = φ(x)^T φ(z)

Classify the following points with an SVM using this kernel:

t₁ = +1 : (2, 3), (1, 5)

t₂ = -1 : (1, 5), (2, 4)
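A quick check of the expansion above, using two of the sample points (the helper name phi is ours):

```python
import numpy as np

# (1 + x.z)^2 equals phi(x).phi(z) for the 6-D feature map written above.
def phi(v):
    v1, v2 = v
    return np.array([1.0, np.sqrt(2)*v1, np.sqrt(2)*v2,
                     v1**2, np.sqrt(2)*v1*v2, v2**2])

x, z = np.array([2.0, 3.0]), np.array([2.0, 4.0])
assert np.isclose((1 + x @ z) ** 2, phi(x) @ phi(z))   # both give 289.0
```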

Page 33: Support Vector Machine Tutorial 한국어

The dual QP:

maximize   L̃(a) = Σ_{n=1}^N a_n - (1/2) Σ_{n=1}^N Σ_{m=1}^N a_n a_m t_n t_m k(x_n, x_m)

subject to   a_n ≥ 0,   Σ_{n=1}^N a_n t_n = 0

Originally this should be solved as a QP, but because the problem is so simple we solved it analytically, just as shown in the Lagrange multiplier example.

(8)   w = Σ_{n=1}^N a_n t_n φ(x_n)

Page 34: Support Vector Machine Tutorial 한국어

Computing the hyperplane in the original dimension, before the high-dimensional mapping, gives a parabola-shaped classification boundary.

Page 35: Support Vector Machine Tutorial 한국어
Page 36: Support Vector Machine Tutorial 한국어

References

1. Bishop's book (PRML)

2. MIT open lectures: http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-867Fall-2006/LectureNotes/index.htm

3. Countless internet resources?

Page 37: Support Vector Machine Tutorial 한국어

Fast Training of Support Vector Machines using

Sequential Minimal Optimization,

Platt, John.

Jung-Kyu Lee @ MLLAB

Page 38: Support Vector Machine Tutorial 한국어

SVM revisited • the (dual) optimization problem we covered in SVM class

• Solving this optimization problem for large-scale applications using generic quadratic programming is very computationally expensive.

• John Platt, at Microsoft, proposed an efficient algorithm called SMO (Sequential Minimal Optimization) to actually solve this optimization problem.

max_a   W(a) = Σ_{i=1}^m a_i - (1/2) Σ_{i,j=1}^m y_i y_j a_i a_j ⟨x_i, x_j⟩

s.t.   0 ≤ a_i ≤ C,   i = 1, …, m

       Σ_{i=1}^m a_i y_i = 0

Page 39: Support Vector Machine Tutorial 한국어

Digression: Coordinate ascent (1/3)

• Consider trying to solve the following unconstrained optimization problem:

max_a  W(a_1, a_2, …, a_m)

• Coordinate ascent: loop until convergence; for each i = 1, …, m, set a_i := argmax_{â_i} W(a_1, …, a_{i-1}, â_i, a_{i+1}, …, a_m), holding every other a_j fixed.

Page 40: Support Vector Machine Tutorial 한국어

Digression: Coordinate ascent (2/3)

• How does coordinate ascent work?

• We optimize a quadratic function whose minimum is at (0,0).

• First, minimize this w.r.t α1.

• Next, minimize this w.r.t α2.

• With the same argument, iterate until convergence.
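A minimal sketch of those alternating one-dimensional minimizations on a quadratic (the matrix Q and the start point are made up; the minimum is at (0, 0)):

```python
import numpy as np

# Coordinate descent on f(a) = 0.5 * a^T Q a: each step minimizes f exactly
# along one coordinate while holding the other fixed.
Q = np.array([[2.0, 0.8],
              [0.8, 1.0]])          # positive definite
a = np.array([2.0, -2.0])           # arbitrary starting point

for _ in range(20):
    for i in range(2):
        j = 1 - i
        # df/da_i = Q[i,i]*a_i + Q[i,j]*a_j = 0  =>  exact 1-D minimizer:
        a[i] = -Q[i, j] * a[j] / Q[i, i]
print(a)                             # converges toward (0, 0)
```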

Page 41: Support Vector Machine Tutorial 한국어

Digression: Coordinate

ascent(3/3)

• Coordinate ascent will usually take a lot more steps than other iterative optimization algorithms such as gradient descent, Newton's method, conjugate gradient descent, etc.

• However, there are many optimization problems for which it’s

particularly easy to fix all but one of the parameters and optimize

with respect to just that one parameter.

• It turns out that this will be true when we modify this algorithm to

solve the SVM optimization problem.

Page 42: Support Vector Machine Tutorial 한국어

SMO • Coordinate ascent in its basic form cannot be applied to the SVM dual optimization problem.

• As coordinate ascent progresses, we can't change a single αi without violating the constraint Σ_i a_i y_i = 0:

max_a   W(a) = Σ_{i=1}^m a_i - (1/2) Σ_{i,j=1}^m y_i y_j a_i a_j ⟨x_i, x_j⟩

s.t.   0 ≤ a_i ≤ C,   i = 1, …, m

       Σ_{i=1}^m a_i y_i = 0

From the equality constraint,

a_1 y_1 = - Σ_{i=2}^m a_i y_i

a_1 = - y_1 Σ_{i=2}^m a_i y_i    (since y_1 ∈ {-1, 1}, y_1² = 1)

so a_1 is completely determined by the other α's.

Page 43: Support Vector Machine Tutorial 한국어

Outline of SMO • The SMO (sequential minimal optimization) algorithm, therefore, instead of trying to change one α at a time, changes two α at a time so that the constraints keep being satisfied.

• The term "minimal" refers to the fact that we're choosing the smallest number of α's that can be changed together (two).

Page 44: Support Vector Machine Tutorial 한국어

Convergence of SMO • To test for convergence of the SMO algorithm, we can check whether the KKT conditions are satisfied to within some tolerance:

a_i = 0      ⇒   y_i (w^T x_i + b) ≥ 1

0 < a_i < C  ⇒   y_i (w^T x_i + b) = 1

a_i = C      ⇒   y_i (w^T x_i + b) ≤ 1

• The key reason that SMO is an efficient algorithm is that the update of αi, αj can be computed very efficiently.
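A sketch of that convergence test, vectorized (the names and the tolerance default are ours):

```python
import numpy as np

def kkt_satisfied(a, y, fx, C, tol=1e-3):
    """a: multipliers, y: labels in {-1,+1}, fx: w^T x_i + b per sample, C: box bound."""
    m = y * fx                                # functional margins y_i (w.x_i + b)
    ok_zero = (a > tol) | (m >= 1 - tol)      # a_i = 0      -> margin >= 1
    ok_C = (a < C - tol) | (m <= 1 + tol)     # a_i = C      -> margin <= 1
    inside = (a > tol) & (a < C - tol)
    ok_in = ~inside | (np.abs(m - 1) <= tol)  # 0 < a_i < C  -> margin == 1
    return bool(np.all(ok_zero & ok_C & ok_in))
```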

Page 45: Support Vector Machine Tutorial 한국어

Detail of SMO

• Suppose we’ve decide to hold α3,…,αm , update α1, α2

aa

aaa

a

2211

m

3i

2211

m

1i

0

yy

yyy

y

ii

ii

Page 46: Support Vector Machine Tutorial 한국어

Detail of SMO

• We can thus picture the constraints on α1, α2 as follows:

• α1 and α2 must lie within the box [0,C]×[0,C] shown.

• α1 and α2 must lie on the line:  y_1 a_1 + y_2 a_2 = ζ.

• From these constraints, we know:  L ≤ a_2 ≤ H.
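The endpoints L and H of that feasible segment come from intersecting the line with the box; a sketch, following the simplified-SMO notes cited in the references:

```python
def bounds(a1, a2, y1, y2, C):
    """Feasible range [L, H] for a_2 on the constraint line inside the box."""
    if y1 != y2:                  # line of slope +1
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:                         # line of slope -1
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    return L, H
```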

Page 47: Support Vector Machine Tutorial 한국어

Detail of SMO

• From  y_1 a_1 + y_2 a_2 = ζ  we get  a_1 = (ζ - y_2 a_2) y_1  (multiplying by y_1 and using y_1² = 1).

• W(α) becomes

W(a_1, a_2, …, a_m) = W( (ζ - y_2 a_2) y_1, a_2, …, a_m ).

• Viewed as a function of a_2 alone, this is a standard quadratic function: it can be written as  a·a_2² + b·a_2 + c  for some constants a, b, c.

Page 48: Support Vector Machine Tutorial 한국어

Detail of SMO

• From  W = a·a_2² + b·a_2 + c:

• This is really easy to optimize by setting its derivative to zero.

• If you end up with a value outside [L, H], you must clip your solution.

• That'll give you the optimal solution of this quadratic optimization problem subject to your solution satisfying the box constraint and lying on the straight line.

Page 49: Support Vector Machine Tutorial 한국어

Detail of SMO • In conclusion,

a_2^new =  H                    if  a_2^{new,unclipped} > H
           a_2^{new,unclipped}  if  L ≤ a_2^{new,unclipped} ≤ H
           L                    if  a_2^{new,unclipped} < L

a_1^new = a_1 + y_1 y_2 ( a_2 - a_2^new )

• We can do this process very quickly, which makes the inner loop of the SMO algorithm very efficient.
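A sketch of those two closing steps (the function names are ours):

```python
import numpy as np

def clip(a2_unclipped, L, H):
    """Project the unconstrained 1-D optimum back onto the feasible segment."""
    return float(np.clip(a2_unclipped, L, H))

def update_a1(a1_old, a2_old, a2_new, y1, y2):
    """Keep y1*a1 + y2*a2 constant, so the equality constraint still holds."""
    return a1_old + y1 * y2 * (a2_old - a2_new)
```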

Page 50: Support Vector Machine Tutorial 한국어

Conclusion • SMO

- scales to large data sets.

- makes SVM training computationally inexpensive.

- can be implemented easily because it has a simple training structure (coordinate ascent).

Page 51: Support Vector Machine Tutorial 한국어

Reference

• [1] Platt, John. Fast Training of Support Vector Machines using Sequential Minimal Optimization, in Advances in Kernel Methods – Support Vector Learning, B. Scholkopf, C. Burges, A. Smola, eds., MIT Press (1998).

• [2] The Simplified SMO Algorithm, Machine Learning course @ Stanford: http://www.stanford.edu/class/cs229/materials/smo.pdf