Support Vector Machine

Introduction to Support Vector Machine

Lucas Xu

September 4, 2012


1 Classifier

2 Hyper-Plane

3 Convex Optimization

4 Kernel

5 Application


Classifier

Attributes and Class Labels

Training Data

$$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}, \quad x^{(i)} \in \mathbb{R}^d, \quad y^{(i)} \in \{-1, 1\}$$


Classifier

Umeng Gender Classification Data

user     app_1   app_2   ...   app_d   gender
user_1   1       0       ...   0       male
user_2   0       1       ...   1       female
...      ...     ...     ...   ...     ...
user_n   1       1       ...   1       female

Each app belongs to exactly one category; there are about 20 categories.

Categories are mutually exclusive.


Classifier

Umeng Gender Classification Data

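$$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}, \quad x^{(i)} \in \mathbb{R}^d, \quad y^{(i)} \in \{-1, 1\}$$

$x^{(i)}_k \in \{0, 1\}$: 0 means app $k$ is not installed on the device, 1 means it is installed.

$1 \le k \le d$, with $d \approx 30{,}000$ (about 30,000 apps).

$y^{(i)} \in \{\text{male}, \text{female}\}$, encoded as the two labels in $\{-1, 1\}$.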
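The encoding above can be sketched in a few lines of Python. This is only an illustration: the app names, the four-app catalogue, and the choice of which gender maps to +1 are assumptions, since the slides do not specify them.

```python
# Hypothetical app catalogue; the real data set has about 30,000 apps.
APPS = ["app1", "app2", "app3", "app4"]
APP_INDEX = {name: k for k, name in enumerate(APPS)}

# Assumed label encoding; the slides do not say which gender maps to +1.
LABELS = {"male": 1, "female": -1}


def encode_user(installed, gender):
    """Return (x, y): a 0/1 vector over APPS and a label in {-1, 1}."""
    x = [0] * len(APPS)
    for name in installed:
        x[APP_INDEX[name]] = 1  # 1 means the app is installed
    return x, LABELS[gender]


# A two-user training set S in the (x, y) form used on the slide.
training_set = [
    encode_user({"app1"}, "male"),
    encode_user({"app2", "app4"}, "female"),
]
print(training_set)
```

With around 30,000 apps most entries are 0, so a real implementation would probably store only the indices of installed apps rather than a dense list.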


Hyper-Plane

Figure: Hyper-plane

The hyper-plane: $w^T x + b = 0$

Classification function: $h_{w,b}(x) = g(w^T x + b)$

$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
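As a minimal sketch, the classification function can be written directly from its definition. The weight vector and offset below are made-up values for illustration, not parameters from the talk.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def g(z):
    """Threshold function: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def h(w, b, x):
    """Classify x by which side of the hyper-plane w^T x + b = 0 it lies on."""
    return g(dot(w, x) + b)


# Hypothetical hyper-plane x_1 = 1 in two dimensions.
w, b = [1.0, 0.0], -1.0
print(h(w, b, [2.0, 0.0]))  # positive side
print(h(w, b, [0.0, 0.0]))  # negative side
```

Points lying exactly on the hyper-plane get label 1, because $g$ uses $z \ge 0$.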


Hyper-Plane

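Functional margin: $\hat{\gamma}^{(i)} = y^{(i)}(w^T x^{(i)} + b)$

Scaling: rescaling $(w, b)$ changes the functional margin without moving the hyper-plane, so impose the normalization condition $\|w\| = 1$.

Geometric margin:

$$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}\right)$$

$\gamma^{(i)}$ should be a large positive number to increase the prediction confidence.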


Hyper-Plane

Definition

The geometric margin of $(w, b)$ with respect to the training dataset $S$:

$$\gamma = \min_{i=1,\dots,m} \gamma^{(i)}$$
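The margin definitions can be checked numerically. The sketch below uses a hypothetical four-point data set and two hand-picked parameter pairs that describe the same hyper-plane at different scales; none of these values come from the talk.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def functional_margin(w, b, x, y):
    """y * (w^T x + b); changes when (w, b) is rescaled."""
    return y * (dot(w, x) + b)


def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||; invariant to rescaling (w, b)."""
    return functional_margin(w, b, x, y) / math.sqrt(dot(w, w))


def dataset_margin(w, b, S):
    """Geometric margin of (w, b) with respect to S: the smallest per-point margin."""
    return min(geometric_margin(w, b, x, y) for x, y in S)


# Hypothetical linearly separable toy data.
S = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 0.0], -1), ([-1.0, 1.0], -1)]

small = dataset_margin([1.0, 0.0], -1.0, S)
large = dataset_margin([2.0, 0.0], -2.0, S)  # same hyper-plane, scaled by 2
print(small, large)
```

Both calls return the same geometric margin, while the functional margins of the second parameter pair are twice as large; this is why the functional margin alone is not a useful objective.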


Hyper-Plane

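The optimal margin classifier (intuition): find a decision boundary that maximizes the margin.

$$
\begin{aligned}
\max_{\gamma, w, b} \quad & \gamma \\
\text{s.t.} \quad & y^{(i)}(w^T x^{(i)} + b) \ge \gamma, \quad i = 1, \dots, m \\
& \|w\| = 1
\end{aligned}
$$

Figure: Hyper-plane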

How to solve?


Hyper-Plane

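Normalization constraint: the geometric margin is $\gamma = \hat{\gamma} / \|w\|$, so fix the functional margin at $\hat{\gamma} = 1$.

⇓

$$
\begin{aligned}
\max_{w, b} \quad & \frac{1}{\|w\|} \\
\text{s.t.} \quad & y^{(i)}(w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m
\end{aligned}
$$

⇓ (maximizing $1/\|w\|$ is the same as minimizing $\frac{1}{2}\|w\|^2$)

$$
\begin{aligned}
\min_{w, b} \quad & \frac{1}{2}\|w\|^2 \\
\text{s.t.} \quad & y^{(i)}(w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m
\end{aligned}
$$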


Hyper-Plane

Convex function

Convex set

This is a quadratic programming (QP) problem: a convex quadratic objective over a convex set defined by linear constraints. There are many software packages that solve it.

Basic ideas for Support Vector Machine: DONE!

More efficient solution?


Convex Optimization

Primal Problem:

$$
\begin{aligned}
\min_{w, b} \quad & \frac{1}{2}\|w\|^2 \\
\text{s.t.} \quad & y^{(i)}(w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m
\end{aligned}
$$


Convex Optimization

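Lagrangian for the original problem:

$$\min_{w, b} \; \max_{\alpha : \alpha_i \ge 0} \; L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)}(w^T x^{(i)} + b) - 1 \right]$$

⇓

Under the KKT conditions, this transforms to its dual problem:

$$
\begin{aligned}
\max_{\alpha} \quad & W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle \\
\text{s.t.} \quad & \alpha_i \ge 0, \quad i = 1, \dots, m \\
& \sum_{i=1}^{m} \alpha_i y^{(i)} = 0
\end{aligned}
$$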
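The step from the Lagrangian to $W(\alpha)$ is not spelled out on the slide. A sketch of the standard argument, using only the symbols defined above, is to set the derivatives of $L$ with respect to $w$ and $b$ to zero and substitute back:

```latex
\begin{aligned}
\nabla_w L &= w - \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} = 0
  \quad\Rightarrow\quad w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} \\
\frac{\partial L}{\partial b} &= -\sum_{i=1}^{m} \alpha_i y^{(i)} = 0 \\
L(w, b, \alpha) &= \frac{1}{2} w^T w
  - \sum_{i=1}^{m} \alpha_i y^{(i)} w^T x^{(i)}
  - b \sum_{i=1}^{m} \alpha_i y^{(i)}
  + \sum_{i=1}^{m} \alpha_i \\
&= \frac{1}{2} w^T w - w^T w - 0 + \sum_{i=1}^{m} \alpha_i \\
&= \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j
    \langle x^{(i)}, x^{(j)} \rangle = W(\alpha)
\end{aligned}
```

The second stationarity condition is what appears as the equality constraint $\sum_i \alpha_i y^{(i)} = 0$ in the dual.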


Convex Optimization

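Solutions:

$$w^* = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}$$

$$b^* = -\frac{\max_{i : y^{(i)} = -1} w^{*T} x^{(i)} + \min_{i : y^{(i)} = 1} w^{*T} x^{(i)}}{2}$$

Predict:

$$
\begin{aligned}
g(x) &= w^T x + b \\
&= \left( \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} \right)^T x + b \\
&= \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b
\end{aligned}
$$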
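The formulas for $w^*$, $b^*$, and the prediction can be tried on a small example. The sketch below assumes a hypothetical four-point data set whose dual solution was worked out by hand (two support vectors with $\alpha = 0.5$); it is not data from the talk, and the function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy data and its hand-derived dual solution (assumed, not from the talk).
X = [[2.0, 0.0], [0.0, 0.0], [3.0, 1.0], [-1.0, 1.0]]
Y = [1, -1, 1, -1]
ALPHA = [0.5, 0.5, 0.0, 0.0]


def weight_vector(alpha, X, Y):
    """w* = sum_i alpha_i y_i x_i, computed coordinate by coordinate."""
    return [sum(a * y * x[k] for a, x, y in zip(alpha, X, Y))
            for k in range(len(X[0]))]


def intercept(w, X, Y):
    """b* = -(max over negatives of w.x + min over positives of w.x) / 2."""
    max_neg = max(dot(w, x) for x, y in zip(X, Y) if y == -1)
    min_pos = min(dot(w, x) for x, y in zip(X, Y) if y == 1)
    return -(max_neg + min_pos) / 2


def decision_value(alpha, b, X, Y, x_new):
    """g(x) written with inner products only; terms with alpha_i = 0 are skipped."""
    return sum(a * y * dot(x, x_new)
               for a, x, y in zip(alpha, X, Y) if a > 0) + b


w_star = weight_vector(ALPHA, X, Y)
b_star = intercept(w_star, X, Y)
print(w_star, b_star)
print(decision_value(ALPHA, b_star, X, Y, [4.0, 0.0]))
```

The inner-product form gives the same value as $w^{*T} x + b^*$, but it never needs $w^*$ itself, which is what makes the kernel substitution possible.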


Kernel

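For most $i$, $\alpha_i = 0$.

The points $(x^{(i)}, y^{(i)})$ with $\alpha_i > 0$ are called support vectors.

Prediction only needs the inner products $\langle x^{(i)}, x \rangle$.

If we map the feature space $(x^{(i)}_1, x^{(i)}_2, \dots, x^{(i)}_k)$ to another high-dimensional space $(z^{(i)}_1, z^{(i)}_2, \dots, z^{(i)}_l)$ with $z = \phi(x)$, the inner product becomes $\langle \phi(x^{(i)}), \phi(x) \rangle$, and we can compute $\langle z^{(i)}, z \rangle$ directly from the original vectors through a kernel function $K$.

Use a slightly different notation:

$$K(x, y) = \langle \phi(x), \phi(y) \rangle$$

Intuitive explanation: a measure of similarity.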
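A small numerical check can make the identity $K(x, y) = \langle \phi(x), \phi(y) \rangle$ concrete. The sketch below uses the homogeneous quadratic kernel $\langle x, y \rangle^2$ in two dimensions, chosen here only because its feature map is short enough to write out; it is an illustrative assumption, not a kernel used in the talk.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for K(x, y) = <x, y>^2 when x has two components."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


def K(x, y):
    """The same quantity computed in the original space, without forming phi."""
    return dot(x, y) ** 2


x, y = [1.0, 2.0], [3.0, -1.0]
lhs = dot(phi(x), phi(y))  # inner product in the mapped space
rhs = K(x, y)              # kernel evaluated on the original vectors
print(lhs, rhs)
```

The two values agree up to floating-point rounding, while `K` never builds the mapped vectors.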


Kernel

Definition

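Mercer kernel: $K$ is symmetric and positive semi-definite, i.e. for any finite set of points the Gram matrix $K_{ij} = K(x^{(i)}, x^{(j)})$ is positive semi-definite.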
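Positive semi-definiteness means $z^T G z \ge 0$ for every vector $z$, where $G$ is the Gram matrix. The sketch below spot-checks this on a few hand-picked probe vectors for an RBF kernel and for a made-up counterexample (negative squared distance). The points and probes are assumptions; a finite spot check can expose a violation but cannot prove that a function is a Mercer kernel.

```python
import math


def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def neg_sq_dist(x, y):
    """Counterexample: symmetric, but not positive semi-definite."""
    return -sum((a - b) ** 2 for a, b in zip(x, y))


def gram(kernel, points):
    """Gram matrix G[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(G, z):
    """z^T G z."""
    n = len(z)
    return sum(z[i] * G[i][j] * z[j] for i in range(n) for j in range(n))


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
probes = [[1, 1, 1], [1, -1, 0], [0, 1, -1], [1, -2, 1]]

rbf_values = [quadratic_form(gram(rbf, points), z) for z in probes]
bad_values = [quadratic_form(gram(neg_sq_dist, points), z) for z in probes]
print(rbf_values)
print(bad_values)
```

Every RBF value is non-negative, while the counterexample already fails on the first probe.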


Kernel

Primitive (linear): $\langle x, y \rangle$

Polynomial: $(\langle x, y \rangle + 1)^d$

RBF: $\exp(-\gamma \|x - y\|^2)$

Sigmoid: $\tanh(\kappa \langle x, y \rangle + c)$

String

Tree
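The four vector kernels in this list translate almost line for line into code. The sketch below is one possible rendering; the function names and the default values for `d`, `gamma`, `kappa`, and `c` are assumptions chosen for illustration. String and tree kernels operate on structured inputs and are not sketched here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, y):
    """Primitive kernel: the plain inner product."""
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    """(<x, y> + 1)^d; the degree d is a free parameter."""
    return (dot(x, y) + 1) ** d


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2); equals 1 when x == y."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, kappa=1.0, c=0.0):
    """tanh(kappa * <x, y> + c)."""
    return math.tanh(kappa * dot(x, y) + c)


x, y = [1, 2], [3, 4]
print(linear_kernel(x, y), polynomial_kernel(x, y))
print(rbf_kernel(x, y), sigmoid_kernel(x, y, kappa=0.1))
```

Each function takes two vectors and returns a scalar similarity score, so any of them can stand in for $\langle x^{(i)}, x \rangle$ in the prediction formula.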



Apply to Umeng Gender Classification

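Problem Description

Classify the gender of a user based on the apps (s)he installed and the categories of those apps.

Kernel Design

$$K(x, y) = \sum_{i,j=0}^{m} \phi(x_i, y_j)$$

$$\phi(x_i, y_j) = \begin{cases} (1 + w)\, x_i y_j & \text{if } i = j \\ x_i y_j & \text{if } i \ne j \text{ but apps } i \text{ and } j \text{ are in the same category} \\ 0 & \text{if apps } i \text{ and } j \text{ are not in the same category} \end{cases}$$

Here $i$ and $j$ index apps, and $w \ge 0$ is a scalar (not the weight vector of the hyper-plane): the extra weight given when two users have installed the same app. It defaults to 1.0.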
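The kernel definition above can be transcribed directly as a double sum. The sketch below is a literal, unoptimized reading of the formula; the five-app catalogue, its category assignment, and the function names are hypothetical.

```python
# Hypothetical category id for each app index; the real data has about 20 categories.
CATEGORY = [0, 0, 1, 1, 2]


def phi(i, j, xi, yj, w=1.0):
    """Contribution of app i of the first user and app j of the second user."""
    if i == j:
        return (1 + w) * xi * yj      # same app: category match plus extra weight
    if CATEGORY[i] == CATEGORY[j]:
        return xi * yj                # different apps in the same category
    return 0.0                        # different categories


def umeng_kernel(x, y, w=1.0):
    """K(x, y) as the sum of phi over all pairs of app indices."""
    n = len(x)
    return sum(phi(i, j, x[i], y[j], w) for i in range(n) for j in range(n))


x = [1, 1, 0, 1, 0]  # first user's installed apps
y = [1, 0, 1, 1, 0]  # second user's installed apps
print(umeng_kernel(x, y))
```

The double loop is quadratic in the number of apps, so with about 30,000 apps a practical implementation would need a cheaper formulation than this direct transcription.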

Experiment Result


Apply to Umeng Gender Classification

$(x_1, x_2, \dots, x_m)$

⇓

$(w \cdot x_1, w \cdot x_2, \dots, w \cdot x_m, c_1, c_2, \dots, c_{20})$

$c_i$ counts the number of the user's installed apps belonging to category $i$.
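One way to read this transformation is as an explicit feature map whose ordinary inner product reproduces the pairwise kernel $K(x, y)$, so that a linear SVM can be trained on the extended vectors. The sketch below checks that reading on hypothetical data. Note one assumption: it scales the app indicators by $\sqrt{w}$ rather than $w$, because that is what makes the inner product match the double sum exactly; the two scalings coincide at the default $w = 1.0$.

```python
import math

# Hypothetical category id per app index, and the number of categories.
CATEGORY = [0, 0, 1, 1, 2]
NUM_CATEGORIES = 3


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def feature_map(x, w=1.0):
    """Scaled app indicators followed by per-category install counts c_1..c_n."""
    counts = [0] * NUM_CATEGORIES
    for k, xk in enumerate(x):
        counts[CATEGORY[k]] += xk
    scale = math.sqrt(w)  # assumption: sqrt(w), so the inner product equals K
    return [scale * xk for xk in x] + counts


def kernel_double_sum(x, y, w=1.0):
    """Direct evaluation of the pairwise kernel definition, for comparison."""
    total = 0.0
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            if i == j:
                total += (1 + w) * xi * yj
            elif CATEGORY[i] == CATEGORY[j]:
                total += xi * yj
    return total


x = [1, 1, 0, 1, 0]
y = [1, 0, 1, 1, 0]
for w in (1.0, 4.0):
    print(w, dot(feature_map(x, w), feature_map(y, w)), kernel_double_sum(x, y, w))
```

The extended vector has one entry per app plus one per category, so the inner product costs time linear in the number of apps instead of quadratic.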


References

Book: Christopher Bishop – PRML Chapter 7: Section 7.1

Slides: Andrew Moore – Support Vector Machines

Video: Bernhard Scholkopf – Kernel Methods

Video: Liva Ralaivola – Introduction to Kernel Methods

Video: Colin Campbell – Introduction to Support Vector Machines

Video: Alex Smola – Kernel Methods and Support Vector Machines

Video: Partha Niyogi – Introduction to Kernel Methods

Many more videos on kernel-related topics here

http://www.seas.harvard.edu/courses/cs281/

