rudimentary quick intro to SVM
Introduction to Support Vector Machine
Lucas Xu
September 4, 2012
Lucas Xu Introduction to Support Vector Machine September 4, 2012 1 / 20
1 Classifier
2 Hyper-Plane
3 Convex Optimization
4 Kernel
5 Application
Classifier
Attributes and Class Labels
Training Data
$$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}, \quad x^{(i)} \in \mathbb{R}^d, \quad y^{(i)} \in \{-1, 1\}$$
Classifier
Umeng Gender Classification Data
user    app1   app2   ...   appd   gender
user1   1      0      ...   0      male
user2   0      1      ...   1      female
...     ...    ...    ...   ...    ...
usern   1      1      ...   1      female
Each app belongs to exactly one category; there are ≈ 20 categories.
Categories are mutually exclusive.
Classifier
Umeng Gender Classification Data
$$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}, \quad x^{(i)} \in \mathbb{R}^d, \quad y^{(i)} \in \{-1, 1\}$$
$x^{(i)}_k \in \{0, 1\}$: 0 means not installed, 1 means installed on the device
$1 \le k \le d$, $d \approx 30{,}000$ (about 30,000 apps)
$y^{(i)} \in \{\text{male}, \text{female}\}$
Hyper-Plane
Figure : Hyper Plane
The hyper-plane: $w^T x + b = 0$
Classification function: $h_{w,b}(x) = g(w^T x + b)$
$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
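The classification function above can be sketched in a few lines of plain Python. The hyper-plane values and the test points are hypothetical, chosen only to show one point on each side of the boundary and one exactly on it.

```python
def g(z):
    """Threshold function from the slide: +1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def h(w, b, x):
    """Classify x with the hyper-plane (w, b), i.e. g(w^T x + b)."""
    z = sum(wk * xk for wk, xk in zip(w, x)) + b
    return g(z)


# Hypothetical 2-D hyper-plane x1 + x2 - 1 = 0 (not taken from the slides).
w, b = [1.0, 1.0], -1.0
points = [[2.0, 2.0], [0.0, 0.0], [0.5, 0.5]]
labels = [h(w, b, x) for x in points]
print(labels)
```

The third point lies exactly on the hyper-plane, so it is assigned to the +1 class by the `z >= 0` convention.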
Hyper-Plane
Functional margin: $\hat{\gamma}^{(i)} = y^{(i)}(w^T x^{(i)} + b)$
Scaling: impose the normalization condition $\|w\| = 1$
Geometric margin:
$$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}\right)$$
$\gamma^{(i)}$ should be a large positive number to increase the prediction confidence.
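A small numeric check of why the normalization matters: rescaling (w, b) changes the functional margin but leaves the geometric margin untouched. The numbers below are hypothetical and picked so that ‖w‖ = 5.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def functional_margin(w, b, x, y):
    """y * (w^T x + b); grows when (w, b) is scaled up."""
    return y * (dot(w, x) + b)


def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||; invariant to scaling (w, b)."""
    return functional_margin(w, b, x, y) / math.sqrt(dot(w, w))


# Hypothetical hyper-plane and sample (not from the slides).
w, b = [3.0, 4.0], -5.0
x, y = [3.0, 4.0], 1

# Same hyper-plane, parameters multiplied by 10.
w10, b10 = [10 * wk for wk in w], 10 * b

fm, fm10 = functional_margin(w, b, x, y), functional_margin(w10, b10, x, y)
gm, gm10 = geometric_margin(w, b, x, y), geometric_margin(w10, b10, x, y)
print(fm, fm10, gm, gm10)
```

The functional margin is ten times larger after rescaling, although the decision boundary has not moved, which is why it cannot be maximized without a normalization condition.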
Hyper-Plane
Definition
The geometric margin of $(w, b)$ with respect to the training dataset $S$ is the smallest geometric margin over all training samples:
$$\gamma = \min_{i = 1, \dots, m} \gamma^{(i)}$$
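As a sketch of this definition, the snippet below computes the dataset margin for three candidate hyper-planes on a hypothetical four-point dataset. A negative value means at least one sample is on the wrong side.

```python
import math


def dataset_margin(w, b, X, Y):
    """Smallest geometric margin of (w, b) over all samples in (X, Y)."""
    norm = math.sqrt(sum(wk * wk for wk in w))
    return min(
        y * (sum(wk * xk for wk, xk in zip(w, x)) + b) / norm
        for x, y in zip(X, Y)
    )


# Hypothetical, linearly separable toy data (not from the slides).
X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
Y = [1, 1, -1, -1]

margin_a = dataset_margin([1.0, 0.0], 0.0, X, Y)   # boundary x1 = 0
margin_b = dataset_margin([1.0, 0.0], 1.0, X, Y)   # boundary x1 = -1
margin_c = dataset_margin([-1.0, 0.0], 0.0, X, Y)  # flipped orientation
print(margin_a, margin_b, margin_c)
```

Candidate (a) separates the data with the larger margin, so it is the better of the three by the criterion used on the following slides.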
Hyper-Plane
The optimal margin classifier (intuition): find a decision boundary that maximizes the margin.
$$\max_{\gamma, w, b} \ \gamma$$
$$\text{s.t.} \quad y^{(i)}(w^T x^{(i)} + b) \ge \gamma, \quad i = 1, \dots, m$$
$$\|w\| = 1$$
Figure : Hyper Plane
How to solve?
Hyper-Plane
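Normalization constraint: fix the functional margin, $\hat{\gamma} = 1$. Since $\gamma = \hat{\gamma} / \|w\|$, the problem becomes
⇓
$$\max_{w, b} \ \frac{1}{\|w\|}$$
$$\text{s.t.} \quad y^{(i)}(w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m$$
⇓
Maximizing $1 / \|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$:
$$\min_{w, b} \ \frac{1}{2}\|w\|^2$$
$$\text{s.t.} \quad y^{(i)}(w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m$$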
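For intuition, the normalized problem (functional margin fixed to 1) can be solved by hand in the special case of exactly two training points, one per class: by symmetry w points from the negative to the positive point and both constraints hold with equality. The sketch below checks this on hypothetical data; it is a worked special case, not a general solver.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical two-point training set: x_pos has label +1, x_neg has label -1.
x_pos, x_neg = [2.0, 1.0], [0.0, -1.0]

# w is parallel to (x_pos - x_neg), scaled so that w^T (x_pos - x_neg) = 2.
d = [p - n for p, n in zip(x_pos, x_neg)]
scale = 2.0 / dot(d, d)
w = [scale * dk for dk in d]

# b follows from the active constraint w^T x_pos + b = 1.
b = 1.0 - dot(w, x_pos)

constraint_pos = +1 * (dot(w, x_pos) + b)
constraint_neg = -1 * (dot(w, x_neg) + b)

# With functional margin 1, the geometric margin is 1 / ||w||.
margin = 1.0 / math.sqrt(dot(w, w))
half_distance = math.sqrt(dot(d, d)) / 2.0
print(w, b, margin)
```

The resulting margin equals half the distance between the two points, as expected for the perpendicular bisector.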
Hyper-Plane
Convex objective function
Convex feasible set
This is a quadratic programming (QP) problem. There are many software packages that solve it.
The basic ideas of the Support Vector Machine: DONE!
Is there a more efficient solution?
Convex Optimization
Primal problem:
$$\min_{w, b} \ \frac{1}{2}\|w\|^2$$
$$\text{s.t.} \quad y^{(i)}(w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m$$
Convex Optimization
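Lagrangian for the original problem:
$$\min_{w, b} \ \max_{\alpha : \alpha_i \ge 0} \ L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)}(w^T x^{(i)} + b) - 1 \right]$$
⇓
Under the K.K.T. conditions, this transforms to its dual problem:
$$\max_{\alpha} \ W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle$$
$$\text{s.t.} \quad \alpha_i \ge 0, \quad i = 1, \dots, m$$
$$\sum_{i=1}^{m} \alpha_i y^{(i)} = 0$$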
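To make the dual concrete, the sketch below evaluates W(α) on a hypothetical two-point dataset. With one point per class, the equality constraint forces α₁ = α₂ = a, so the dual reduces to a one-dimensional problem that a coarse grid search can solve. The grid search is only for illustration; real solvers use QP or SMO-style methods.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, X, Y):
    """W(alpha) = sum_i alpha_i - 1/2 sum_ij y_i y_j alpha_i alpha_j <x_i, x_j>."""
    m = len(X)
    quad = sum(
        Y[i] * Y[j] * alpha[i] * alpha[j] * dot(X[i], X[j])
        for i in range(m)
        for j in range(m)
    )
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha, Y, tol=1e-9):
    """Check alpha_i >= 0 and sum_i alpha_i y_i = 0."""
    balanced = abs(sum(a * y for a, y in zip(alpha, Y))) <= tol
    return all(a >= 0 for a in alpha) and balanced


# Hypothetical toy data: one positive and one negative sample.
X = [[2.0, 1.0], [0.0, -1.0]]
Y = [1, -1]

# Here W([a, a]) = 2a - 4a^2, so the maximum is expected near a = 0.25.
candidates = [k / 100 for k in range(101)]
best_a = max(candidates, key=lambda a: dual_objective([a, a], X, Y))
best_value = dual_objective([best_a, best_a], X, Y)
print(best_a, best_value)
```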
Convex Optimization
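Solutions:
$$w^* = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}$$
$$b^* = -\frac{\max_{i : y^{(i)} = -1} w^{*T} x^{(i)} + \min_{i : y^{(i)} = 1} w^{*T} x^{(i)}}{2}$$
Predict:
$$g(x) = w^T x + b = \left( \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} \right)^T x + b = \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b$$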
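The formulas above can be sketched as follows. The data and the dual solution `alpha` are hypothetical: for this two-point toy set the dual optimum is α₁ = α₂ = 0.25, which is assumed here rather than computed. The point of the example is that the prediction written with inner products agrees with wᵀx + b.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recover_w(alpha, X, Y):
    """w* = sum_i alpha_i y_i x_i."""
    dim = len(X[0])
    return [sum(a * y * x[k] for a, y, x in zip(alpha, Y, X)) for k in range(dim)]


def recover_b(w, X, Y):
    """b* = -(max over negatives of w^T x + min over positives of w^T x) / 2."""
    max_neg = max(dot(w, x) for x, y in zip(X, Y) if y == -1)
    min_pos = min(dot(w, x) for x, y in zip(X, Y) if y == 1)
    return -(max_neg + min_pos) / 2.0


def decision(alpha, b, X, Y, x_new):
    """g(x) = sum_i alpha_i y_i <x_i, x> + b, using inner products only."""
    return sum(a * y * dot(x, x_new) for a, y, x in zip(alpha, Y, X)) + b


# Hypothetical toy data and its (assumed) dual solution.
X = [[2.0, 1.0], [0.0, -1.0]]
Y = [1, -1]
alpha = [0.25, 0.25]

w = recover_w(alpha, X, Y)
b = recover_b(w, X, Y)
x_new = [3.0, 3.0]
score_dual = decision(alpha, b, X, Y, x_new)
score_primal = dot(w, x_new) + b
print(w, b, score_dual, score_primal)
```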
Kernel
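For most $\alpha_i$, $\alpha_i = 0$.
The training points $(x^{(i)}, y^{(i)})$ with $\alpha_i > 0$ are called support vectors.
Prediction only needs the inner products $\langle x^{(i)}, x \rangle$.
If we map the feature space $(x^{(i)}_1, x^{(i)}_2, \dots, x^{(i)}_k)$ to another, higher-dimensional space $(z^{(i)}_1, z^{(i)}_2, \dots, z^{(i)}_l)$ with $z = \phi(x)$, prediction needs $\langle \phi(x^{(i)}), \phi(x) \rangle$ instead.
With a suitable kernel we can easily compute $\langle z^{(i)}, z \rangle$ from the original inputs $x^{(i)}$ and $x$, without forming $\phi$ explicitly.
Using a slightly different notation:
$$K(x, y) = \langle \phi(x), \phi(y) \rangle$$
Intuitive explanation: a kernel is a measure of similarity.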
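A minimal illustration of this idea, assuming 2-D inputs and the quadratic kernel K(x, y) = ⟨x, y⟩² (a choice made here for the example, not taken from the slides). Its explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²), and the kernel gives the same number without ever building φ.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for the quadratic kernel on 2-D inputs."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def quadratic_kernel(x, y):
    """K(x, y) = <x, y>^2, computed in the original 2-D space."""
    return dot(x, y) ** 2


# Hypothetical inputs.
x, y = [1.0, 2.0], [3.0, -1.0]
via_kernel = quadratic_kernel(x, y)
via_feature_map = dot(phi(x), phi(y))
print(via_kernel, via_feature_map)
```

Both values agree up to floating-point rounding. In higher dimensions the explicit map grows quadratically in size while the kernel still costs one inner product.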
Lucas Xu Introduction to Support Vector Machine September 4, 2012 15 / 20
Kernel
Definition
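Mercer kernel: $K$ is a valid kernel if it is symmetric and positive semi-definite, i.e. for any finite set of points the Gram matrix $K_{ij} = K(x^{(i)}, x^{(j)})$ is positive semi-definite.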
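Positive semi-definiteness means that cᵀGc ≥ 0 for every vector c, where G is the Gram matrix of the kernel on any finite set of points. The sketch below spot-checks this for an RBF kernel on random points. It is only a numerical sanity check on hypothetical data, not a proof, and the value of `gamma` is an arbitrary choice.

```python
import math
import random


def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram(kernel, points):
    """Gram matrix G[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(G, c):
    """c^T G c."""
    n = len(c)
    return sum(c[i] * c[j] * G[i][j] for i in range(n) for j in range(n))


rng = random.Random(0)  # fixed seed so the check is repeatable
points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(5)]
G = gram(rbf, points)

is_symmetric = all(
    math.isclose(G[i][j], G[j][i]) for i in range(5) for j in range(5)
)
smallest_form = min(
    quadratic_form(G, [rng.gauss(0, 1) for _ in range(5)]) for _ in range(200)
)
print(is_symmetric, smallest_form >= -1e-9)
```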
Kernel
Primitive (linear): $\langle x, y \rangle$
Polynomial: $(\langle x, y \rangle + 1)^d$
RBF: $\exp(-\gamma \|x - y\|^2)$
Sigmoid: $\tanh(\kappa \langle x, y \rangle + c)$
String
Tree
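The four vector kernels listed above can be written directly in plain Python. The default parameter values are placeholders chosen for the example, not recommendations. Note that the sigmoid kernel is not positive semi-definite for every choice of κ and c, so it does not always satisfy the Mercer condition.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    return (dot(x, y) + 1) ** d


def rbf_kernel(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sigmoid_kernel(x, y, kappa=1.0, c=0.0):
    return math.tanh(kappa * dot(x, y) + c)


# Hypothetical inputs: <x, y> = 1 and ||x - y||^2 = 1.
x, y = [1.0, 0.0], [1.0, 1.0]
values = {
    "linear": linear_kernel(x, y),
    "polynomial": polynomial_kernel(x, y),
    "rbf": rbf_kernel(x, y),
    "sigmoid": sigmoid_kernel(x, y),
}
print(values)
```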
Apply to Umeng Gender Classification
Problem Description
Classify the gender of a user based on the apps (s)he installed and the categories of those apps.
Kernel Design
$$K(x, y) = \sum_{i,j=0}^{m} \phi(x_i, y_j)$$
$$\phi(x_i, y_j) = \begin{cases} (1 + w)\, x_i y_j & \text{if } i = j \\ x_i y_j & \text{if } i \ne j \text{ but the same category} \\ 0 & \text{if not the same category} \end{cases}$$
$w \ge 0$ is the extra weight applied when two users have installed the same app; it defaults to 1.0.
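A direct, brute-force reading of this kernel is sketched below. The app-to-category assignment and the two users are hypothetical, and the function name `umeng_kernel` is made up for the example. The double loop is quadratic in the number of apps, so with about 30,000 apps it only serves to pin down the definition.

```python
def umeng_kernel(x, y, category, w=1.0):
    """Sum phi(x_i, y_j) over all app pairs (i, j).

    x, y      -- 0/1 install vectors of two users
    category  -- category[i] is the category of app i
    w         -- extra weight when both users installed the same app
    """
    total = 0.0
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            if category[i] != category[j]:
                continue  # different categories contribute 0
            weight = 1.0 + w if i == j else 1.0
            total += weight * xi * yj
    return total


# Hypothetical data: four apps in three categories, two users.
category = ["game", "game", "social", "tool"]
user_a = [1, 1, 0, 1]
user_b = [1, 0, 1, 1]

k_default = umeng_kernel(user_a, user_b, category)         # w = 1.0
k_no_bonus = umeng_kernel(user_a, user_b, category, w=0.0)
k_swapped = umeng_kernel(user_b, user_a, category)
print(k_default, k_no_bonus, k_swapped)
```

With w = 0 only the shared categories count; the default w = 1.0 doubles the contribution of each app that both users installed.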
Experiment Result
Apply to Umeng Gender Classification
$$(x_1, x_2, \dots, x_m)$$
⇓
$$(w \cdot x_1, w \cdot x_2, \dots, w \cdot x_m, c_1, c_2, \dots, c_{20})$$
$c_i$ counts the number of apps belonging to category $i$.
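The transformation above can be read as an explicit feature map whose plain inner product reproduces the designed kernel. The sketch below assumes that c_i counts the user's installed apps in category i, and it scales the app entries by √w rather than w, so that the inner product yields exactly the extra weight w for a shared app; the two scalings coincide at the default w = 1.0. The data and names are hypothetical.

```python
import math


def feature_map(x, category, categories, w=1.0):
    """Map a 0/1 install vector to (sqrt(w) * x_1..x_m, c_1..c_K)."""
    scaled = [math.sqrt(w) * xi for xi in x]
    counts = [
        sum(xi for xi, cat in zip(x, category) if cat == name)
        for name in categories
    ]
    return scaled + counts


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical data: four apps in three categories, two users.
category = ["game", "game", "social", "tool"]
categories = ["game", "social", "tool"]
user_a = [1, 1, 0, 1]
user_b = [1, 0, 1, 1]

k_w1 = dot(feature_map(user_a, category, categories),
           feature_map(user_b, category, categories))
k_w4 = dot(feature_map(user_a, category, categories, w=4.0),
           feature_map(user_b, category, categories, w=4.0))
print(k_w1, k_w4)
```

For these two users the designed kernel gives 5 at w = 1 and 11 at w = 4 when summed pair by pair, and the feature-map inner product matches both while costing time linear in the number of apps.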
References
Book: Christopher Bishop, PRML, Chapter 7, Section 7.1
Slides: Andrew Moore, Support Vector Machines
Video: Bernhard Schölkopf, Kernel Methods
Video: Liva Ralaivola, Introduction to Kernel Methods
Video: Colin Campbell, Introduction to Support Vector Machines
Video: Alex Smola, Kernel Methods and Support Vector Machines
Video: Partha Niyogi, Introduction to Kernel Methods
Many more videos on kernel-related topics here
http://www.seas.harvard.edu/courses/cs281/