
Page 1: Support Vector Machines

Support Vector Machines

Konstantin Tretyakov ([email protected])

MTAT.03.227 Machine Learning

Page 2: Support Vector Machines

So far…

Supervised machine learning

Linear models

Least squares regression

Fisher’s discriminant, Perceptron, Logistic model

Non-linear models

Neural networks, Decision trees, Association rules

Unsupervised machine learning

Clustering/EM, PCA

Generic scaffolding

Probabilistic modeling, ML/MAP estimation

Performance evaluation, Statistical learning theory

Linear algebra, Optimization methods

May 8, 2012

Page 3: Support Vector Machines

Coming up next

Supervised machine learning

Linear models

Least squares regression, SVM

Fisher’s discriminant, Perceptron, Logistic regression, SVM

Non-linear models

Neural networks, Decision trees, Association rules

SVM, Kernel-XXX

Unsupervised machine learning

Clustering/EM, PCA, Kernel-XXX

Generic scaffolding

Probabilistic modeling, ML/MAP estimation

Performance evaluation, Statistical learning theory

Linear algebra, Optimization methods

Kernels

Page 4: Support Vector Machines

First things first

SVM (y ∈ {−1, 1}):

library('e1071')

m = svm(X, y, kernel='linear')

predict(m, newX)

Page 5: Support Vector Machines

Quiz


This line is called …

This vector is …

Those lines are …

f(x) = ?

x₁ = ?  y₁ = ?

Functional margin of 𝒙𝟏?

Geometric margin of 𝒙𝟏?

Distance to origin?

Page 6: Support Vector Machines

Quiz


Separating hyperplane

Normal 𝒘

Isolines (level lines)

f(x) = wᵀx + b

x₁ = (2, 6);  y₁ = −1

y₁ · f(x₁) ≈ 2

y₁ · f(x₁)/‖w‖ ≈ 3√2

d = |b|/‖w‖

Page 7: Support Vector Machines

Quiz

Suppose we scale 𝒘 and 𝑏 by some constant.

Will it:

Affect the separating hyperplane? How?

Affect the functional margins? How?

Affect the geometric margins? How?

Page 8: Support Vector Machines

Quiz

Example: 𝒘 → 2𝒘, 𝑏 = 0

Page 9: Support Vector Machines

Quiz

Suppose we scale 𝒘 and b by some constant.

Will it:

Affect the separating hyperplane? How?

No: wᵀx + b = 0 ⇔ 2wᵀx + 2b = 0

Affect the functional margins? How?

Yes: (2wᵀx + 2b)·y = 2 · (wᵀx + b)·y

Affect the geometric margins? How?

No: (2wᵀx + 2b)/‖2w‖ = (wᵀx + b)/‖w‖
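These identities can be sanity-checked numerically; the sketch below uses made-up values for w, b and the point (they are assumptions, not from the slides):

```python
import math

# Check: scaling (w, b) by a constant c scales functional margins by c
# but leaves the geometric margins unchanged.

def f(w, b, x):
    """Linear decision function f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

w, b = [1.0, 1.0], -2.0   # assumed example classifier
x, y = [3.0, 4.0], 1      # assumed example point with label
c = 2.0                   # scaling constant

func_orig = y * f(w, b, x)
func_scaled = y * f([c * wi for wi in w], c * b, x)

geom_orig = func_orig / math.hypot(*w)
geom_scaled = func_scaled / math.hypot(*[c * wi for wi in w])

print(func_scaled == c * func_orig)           # True: functional margin scales by c
print(math.isclose(geom_orig, geom_scaled))   # True: geometric margin invariant
```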

Page 10: Support Vector Machines

Which classifier is best?

Page 11: Support Vector Machines

Maximal margin classifier

Page 12: Support Vector Machines

Why maximal margin?

Well-defined, single stable solution

Noise-tolerant

Small parameterization

(Fairly) efficient algorithms exist for finding it

Page 13: Support Vector Machines

Maximal margin: Separable case

f(x) = 1

f(x) = −1

Page 14: Support Vector Machines

Maximal margin: Separable case

f(x) = 1

f(x) = −1

∀i: f(xᵢ)·yᵢ ≥ 1

Page 15: Support Vector Machines

Maximal margin: Separable case

f(x) = 1

f(x) = −1

The (geometric) distance to the isoline f(x) = 1 is:

Page 16: Support Vector Machines

Maximal margin: Separable case

f(x) = 1

f(x) = −1

The (geometric) distance to the isoline f(x) = 1 is:

d = f(x)/‖w‖ = 1/‖w‖
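The formula d = 1/‖w‖ can be verified numerically; the values of w and b below are illustrative assumptions:

```python
import math

w, b = [3.0, 4.0], 1.0    # assumed example, chosen so that ||w|| = 5
norm = math.hypot(*w)

# A point on the hyperplane f(x) = 0, and the nearest point on the
# isoline f(x) = 1, reached by stepping 1/||w|| along the unit normal:
x0 = [-b * wi / norm**2 for wi in w]
x1 = [x0i + wi / norm**2 for x0i, wi in zip(x0, w)]

f = lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b
print(f(x0), f(x1))                  # ~0 on the plane, ~1 on the isoline
print(math.dist(x0, x1), 1 / norm)   # both ~0.2: the distance is 1/||w||
```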

Page 17: Support Vector Machines

Maximal margin: Separable case

Among all linear classifiers (𝒘, 𝑏)

… which keep all points at functional margin of

𝟏 or more,

… we shall look for the one which has the largest

distance 𝒅 to the corresponding isolines, i.e. the

largest geometric margin.

As d = 1/‖w‖, this is equivalent to finding the classifier with minimal ‖w‖…

…which is equivalent to finding the classifier with minimal ‖w‖².

Page 18: Support Vector Machines

Page 19: Support Vector Machines

Page 20: Support Vector Machines

Page 21: Support Vector Machines

Page 22: Support Vector Machines

Compare

“Generic” linear classification (separable case):

Find (𝒘, b), such that all points are classified correctly

i.e. f(xᵢ)·yᵢ > 0

Maximal margin classification (separable case):

Find (w, b), such that all points are classified correctly

with a fixed functional margin,

i.e. f(xᵢ)·yᵢ ≥ 1,

and ‖w‖² is minimal.

Page 23: Support Vector Machines

Remember

May 8, 2012

SVM optimization problem

(separable case):

min_{w,b} ½‖w‖²

so that

(wᵀxᵢ + b)·yᵢ ≥ 1 for all i

Page 24: Support Vector Machines

General case (“soft margin”)

The same, but we also penalize all margin

violations.

May 8, 2012

SVM optimization problem:

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ

where

ξᵢ = [1 − f(xᵢ)·yᵢ]₊

Page 25: Support Vector Machines

General case (“soft margin”)

The same, but we also penalize all margin

violations.

May 8, 2012

SVM optimization problem:

min_{w,b} ½‖w‖² + C·Σᵢ [1 − f(xᵢ)·yᵢ]₊

ξᵢ = [1 − f(xᵢ)·yᵢ]₊

Page 26: Support Vector Machines

General case (“soft margin”)

The same, but we also penalize all margin

violations.

May 8, 2012

SVM optimization problem:

min_{w,b} ½‖w‖² + C·Σᵢ [1 − mᵢ]₊

where mᵢ = f(xᵢ)·yᵢ, so ξᵢ = [1 − mᵢ]₊

Page 27: Support Vector Machines

General case (“soft margin”)

The same, but we also penalize all margin

violations.

May 8, 2012

SVM optimization problem:

min_{w,b} ½‖w‖² + C·Σᵢ hinge(mᵢ)

where

hinge(mᵢ) = [1 − mᵢ]₊

ξᵢ = [1 − f(xᵢ)·yᵢ]₊

Page 28: Support Vector Machines

Hinge loss: hinge(mᵢ) = [1 − mᵢ]₊
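In code the hinge loss is a one-liner (a minimal sketch):

```python
def hinge(m):
    """Hinge loss (1 - m)_+ = max(0, 1 - m), where m = f(x)*y is the margin."""
    return max(0.0, 1.0 - m)

# Zero loss for margins >= 1, linearly growing penalty for violations:
print([hinge(m) for m in (2.0, 1.0, 0.5, 0.0, -1.0)])  # [0.0, 0.0, 0.5, 1.0, 2.0]
```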

Page 29: Support Vector Machines

Classification loss functions

“Generic”

classification:

min_{w,b} Σᵢ [mᵢ < 0]

Page 30: Support Vector Machines

Classification loss functions

Perceptron:

Page 31: Support Vector Machines

Classification loss functions

Perceptron:

min_{w,b} Σᵢ (−mᵢ)₊

Page 32: Support Vector Machines

Classification loss functions

Least squares

classification*:

min_{w,b} Σᵢ (mᵢ − 1)²

Page 33: Support Vector Machines

Classification loss functions

Boosting:

min_{w,b} Σᵢ exp(−mᵢ)

Page 34: Support Vector Machines

Classification loss functions

Logistic regression:

min_{w,b} Σᵢ log(1 + e^(−mᵢ))

Page 35: Support Vector Machines

Classification loss functions

Regularized logistic

regression:

min_{w,b} Σᵢ log(1 + e^(−mᵢ)) + λ · ½‖w‖²

Page 36: Support Vector Machines

Classification loss functions

SVM:

min_{w,b} Σᵢ [1 − mᵢ]₊ + (1/2C) · ‖w‖²

Page 37: Support Vector Machines

Classification loss functions

L2-SVM:

min_{w,b} Σᵢ [1 − mᵢ]₊² + (1/2C) · ‖w‖²

Page 38: Support Vector Machines

Classification loss functions

L1-regularized L2-SVM:

min_{w,b} Σᵢ [1 − mᵢ]₊² + (1/2C) · ‖w‖₁

… etc.

Page 39: Support Vector Machines

In general

min_{w,b} Σᵢ φ(mᵢ) + λ · Ω(w)

(first term: model fit; second term: model complexity)
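All the losses from the preceding slides are instances of this template; here is a sketch (the margins and weights are made-up values) evaluating a few choices of φ side by side:

```python
import math

# phi(m) for several classifiers, as functions of the margin m = f(x)*y:
losses = {
    "zero-one":    lambda m: 1.0 if m < 0 else 0.0,
    "perceptron":  lambda m: max(0.0, -m),
    "hinge":       lambda m: max(0.0, 1.0 - m),
    "squared":     lambda m: (m - 1.0) ** 2,
    "exponential": lambda m: math.exp(-m),
    "logistic":    lambda m: math.log(1.0 + math.exp(-m)),
}

def objective(phi, margins, w, lam=0.1):
    """Model fit + model complexity: sum_i phi(m_i) + lam * ||w||^2 / 2."""
    return sum(phi(m) for m in margins) + lam * sum(wi * wi for wi in w) / 2

margins = [2.0, 0.5, -1.0]   # assumed example margins
w = [1.0, -2.0]              # assumed example weight vector
for name, phi in losses.items():
    print(f"{name:12s} {objective(phi, margins, w):.3f}")
```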

Page 40: Support Vector Machines

Compare to MAP estimation

max_Model Σᵢ log P(xᵢ | Model) + log P(Model)

(first term: likelihood; second term: model prior)

Page 41: Support Vector Machines

Compare to MAP estimation

max_Model log P(Data | Model) + log P(Model)

(first term: likelihood; second term: model prior)
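The correspondence becomes exact for a Gaussian prior on w; this is a standard derivation, not from the slides: with P(w) ∝ exp(−λ‖w‖²/2),

```latex
-\log P(\mathbf{w}) = \frac{\lambda}{2}\|\mathbf{w}\|^2 + \mathrm{const},
\qquad
\max_{\mathbf{w}}\; \log P(\text{Data}\mid\mathbf{w}) + \log P(\mathbf{w})
\;\Longleftrightarrow\;
\min_{\mathbf{w}}\; \sum_i \phi(m_i) + \frac{\lambda}{2}\|\mathbf{w}\|^2,
```

where φ(mᵢ) = −log P(xᵢ | w). Regularized loss minimization is thus MAP estimation in disguise.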

Page 42: Support Vector Machines

Solving the SVM

min_{w,b} ½‖w‖² + C·Σᵢ [1 − f(xᵢ)·yᵢ]₊

Page 43: Support Vector Machines

Solving the SVM

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ

such that

f(xᵢ)·yᵢ ≥ 1 − ξᵢ

ξᵢ ≥ 0

Page 44: Support Vector Machines

Solving the SVM

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ

such that

f(xᵢ)·yᵢ − (1 − ξᵢ) ≥ 0

ξᵢ ≥ 0

Page 45: Support Vector Machines

Solving the SVM

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ

such that

f(xᵢ)·yᵢ − (1 − ξᵢ) ≥ 0

ξᵢ ≥ 0

Quadratic function with linear constraints!

Page 46: Support Vector Machines

Solving the SVM

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ

such that

f(xᵢ)·yᵢ − (1 − ξᵢ) ≥ 0

ξᵢ ≥ 0

Quadratic function with linear constraints!

Quadratic programming

Minimize

f(x) = ½ xᵀQx + cᵀx

subject to:

𝑨𝒙 ≥ 𝒃

𝑪𝒙 = 𝒅

Page 47: Support Vector Machines

Solving the SVM

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ

such that

f(xᵢ)·yᵢ − (1 − ξᵢ) ≥ 0

ξᵢ ≥ 0

Quadratic function with linear constraints!

Quadratic programming

Minimize

f(x) = ½ xᵀQx + cᵀx

subject to:

𝑨𝒙 ≥ 𝒃

𝑪𝒙 = 𝒅

> library(quadprog)

> solve.QP(Q, -c, t(A), b, meq)  # solve.QP expects t(Amat) %*% x >= bvec; meq = number of equality constraints
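To make the mapping onto a QP solver concrete, here is a sketch in Python (the helper name, the variable ordering z = (w, b, ξ) and the toy data are my own assumptions) that assembles Q, c, A, b for the soft-margin primal. Note that Q is only positive semi-definite (the b and ξ coordinates carry no curvature), so strictly-convex solvers such as quadprog need a tiny ridge added to its diagonal in practice.

```python
def build_svm_qp(X, y, C):
    """Soft-margin SVM primal as a QP over z = (w_1..w_d, b, xi_1..xi_n):
        minimize 1/2 z^T Q z + c^T z   subject to   A z >= b_vec."""
    n, d = len(X), len(X[0])
    size = d + 1 + n

    # 1/2 ||w||^2 + C * sum(xi): identity block for w, linear C terms for xi.
    Q = [[1.0 if (i == j and i < d) else 0.0 for j in range(size)] for i in range(size)]
    c = [0.0] * (d + 1) + [float(C)] * n

    A, b_vec = [], []
    for i in range(n):
        # Margin constraint: y_i * (w^T x_i + b) + xi_i >= 1
        row = [y[i] * X[i][j] for j in range(d)] + [float(y[i])] + [0.0] * n
        row[d + 1 + i] = 1.0
        A.append(row)
        b_vec.append(1.0)
    for i in range(n):
        # Slack positivity: xi_i >= 0
        row = [0.0] * size
        row[d + 1 + i] = 1.0
        A.append(row)
        b_vec.append(0.0)
    return Q, c, A, b_vec

Q, c, A, b_vec = build_svm_qp([[0.0, 1.0], [2.0, 0.0]], [1, -1], C=1.0)
print(len(Q), len(c), len(A), len(b_vec))   # 5 5 4 4
```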

Page 48: Support Vector Machines

Solving the SVM: Dual

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ  such that  f(xᵢ)·yᵢ − (1 − ξᵢ) ≥ 0,  ξᵢ ≥ 0

Is equivalent to:

min_{w,b} max_{α≥0, β≥0} ½‖w‖² + C·Σᵢ ξᵢ − Σᵢ αᵢ·(f(xᵢ)·yᵢ − (1 − ξᵢ)) − Σᵢ βᵢ·ξᵢ

Page 49: Support Vector Machines

Solving the SVM: Dual

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ  such that  f(xᵢ)·yᵢ − (1 − ξᵢ) ≥ 0,  ξᵢ ≥ 0

Is equivalent to:

min_{w,b} max_{α≥0, β≥0} ½‖w‖² + C·Σᵢ ξᵢ − Σᵢ αᵢ·(f(xᵢ)·yᵢ − (1 − ξᵢ)) − Σᵢ βᵢ·ξᵢ

Page 50: Support Vector Machines

Solving the SVM: Dual

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ  such that  f(xᵢ)·yᵢ − (1 − ξᵢ) ≥ 0,  ξᵢ ≥ 0

Is equivalent to:

min_{w,b} max_{α≥0, β≥0} ½‖w‖² + Σᵢ ξᵢ·(C − αᵢ − βᵢ) − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

Page 51: Support Vector Machines

Solving the SVM: Dual

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ  such that  f(xᵢ)·yᵢ − (1 − ξᵢ) ≥ 0,  ξᵢ ≥ 0

Is equivalent to:

min_{w,b} max_{α≥0, β≥0} ½‖w‖² + Σᵢ ξᵢ·(C − αᵢ − βᵢ) − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

C − αᵢ − βᵢ = 0

Page 52: Support Vector Machines

Solving the SVM: Dual

min_{w,b} ½‖w‖² + C·Σᵢ ξᵢ  such that  f(xᵢ)·yᵢ − (1 − ξᵢ) ≥ 0,  ξᵢ ≥ 0

Is equivalent to:

min_{w,b} max_{α≥0, β≥0} ½‖w‖² + Σᵢ ξᵢ·(C − αᵢ − βᵢ) − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

0 ≤ αᵢ ≤ C

Page 53: Support Vector Machines

Solving the SVM: Dual

min_{w,b} max_α ½‖w‖² − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

0 ≤ αᵢ ≤ C

Page 54: Support Vector Machines

Solving the SVM: Dual

min_{w,b} max_α ½‖w‖² − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

0 ≤ αᵢ ≤ C

Sparsity: αᵢ can be nonzero only for those points which have f(xᵢ)·yᵢ ≤ 1 (the support vectors).

Page 55: Support Vector Machines

Solving the SVM: Dual

min_{w,b} max_α ½‖w‖² − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

0 ≤ αᵢ ≤ C

Now swap the min and the max (possible here because the problem is convex, so strong duality holds).

Page 56: Support Vector Machines

Solving the SVM: Dual

max_α min_{w,b} ½‖w‖² − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

0 ≤ αᵢ ≤ C

Next solve the inner (unconstrained) min as usual.

Page 57: Support Vector Machines

Solving the SVM: Dual

max_α min_{w,b} ½‖w‖² − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

0 ≤ αᵢ ≤ C

Next solve the inner (unconstrained) min as usual:

∇_w = w − Σᵢ αᵢ·yᵢ·xᵢ = 0

∂/∂b = −Σᵢ αᵢ·yᵢ = 0

Page 58: Support Vector Machines

Solving the SVM: Dual

max_α min_{w,b} ½‖w‖² − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

0 ≤ αᵢ ≤ C

Express w and substitute:

w = Σᵢ αᵢ·yᵢ·xᵢ

Σᵢ αᵢ·yᵢ = 0

Page 59: Support Vector Machines

Solving the SVM: Dual

max_α min_{w,b} ½‖w‖² − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1)

0 ≤ αᵢ ≤ C

Express w and substitute:

w = Σᵢ αᵢ·yᵢ·xᵢ

Σᵢ αᵢ·yᵢ = 0

Dual representation

Page 60: Support Vector Machines

Solving the SVM: Dual

max_α min_{w,b} ½‖w‖² − Σᵢ αᵢ·(f(xᵢ)·yᵢ − 1),  0 ≤ αᵢ ≤ C

Express w and substitute:

max_α Σᵢ αᵢ − ½ Σᵢⱼ αᵢ·αⱼ·yᵢ·yⱼ·xᵢᵀxⱼ

0 ≤ αᵢ ≤ C

Σᵢ αᵢ·yᵢ = 0

Page 61: Support Vector Machines

Solving the SVM: Dual

max_α Σᵢ αᵢ − ½ Σᵢⱼ αᵢ·αⱼ·yᵢ·yⱼ·xᵢᵀxⱼ

0 ≤ αᵢ ≤ C

Σᵢ αᵢ·yᵢ = 0

Page 62: Support Vector Machines

Solving the SVM: Dual

max_α 1ᵀα − ½ αᵀ(K ∘ Y)α

0 ≤ α ≤ C,  yᵀα = 0

Kᵢⱼ = xᵢᵀxⱼ,  Yᵢⱼ = yᵢ·yⱼ

Page 63: Support Vector Machines

Solving the SVM: Dual

min_α ½ αᵀ(K ∘ Y)α − 1ᵀα

α ≥ 0

−α ≥ −C

yᵀα = 0

Then find b from the condition:

f(xᵢ)·yᵢ = 1 if 0 < αᵢ < C
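Recovering the primal classifier from a dual solution can be sketched as follows (the data and the value of α are made up for illustration, not produced by a real solver):

```python
# w = sum_i alpha_i y_i x_i; b from any margin support vector (0 < alpha_i < C),
# for which f(x_i) y_i = 1, i.e. b = y_i - w^T x_i.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # toy training points (assumed)
y = [1, 1, -1]
alpha = [0.0, 0.5, 0.5]                    # assumed dual solution
C = 1.0

w = [sum(a * yi * xi[j] for a, yi, xi in zip(alpha, y, X)) for j in range(2)]
sv = next(i for i, a in enumerate(alpha) if 0 < a < C)
b = y[sv] - sum(wj * xj for wj, xj in zip(w, X[sv]))

def f(x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

print(w, b, f(X[sv]) * y[sv])   # the margin support vector has functional margin 1
```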

Page 64: Support Vector Machines


Support vectors

Page 65: Support Vector Machines

(Figure: the α values at the individual training points — most points have αᵢ = 0; the support vectors carry αᵢ = 0.5, 1, or C.)

Support vectors

Σᵢ αᵢ·yᵢ = 0

0 ≤ αᵢ ≤ C

Page 66: Support Vector Machines

Sparsity

The dual solution is often very sparse, which allows the optimization to be performed efficiently (the “working set” approach).

Page 67: Support Vector Machines

Kernels

f(x) = wᵀx + b

w = Σᵢ αᵢ·yᵢ·xᵢ

f(x) = Σᵢ αᵢ·yᵢ·xᵢᵀx + b

f(x) = Σᵢ αᵢ·yᵢ·K(xᵢ, x) + b

Page 68: Support Vector Machines

Kernels

f(x) = wᵀx + b

w = Σᵢ αᵢ·yᵢ·xᵢ

f(x) = Σᵢ αᵢ·yᵢ·xᵢᵀx + b

f(x) = Σᵢ αᵢ·yᵢ·K(xᵢ, x) + b


Kernel function

Page 69: Support Vector Machines

f(x) = wᵀx + b

w = Σᵢ αᵢ·yᵢ·xᵢ

f(x) = Σᵢ αᵢ·yᵢ·xᵢᵀx + b

f(x) = Σᵢ αᵢ·yᵢ·K(xᵢ, x) + b

Kernels

f(x) = w₁·x + w₂·x² + b

f(x) = Σᵢ αᵢ·yᵢ·exp(−‖xᵢ − x‖²) + b
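The kernelized decision function transcribes directly into code; this sketch (toy data and coefficients are assumptions, and the Gaussian kernel width is fixed to 1 as on the slide) shows that only the kernel changes between the linear and the non-linear case:

```python
import math

def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def rbf_kernel(u, v):
    return math.exp(-sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def decision(x, X, y, alpha, b, K):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b"""
    return sum(a * yi * K(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

X = [[0.0], [2.0]]       # toy training points (assumed)
y = [1, -1]
alpha = [1.0, 1.0]       # assumed dual coefficients
b = 0.0

print(decision([0.0], X, y, alpha, b, linear_kernel))   # 0.0
print(decision([0.0], X, y, alpha, b, rbf_kernel))      # 1 - e^(-4), about 0.98
```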

Page 70: Support Vector Machines

Quiz

SVM is a __________ linear classifier.

Margin maximization can be achieved via

minimization of ______________.

SVM uses _____ loss and _______

regularization.

Besides hinge loss I also know ____ loss and

___ loss.

SVM in both primal and dual form is solved

using ________ programming.

Page 71: Support Vector Machines

Quiz

In primal formulation we solve for parameter

vector ___. In dual formulation we solve for

___ instead.

_____ form of SVM is typically sparse.

Support vectors are those training points for

which _______.

The relation between primal and dual variables

is: ___ = Σᵢ ______.

A Kernel is a generalization of _____ product.

Page 72: Support Vector Machines