MELJUN CORTES IBM SPSS SVM

7/30/2019 MELJUN CORTES IBM SPSS SVM

1/23

SVM Models (Support Vector Machine Models)

A Classification and Regression Technique

These are my personal notes only and are therefore not complete. Most of the content was taken from the IBM SPSS Modeler training manual. Please refer to the training manual for a complete discussion.


2/23

SVM:

A classification and regression technique that is used to predict either a categorical or a continuous outcome field.

It is suited to analyzing data with a large number of predictor fields.


3/23

SVM:

It maps the data into a high-dimensional space where the data points can be categorized or predicted accurately, even if there is no easy way to separate the points in the original dimensional space.

It uses a kernel function to map the data from the original space into the new space.

It does not provide output in the form of an equation with coefficients on the predictor fields.


4/23

Assume that the X and Y axes represent two predictors, while the circles and squares represent the two categories of a target field we wish to predict.

There is no simple straight line that can separate the categories, but the curve drawn around the squares shows that there is a complex curve that will completely separate the two categories.

SVM was developed to handle difficult classification/prediction problems where simple linear models are unable to accurately separate the categories of an outcome field.


5/23

Central task of SVM:

Transform the data so that a hyperplane can be used to separate the points. The mathematical function used for the transformation is known as a kernel function.

The squares and circles can now be separated by a straight line in this two-dimensional space.

[Figure panels: Original Data; Transformed Data]
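The idea of transforming the data so that a hyperplane can separate it can be sketched in a few lines. This is only an illustration under stated assumptions: the toy points, the `lift` function, and the threshold are hypothetical and are not taken from the training manual, and a real SVM applies the kernel implicitly rather than computing the mapped coordinates.

```python
import math

def lift(point):
    """Map a 2-D point (x, y) to 3-D by appending its squared distance from the origin."""
    x, y = point
    return (x, y, x * x + y * y)

# Toy data (hypothetical): "squares" lie on a small circle and "circles" on a
# larger one, so no straight line in the (x, y) plane separates the two groups.
angles = [k * math.pi / 4 for k in range(8)]
squares = [(1.0 * math.cos(a), 1.0 * math.sin(a)) for a in angles]
circles = [(3.0 * math.cos(a), 3.0 * math.sin(a)) for a in angles]

# In the lifted space the third coordinate alone separates the groups:
# the flat plane z = 5 acts as a separating hyperplane.
threshold = 5.0
squares_below = all(lift(p)[2] < threshold for p in squares)
circles_above = all(lift(p)[2] > threshold for p in circles)
print(squares_below, circles_above)
```

The added coordinate plays the role of the new space: a boundary that is a circle in the original data becomes a flat plane after the mapping.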


6/23

The filled-in circles and squares are the cases that lie on the boundary between the two classes.

They are all the data needed to separate the two categories. These key points are called support vectors because they support the solution and the boundary definition.

[Figure: Transformed Data]

SVM models were developed in the machine learning tradition, where the technique was called a support vector machine, hence the model name.


7/23

There is more than one straight line (hyperplane) that could be used to separate the two categories.

SVM models try to find the best hyperplane, the one that maximizes the margin (separation) between the categories while balancing the tradeoff of potentially overfitting the data.

The narrower the margin between the support vectors, the more accurate the model will be on the current (training) data, but the greater the risk of overfitting.
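As a rough sketch of how the margin relates to a hyperplane, assume the usual textbook scaling in which the support vectors satisfy w·x + b = +1 or -1; the margin width is then 2/||w||. The weights, offset, and points below are hypothetical values picked by hand, not output from Modeler.

```python
import math

def decision_value(w, b, x):
    """Signed score w·x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the margin boundaries w·x + b = +1 and w·x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane: the vertical line x = 2 in a two-predictor space.
w, b = (1.0, 0.0), -2.0
points = [(0.0, 1.0), (1.0, 0.0), (3.0, 2.0), (5.0, 1.0)]

# Points whose score is exactly +1 or -1 sit on the margin: the support vectors.
support_vectors = [
    p for p in points if math.isclose(abs(decision_value(w, b, p)), 1.0)
]
print(margin_width(w))
print(support_vectors)
```

Only the two points on the margin determine the boundary; moving the other points (without crossing the margin) would not change it.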


8/23

Misclassified cases

Circles or squares can fall on the wrong side of the support vectors. These cases are classified in error.

SVM attempts to maximize the margin between the support vectors while minimizing this error.
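The balance between a wide margin and few errors is commonly written as the soft-margin optimization problem below. This is the standard textbook formulation, not a formula from the training manual; the symbols (weights w, offset b, slack variables ξ, labels y coded as +1/-1, and penalty C) are assumptions introduced here for illustration.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
\frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 .
```

Making the first term small widens the margin, while the second term penalizes cases that fall inside the margin or on the wrong side of the boundary.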


9/23

The mathematical function used for the transformation is known as the kernel function. SVM in Modeler supports the following kernel types:

Kernel Function

Linear: Simple function that works well when nonlinear relationships in the data are minimal.

Polynomial: A more complex function that allows for higher-order terms.

RBF (Radial Basis Function): Equivalent to the neural network of this type. Can fit highly nonlinear data.

Sigmoid: Equivalent to a two-layer neural network. Can also fit highly nonlinear data.
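The four kernel types can be sketched as plain functions. The formulas below follow the parameterization commonly used by LIBSVM-style implementations; whether Modeler uses exactly these forms is an assumption. The parameter names `gamma`, `bias`, and `degree` are chosen to match the expert options described later in these notes.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, bias=0.0, degree=3):
    # Higher-order terms come from raising the shifted inner product to `degree`.
    return (gamma * dot(u, v) + bias) ** degree

def rbf_kernel(u, v, gamma=1.0):
    # Depends only on the squared distance between the two points.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(u, v, gamma=1.0, bias=0.0):
    # The usual "sigmoid" kernel uses tanh, an S-shaped function.
    return math.tanh(gamma * dot(u, v) + bias)

u, v = (1.0, 2.0), (3.0, 0.5)
print(linear_kernel(u, v))
print(polynomial_kernel(u, v, gamma=0.5, bias=1.0, degree=2))
print(rbf_kernel(u, v, gamma=0.5))
print(sigmoid_kernel(u, v, gamma=0.1))
```

Each function returns a similarity score for a pair of records; the SVM works with these scores instead of the mapped coordinates themselves.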


10/23

Radial Basis Function

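A radial basis function (RBF) is a real-valued function whose value depends only on the distance from the origin, so that $\phi(\mathbf{x}) = \phi(\lVert \mathbf{x} \rVert)$; or alternatively on the distance from some other point $\mathbf{c}$, called a center, so that $\phi(\mathbf{x}, \mathbf{c}) = \phi(\lVert \mathbf{x} - \mathbf{c} \rVert)$. Any function $\phi$ that satisfies the property $\phi(\mathbf{x}) = \phi(\lVert \mathbf{x} \rVert)$ is a radial function.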
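A small check of the radial property, using the Gaussian as one common choice of radial basis function. The function name, the `width` parameter, and the sample points are hypothetical.

```python
import math

def gaussian_rbf(x, center, width=1.0):
    """Gaussian radial basis function: depends on x only through its distance to `center`."""
    r = math.dist(x, center)
    return math.exp(-((r / width) ** 2))

center = (1.0, 1.0)
# Two different points that are both at distance 2.0 from the center.
a = (3.0, 1.0)
b = (1.0, -1.0)
same_value = math.isclose(gaussian_rbf(a, center), gaussian_rbf(b, center))
print(same_value)
```

Because only the distance matters, points on the same circle around the center always receive the same value.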


11/23

Sigmoid Function

Many natural processes and complex system learning curves display a history-dependent progression from small beginnings that accelerates and approaches a climax over time. For lack of complex descriptions, a sigmoid function is often used. A sigmoid curve is produced by a mathematical function having an "S" shape. Often, sigmoid function refers to the special case of the logistic function shown at right and defined by the formula $S(t) = \frac{1}{1 + e^{-t}}$.
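A minimal sketch of the logistic function, assuming the standard form S(t) = 1 / (1 + e^(-t)). Printing a few values shows the "S" shape: close to 0 for large negative t, 0.5 at t = 0, and close to 1 for large positive t.

```python
import math

def logistic(t):
    """Logistic sigmoid S(t) = 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

for t in (-6, -2, 0, 2, 6):
    print(t, round(logistic(t), 4))
```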


12/23

SVM Node Model Option

If a partition field is defined, this option ensures that data from only the training partition is used to build the model.


13/23

SVM Node Expert Option

If selected, the probabilities for each possible value of a set or flag target field are displayed for each record processed by the node.

If not selected, the probability of only the predicted value is displayed for set or flag target fields.


14/23

SVM Node Expert Option

Stopping criteria: determines when to stop the optimization algorithm.

Values range from 1.0E-1 to 1.0E-6; default is 1.0E-3.

Reducing the value results in a more accurate model, but the model will take longer to train.


15/23

Regularization parameter (C): controls the trade-off between maximizing the margin and minimizing the training error.

Its values range from 1 to 10 (with 10 as the default).

Increasing the value improves the classification accuracy (or reduces the regression error) for the training data, but this can also lead to overfitting.
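The effect of this trade-off parameter (conventionally written C) can be illustrated by scoring two hand-picked candidate boundaries with the textbook soft-margin objective. Everything here is an assumption for illustration: the one-predictor toy data, the two candidates, and the hinge-loss form of the error term. An actual SVM searches over all hyperplanes rather than comparing two.

```python
def hinge(w, b, x, y):
    """Hinge loss max(0, 1 - y(w·x + b)); zero when the case is outside the margin on the correct side."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

def objective(w, b, data, C):
    """Soft-margin objective: 0.5*||w||^2 (margin term) + C * total hinge loss (error term)."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    error_term = sum(hinge(w, b, x, y) for x, y in data)
    return margin_term + C * error_term

# Toy data: category -1 on the left, +1 on the right, except for one +1 case
# at x = -1.5 that sits close to the -1 group.
data = [((-3.0,), -1), ((-2.0,), -1), ((-1.5,), 1), ((2.0,), 1), ((3.0,), 1)]

wide = ((1.0,), 0.0)    # boundary at x = 0, wide margin, misclassifies the odd case
narrow = ((4.0,), 7.0)  # boundary at x = -1.75, narrow margin, no training errors

for C in (1, 10):
    print(C, objective(*wide, data, C), objective(*narrow, data, C))
```

With C = 1 the wide-margin candidate has the lower objective despite its one error; with C = 10 the error penalty dominates and the narrow-margin candidate that fits every training case is preferred.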


16/23

Epsilon is used when the target variable is continuous.

Errors in the model prediction are accepted if they are under this value.

Increasing epsilon may result in faster modeling, but at the expense of accuracy.
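The way epsilon "accepts" small errors is usually expressed as an epsilon-insensitive loss. The sketch below assumes that standard form; the function name and the sample values are hypothetical.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Zero loss inside the epsilon tube; beyond it, loss grows with the excess error."""
    return max(0.0, abs(actual - predicted) - epsilon)

# An error of 0.25 is ignored when epsilon is 0.5 ...
print(epsilon_insensitive_loss(10.0, 10.25, epsilon=0.5))
# ... while an error of 1.0 is penalized only for the part above epsilon.
print(epsilon_insensitive_loss(10.0, 11.0, epsilon=0.5))
```

A larger epsilon means more predictions count as "good enough", which leaves less for the optimizer to correct and explains the faster but less accurate fit.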


17/23

There are four kernel types:

Linear: works well when nonlinear relationships in the data are minimal.

Polynomial: allows higher-order terms.

Radial Basis Function (RBF): can fit nonlinear data.

Sigmoid: S-shaped function; the logistic function is a special case.


18/23

RBF gamma should normally be between 3/k and 6/k, where k is the number of input fields.

Increasing the value improves the classification accuracy (or reduces the regression error) for the training data.
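The 3/k to 6/k rule of thumb is simple arithmetic; a small helper makes the suggested range explicit. The function name is hypothetical and the helper only restates the rule above.

```python
def rbf_gamma_range(num_inputs):
    """Return the (low, high) RBF gamma range 3/k to 6/k for k input fields."""
    if num_inputs < 1:
        raise ValueError("num_inputs must be at least 1")
    return 3.0 / num_inputs, 6.0 / num_inputs

low, high = rbf_gamma_range(12)
print(low, high)
```

For a model with 12 input fields, this gives a range of 0.25 to 0.5 as a starting point.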


19/23

Gamma is enabled only when the polynomial or sigmoid kernel is used.

Increasing the value improves the classification accuracy (or reduces the regression error).


20/23

Bias is enabled only if the polynomial or sigmoid kernel is used. It sets the coefficient value of the kernel function.


21/23

Degree is enabled if the polynomial kernel is used. It controls the dimension of the mapping space.


22/23

Example: Predicting Loyal Customers

Target Variable: Loyal (leave/stay)

LONGDIST: time spent on long distance calls per month
International: time spent on international calls per month
LOCAL: time spent on local calls per month
Dropped: number of dropped calls
Pay_mthd: payment method of the monthly telephone bill
LocalBillType: tariff for locally based calls
LongdistanceBillType: tariff for long distance calls
Age
Sex
Status: marital status
Children: number of children
Est_income: estimated income
Car_owner: car owner


23/23

Example 2: Predicting if customers will accept a new cash-card offering.

Has a mortgage?
Has life insurance?
Has a credit card?
Has a debit card?
Uses mobile bank service?
Has a current account?
Has internet access to the account?
Has a personal loan?
Has savings?
Has used a Cash Point in the last week?
Has hit the overdraft limit during the last year?
Has an ISA account?
Age in years
How long a customer?
Accept_CashCard: accept the new cash card
