School of Computer Science, Kim Myeong-jae (김명재): Introduction, Data Preprocessing, Model Selection, Experiments


  • Slide 1

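Slide 2
  • Introduction
  • Data Preprocessing
  • Model Selection
  • Experiments

Slide 3
Support Vector Machine

Slide 4
SVM (Support Vector Machine)
  • Training set of instance-label pairs $(x_i, y_i)$, $i = 1, \dots, l$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$
  • Objective function: $\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i$
  • subject to $y_i (w^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$

Slide 5
Dual space form
  • Objective function: maximize $\sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$
  • subject to $0 \le \alpha_i \le C$, $\sum_{i=1}^{l} \alpha_i y_i = 0$

Slide 6
Nonlinear SVM
  • Kernel method
  • Training vectors $x_i$ are mapped into a higher dimensional (maybe infinite dimensional) space
  • Mapping function $\phi$
  • Objective function: maximize $\sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, \phi(x_i)^T \phi(x_j)$, subject to the same constraints as the dual form

Slide 7
Kernel function $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$
  • Linear: $K(x_i, x_j) = x_i^T x_j$
  • Polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$
  • Radial basis function: $K(x_i, x_j) = \exp(-\gamma \| x_i - x_j \|^2)$, $\gamma > 0$
  • Sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$
  • $\gamma$, $r$, and $d$ are kernel parameters

Slide 8
Example Data
  • URL: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/data/

    Application      #training data   #testing data   #features   #classes
    Astroparticle    3,089            4,000           4           2
    Bioinformatics   391              0               20          3
    Vehicle          1,243            41              21          2

Slide 9
Proposed Procedure
  • Transform the data to the format of an SVM package
  • Conduct simple scaling on the data
  • Consider the RBF kernel
  • Use cross-validation to find the best parameters $C$ and $\gamma$
  • Use the best $C$ and $\gamma$ to train on the whole training set
  • Test

Slide 10
Categorical Feature
  • Example: a three-category attribute such as {red, green, blue} can be represented as (0, 0, 1), (0, 1, 0), and (1, 0, 0)
Scaling
  • Scaling before applying SVM is very important
  • Linearly scale each attribute to the range [-1, +1] or [0, 1]

Slide 11
RBF kernel
  • The RBF kernel is a reasonable first choice
  • It nonlinearly maps samples into a higher dimensional space
  • It has fewer hyperparameters than the polynomial kernel, and the number of hyperparameters influences the complexity of model selection
  • It has fewer numerical difficulties

Slide 12
Cross-validation

Slide 13
  • Find a good $(C, \gamma)$
  • Avoid the overfitting problem
  • v-fold cross-validation
      • Divide the training set into v subsets of equal size
      • Sequentially, one subset is tested using the classifier trained on the remaining v-1 subsets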
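The v-fold cross-validation on slide 13 can be sketched in a few lines of Python. This is a minimal illustration and not the presenter's code: `v_fold_indices`, `cross_validation_accuracy`, and `majority_trainer` are hypothetical names, the round-robin split assumes the data is already in random order, and the majority-vote trainer stands in for an SVM so that the example needs no external library.

```python
from collections import Counter
from typing import Callable, List, Sequence


def v_fold_indices(n_samples: int, v: int) -> List[List[int]]:
    """Split indices 0..n_samples-1 into v folds whose sizes differ by at most one."""
    if not 1 < v <= n_samples:
        raise ValueError("v must satisfy 1 < v <= n_samples")
    folds: List[List[int]] = [[] for _ in range(v)]
    for i in range(n_samples):
        folds[i % v].append(i)  # round-robin assignment (assumes shuffled data)
    return folds


def cross_validation_accuracy(
    X: Sequence, y: Sequence, v: int, train_fn: Callable
) -> float:
    """Test each fold with a classifier trained on the remaining v-1 folds."""
    correct = 0
    for held_out in v_fold_indices(len(X), v):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        # train_fn returns a predict(x) function fitted on the training folds
        predict = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in held_out)
    # Every sample is tested exactly once, so divide by the full set size
    return correct / len(X)


def majority_trainer(train_X: Sequence, train_y: Sequence) -> Callable:
    """Hypothetical stand-in for an SVM: always predicts the most common label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label
```

Calling `cross_validation_accuracy(X, y, 5, train_fn)` with a real SVM trainer in place of `majority_trainer` gives the kind of cross-validation accuracy that the slides use to compare parameter choices.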
Slide 14
Grid-search
  • Various pairs of $(C, \gamma)$ are tried
  • Find a good parameter pair, for example by trying exponentially growing sequences of $C$ and $\gamma$

Slide 15
Grid-search

Slide 16

Slide 17
Astroparticle Physics
  • Original accuracy: 66.925 %
  • After scaling: 96.15 %
  • After grid-search: 96.875 % (3875/4000)

Slide 18
Bioinformatics
  • Original cross-validation accuracy: 56.5217 %
  • After scaling, cross-validation accuracy: 78.5166 %
  • After grid-search: 85.1662 %

Slide 19
Vehicle
  • Original accuracy: 2.43902 % (1/41)
  • After scaling: 12.1951 % (5/41)
  • After grid-search: 87.8049 % (36/41)

Slide 20
  • LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
  • Bernhard E. Boser, Isabelle M. Guyon, Vladimir N. Vapnik, "A Training Algorithm for Optimal Margin Classifiers"

Slide 21
End
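The grid-search on slide 14 can be sketched as below. This is a minimal illustration under stated assumptions: `exponential_grid`, `grid_search`, and `toy_score` are hypothetical names, the exponent ranges are taken from the LIBSVM practical guide rather than from the slide (whose example did not survive), and `toy_score` is an artificial stand-in for the cross-validation accuracy of an RBF-kernel SVM.

```python
import math
from itertools import product
from typing import Callable, List, Tuple


def exponential_grid(lo_exp: int, hi_exp: int, step: int) -> List[float]:
    """Return 2**lo_exp, 2**(lo_exp + step), ... up to and including 2**hi_exp."""
    return [2.0 ** e for e in range(lo_exp, hi_exp + 1, step)]


def grid_search(
    score_fn: Callable[[float, float], float],
    C_values: List[float],
    gamma_values: List[float],
) -> Tuple[float, float, float]:
    """Try every (C, gamma) pair and return (best_score, best_C, best_gamma)."""
    best = None
    for C, gamma in product(C_values, gamma_values):
        score = score_fn(C, gamma)
        if best is None or score > best[0]:
            best = (score, C, gamma)
    return best


def toy_score(C: float, gamma: float) -> float:
    """Artificial score that peaks at C = 2**3, gamma = 2**-5 (illustration only)."""
    return -((math.log2(C) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


# Assumed ranges: C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3
C_grid = exponential_grid(-5, 15, 2)
gamma_grid = exponential_grid(-15, 3, 2)
best_score, best_C, best_gamma = grid_search(toy_score, C_grid, gamma_grid)
```

In practice `score_fn` would run v-fold cross-validation for each pair, and the best pair would then be used to retrain on the whole training set, as in the proposed procedure on slide 9. A coarse grid like this one can be followed by a finer grid around the best pair.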