Stoufieugenia Thesis

Embed Size (px)

DESCRIPTION

Progress in digital data acquisition and storage technology have resulted in an increase of massive databases. The analysis of these databases to find unsuspecting relationships and to summarize the data in novel ways that are understandable and useful, is Data Mining. The main objectives of the DM is the prediction and description, one of the ways to achieve them is the classification, i.e. learning a function that classifies a data item into one of the several predefined classes. In this thesis, we theoretically analyze the Support vector machines of two or more classes, which form a new classification method. The primary goal, however, of DM is the extraction of new information from the data, which can be achieved by various techniques which are distinguished in the supervised and unsupervised methods. The first category consists of the stages of model selection and assessment, for these two stages has been developed a wide variety of methods and criteria. Objective and main theme of this thesis is the presentation of these criteria and methods and the introduction of new information Criteria for binary and multiclass SVMs and the experimental comparison of their performance. Finally, the experimental application of the methods of support vector classification problems in two and three classes is evaluated. More specifically:• The first chapter is an introduction to the basic concepts of Data Mining, for instance: its procedure, the data components, its methods, the components of its algorithms, its categories and types of learning and its applications.• In the second chapter, the theoretical background of binary SVMs is analyzed, and more specifically, the optimal separating hyperplane, the support vector classifier, the binary SVM, the kernels, the kernel trick, and finally the SVM as a penalization method.• In the third chapter, the binary SVM is applied to classification problems with more than two classes through the methods of one-against-all, one-against-one and the error-correcting output codes.• In the fourth chapter, a wide variety of criteria and methods used for the model selection and assessment, is analyzed, such as: AIC, BIC, cross validation, cross-validated error rate, evidence calculated by Laplace’s method, ξα-estimation, GRM, MDL and bootstrap. Moreover, their theoretical background is analyzed and the corresponding experimental results for their performance are presented.• In the fifth chapter the new information criteria for binary SVMs are presented, namely: RIC, KRIC, SVMICa and SVMICb, their theoretical background, some prerequisite concepts, the Nystrom approximation method for the calculation of the KRIC and the experimental comparison of their performance to the corresponding performance of the criteria of the fourth chapter through simulation patterns and tests on real data.• In the sixth chapter, two new information criteria for SVMs of more than two classes, which use the radius-margin bound for binary SVMs, some prerequisite concepts, an experimental comparison of their efficiency for model selection in multiclass SVMs, a comparison of the computation load to that of other methods and finally some experimental results on the benchmark datasets are presented.• In the Seventh Chapter, Statistical hypothesis testing and performance criteria of the model (Confusion Matrix) are analyzed. Moreover, the application of support vector methods to problems of two and three classes are experimentally evaluated.• Finally, in the eighth chapter a summary of the thesis is presented and some general conclusions are exported.

Citation preview

  • (INFORMATION CRITERIA FOR VARIABLE SELECTION IN

    SUPPORT VECTOR MACHINES AND APPLICATIONS)

    .

    :

    ...

    , 2015

  • 2

  • 3

    (INFORMATION CRITERIA FOR VARIABLE SELECTION IN

    SUPPORT VECTOR MACHINES AND APPLICATIONS)

    .

    :

    ...

    , 2015

  • 4

  • 5

    5

    11

    ABSTRACT.13

    15

    1: .17

    1.1 DATA MINING................................................................... 17

    1.2 ............................................................................. 18

    1.3 ............................................................................................. 19

    1.3.1 .................................................................................................................19

    1.3.2 ........................................................................................ 19

    1.4 ............................................................................... 20

    1.5 ......................................... 21

    1.6 DATA MINING- .......................................................... 22

    1.6.1 DATA MINING............................................................................... 22

    1.6.2 DATA MINING22

    1.7 DATA MINING................................................................................... 23

    2: (SVMS)25

    2.1 ......................................................................................................................... 25

    2.2 ......................................................................... 25

    2.3 ......................................................... 27

    2.4 .............................. 29

    2.5 SVM (KERNEL FUNCTIONS).......................................................................................................................... 31

    2.6 (KERNEL TRICK)............................................................................. 33

    2.7 SVMS ..................................................................... 33

  • 6

    3: (MULTI-CLASS SVMS)37

    3.1 ......................................................................................................................... 37

    3.2 ONE-AGAINST-ALL MULTI-CLASS SVM............................................................................ 37

    3.3 ONE-AGAINST-ONE MULTICLASS SVM........................................................................... 38

    3.4 MULTI-CLASS SVM............................................ 39

    3.5 ....................................................................................................................... 39

    3.6 ERROR-CORRECTING OUTPUT CODES............................................................................ 40

    4: .43

    4.1 ......................................................................................................................... 43

    4.2 ................................... 43

    4.3 -............................................................. 45

    4.4 ................................................. 46

    4.5 - H AKAIKE......................................................................................................... 47

    4.6 ......................................................................... 48

    4.7 BIC.......................................... 49

    4.8 ........................................................................................ 50

    4.8.1 ................................................................................................................... 50

    4.8.2 HOLD-OUT .............................................................................................. 51

    4.8.3 - .................................................................. 52

    4.8.4 10- ...................................................................52

    4.8.5 LEAVE-ONE-OUT (LOOCV)......................................... 54

    4.9 (CROSS VALIDATED ERROR RATE)..................................................................................................................................... 54

    4.9.1 ................................................................................................................... 54

    4.9.2 ........................ 55

    4.9.3 SVM ............... 55

    4.9.4 ............................... 57

    4.10 GACV ( ) 58

    4.10.1 KULLBACK-LEIBLER............................................................................... 58

    4.10.2 KULLBACK-LEIBLER (CKL DISTANCE)................................... 58

    4.10.3 KULLBACK-LEIBLER (GCKL DISTANCE)....................... 59

    4.10.4 PROXY GCKL- .......................................................................................................................... 59

    4.10.5 LEAVING-OUT-ONE ............................................................................. 60

    4.10.6

    .................................................................. 61

    4.10.7 GACV ........61

    4.10.8 ranGACV..................................................................................... 62

    4.11 LAPLACE.................................................. 63

    4.11.1 ................................................................................................................. 63

    4.11.2 SVM........................................ 63

    4.11.3 ...................................................................................................... 64

  • 7

    4.12 -................................................................................................................ 67

    4.12.1 ............................................................................................................... 67

    4.12.2 ....................................................................................... 67

    4.12.3 SVMS ...................................................68

    4.12.4 .......................................................................................... 68

    4.12.5 , F1.................................................................................. 69

    4.12.6 ............................................................................................................ 69

    4.13 (GUARANTEED RISK MINIMIZATION) ....................................................................................................................73

    4.13.1 ................................................................................................................. 73

    4.13.2 .................................................................................................................. 73

    4.13.3 (STRUCTURAL RISK MINIMIZATION) (TRUE RISK).............. 74

    4.13.4 VAPNIK- CHERVONENKIS........................................................................ 75

    4.13.5 (STRUCTURAL RISK MINIMIZATION). 75

    4.13.6 (EMPIRICAL RISK MINIMIZATION)...... 75

    4.13.7 GRM............................................................ 76

    4.13.8 ............................................................................................................. 76

    4.14 (UM DESCRIPTION LENGTH)........................ 78

    4.15 BOOTSTRAP................................................................................................. 79

    5: .81

    5.1 T .......................................................... 81

    5.1.1 ................................................................................................................... 81

    5.1.2 (KLR)..........................................................82

    5.1.3 SVMS KLR.............................. 83

    5.1.4 SOLLICH SVMS............................................... 83

    5.1.5 (RIC)........................................ 84

    5.1.6 (KRIC)....................... 85

    5.1.7 NYSTROM KRIC....... 88

    5.1.8 .............................................................................................................. 90

    5.1.9 ....................................................................................................... 95

    5.2 ............................................................................................ 97

    5.2.1 ................................................................................................................... 97

    5.2.2 ..................................................... 97

    5.2.3 ........................................................................ 98

    5.2.4 ...............................................................................99

    5.2.5 ......................................................................................... 102

    5.2.6 .................................................................... 110

    6: MULTICLASS SVMS115

    6.1 ......................................................................................................................... 115

  • 8

    6.2 ............................................................................................... 116

    6.2.1 (SINGLE-MACHINE APPROACH)............. 116

    6.2.2 (GRID-BASED SEARCH METHODS).116

    6.2.3 (GRADIENT-BASED OPTIMIZATION TECHNIQUES)............................................................................................... 117

    6.2.4 BROYDEN-FLETCHER-GOLDFARB-SHANNO (BFGS) QUASI-NEWTON ...... 117

    6.3 SVMS.................................................. 118

    6.4 .................................................................................. 119

    6.5 ................................................................................. 122

    6.6 ..................................................................................................... 125

    6.7 ..................................................................................... 126

    6.8 ................................................ 127

    6.9 ................. 129

    6.10 ........................................................................................................... 137

    7: ..139

    7.1 ......................................................................................................................... 139

    7.1.1 .......................................................................... 140

    7.1.2 -CONFUSION MATRIX.................................. 141

    7.2 R ............................................................... 143

    7.2.1 R............................................................................................................. 143

    7.2.2 ....................................................................................................... 144

    7.3 ........................................................................................................ 144

    7.3.1 CREDIT SCORING ........ 144

    7.3.2 (GERMAN CREDIT DATA).................................................................................................................................... 145

    7.3.3 (GERMAN CREDIT DATA) R............. 147

    7.4 ..................................................................................................... 148

    7.4.1 IRIS DATA SET................................................................. 148

    7.4.2 (IRIS DATA SET)...148

    7.4.3 (GERMAN CREDIT DATA) R .............149

    8: -.....151

    ................................................................................................................153

    .151

    : (5.1.7.2).................................................................157

    : (SEQUENTIAL MINIMAL OPTIMISATION(SMO)).......................................................................................................... 159

    :......160

    .1 GERMAN CREDIT DATA................................................................................................... 160

    .1.1 .................................................................................................................. 160

    .1.2 ............................................................................................ 161

    .1.3 ................................................................................................... 164

  • 9

    .2 IRIS DATA SET.................................................................................................................. 166

    .2.1 ................................................................................................................... 166

    .2.2 ............................................................................................ 167

    .2.3 ....................................................................................................171

  • 10

  • 11

    . , (Data Mining). D.M , , . , , . , , D.M. , . , . multiclass SVMs . , .

    :

    , , , , , , .

    , SVMs, , , , SVM, , , SVM .

    , SVM one-against-all, one-against-one error-correcting output codes.

    , : AIC, BIC, , , Laplace, -, GRM, MDL bootstrap. , .

    , SVMs, : RIC, KRIC, SVMICa SVMICb, , , Nystrom KRIC

  • 12

    .

    , SVMs , SVMs, , multiclass SVMs, .

    , (Confusion Matrix). , .

    , .

  • 13

    Abstract

    Progress in digital data acquisition and storage technology have resulted in an increase of massive databases. The analysis of these databases to find unsuspecting relationships and to summarize the data in novel ways that are understandable and useful, is Data Mining. The main objectives of the DM is the prediction and description, one of the ways to achieve them is the classification, i.e. learning a function that classifies a data item into one of the several predefined classes. In this thesis, we theoretically analyze the Support vector machines of two or more classes, which form a new classification method. The primary goal, however, of DM is the extraction of new information from the data, which can be achieved by various techniques which are distinguished in the supervised and unsupervised methods. The first category consists of the stages of model selection and assessment, for these two stages has been developed a wide variety of methods and criteria. Objective and main theme of this thesis is the presentation of these criteria and methods and the introduction of new information Criteria for binary and multiclass SVMs and the experimental comparison of their performance. Finally, the experimental application of the methods of support vector classification problems in two and three classes is evaluated.

    More specifically:

    The first chapter is an introduction to the basic concepts of Data Mining, for instance: its procedure, the data components, its methods, the components of its algorithms, its categories and types of learning and its applications.

    In the second chapter, the theoretical background of binary SVMs is analyzed, and more specifically, the optimal separating hyperplane, the support vector classifier, the binary SVM, the kernels, the kernel trick, and finally the SVM as a penalization method.

    In the third chapter, the binary SVM is applied to classification problems with more than two classes through the methods of one-against-all, one-against-one and the error-correcting output codes.

    In the fourth chapter, a wide variety of criteria and methods used for the model selection and assessment, is analyzed, such as: AIC, BIC, cross validation, cross-validated error rate, evidence calculated by Laplaces method, -estimation, GRM, MDL and bootstrap. Moreover, their theoretical background is analyzed and the corresponding experimental results for their performance are presented.

    In the fifth chapter the new information criteria for binary SVMs are presented, namely: RIC, KRIC, SVMICa and SVMICb, their theoretical background, some prerequisite concepts, the Nystrom approximation method for the calculation of the KRIC and the experimental comparison of their performance to the corresponding performance of the criteria of the fourth chapter through simulation patterns and tests on real data.

  • 14

    In the sixth chapter, two new information criteria for SVMs of more than two classes, which use the radius-margin bound for binary SVMs, some prerequisite concepts, an experimental comparison of their efficiency for model selection in multiclass SVMs, a comparison of the computation load to that of other methods and finally some experimental results on the benchmark datasets are presented.

    In the Seventh Chapter, Statistical hypothesis testing and performance criteria of the model (Confusion Matrix) are analyzed. Moreover, the application of support vector methods to problems of two and three classes are experimentally evaluated.

    Finally, in the eighth chapter a summary of the thesis is presented and some general conclusions are exported.

  • 15

    , . . , , .

    , , .

    , , , .

    , , , , , .

    ,

    , 2015

  • 16

  • 17

    1

    1.1 DATA MINING , ( , , ) ( , ). , . O , . , : : ( ) . : . ,, , , . . , (.. up-to-date ). , . . .

    , . ,

  • 18

    , . , . , ,

    , . , . , . , , . , . , , . , . . . . , . , , , . Knowledge Discovery Database (KDD). (AI). KDD : , , ( ), . . , . : 1. . 2. ( ) 3. . 4. .

  • 19

    1.2 :

    (I)

    (ii),

    (iii)

    (iv)

    :

    (a)

    (b)

    (c)

    (v)

    (vi)

    (vii) - -

    (viii)

    1.3

    1.3.1 ,

    :

    (i) (Concept): .

    : concept.

    (ii) (Instance): concept.

    : .

    (iii) (attribute): . (attributes), . .

    1.3.2 (levels of measurement) :

    () (Nominal)

  • 20

    , . ( , ) ,

    () (Ordinal)

    , . , .

    () (interval)

    , . , .

    () (ratio)

    , . ( ).

    1.4 (prediction) (description). . , . , , . , .

    (classification) .

    (regression) .

    (clustering) , . , .

    (summarization) . , , .

    (dependency modeling) . :

  • 21

    (1) .

    (2) , .

    (change and deviation detection) .

    1.5 . :

    (1)

    (2)

    (3)'

    , . . , . . , . invisible .

    ( ) KDD. . , , .

    :

    (1)

    (2)

    , , .

    , .

  • 22

    : .

    1.6 DATA MINING-

    1.6.1 DATA MINING () (direct) Data Mining

    () (indirect) Data Mining

    1.6.2 DATA MINING Data Mining . :

    (1) (supervised)

    (2) (unsupervised)

    (1)

    . . :

    (I) ( )

    f(x) . , , , .

    (ii) (- )

    - f(x) , . : , slice inverse , splines .

    (iii) ( )

    : Kernel, , , splines .

    (iv)

  • 23

    f(x) f(x) . : SVMs ( ), , .

    (2)

    :

    :

    .

    :

    R2, Cp Mallow's, AIC BIC, MDL .

    () (unsupervised methods)

    . . : kohonen networks, two step, k means, , , .

    1.7 DATA MINING Data Mining (DM) . ( ) ( ). DM . D.M ( DM ), ( ), ( ). D.M . DM , . DM ( , ), (

  • 24

    ) ( )

    D.M :

    (1)

    I)

    ii)

    iii)

    1: .

    2: , , .

    (2)

    (I)Target marketing

    (ii) Customer relation manager

    (iii) Market basket analysis (super market)

    (iv) Cross selling

    1:

    2: Diapers and Beer.

    H , . .

    3: .

    (3)

    DM :

    (I) (news group, email, documents Web Analysis)

    (ii)

    1: () .

    2: .

  • 25

    2

    (SVMS)

    2.1

    (SVMs) Vapnik AT&T Bell Labs. SVMs . . SVMs . SVMs . SVM , : .

    2.2

    . , , . :

  • 26

    (2.2.1)

    (2.2.2)

    . :

    (2.2.3)

    :

    (2.2.4)

    ,

    ,

    .

    (2.2.1) :

    (2.2.5)

    .

    , . ( ). Langrange () :

    (2.2.6)

    , :

    = (2.2.7)

    0= (2.2.8)

    (2.2.6) Wolfe:

    =

    (2.2.9)

    .

    orthant, , , . Karush-Kuhn-Tucker, (2.2.7), (2.2.8), (2.2.9)

  • 27

    (2.2.10)

    :

    , ,

    ;

    , .

    (2.2.7) - . (2.2.10) .

    :

    (2.2.11)

    ( ), . . . , , . , . , .

    2.3

    2.3.1: . . ,

  • 28

    2= . ().

    ,

    ;

    ,

    ,

    . :

    (2.3.1)

    : :

    (2.3.2)

    To (2.3.1) x . , 1 -1 ( 2.2.1).

    (2.3.3)

    . H 2 . .

    :

    (2.3.4)

    , . = . (2.3.4) . ( , ).

    A . , . (slack) . (2.3.3) .

    (2.3.5)

    (2.3.6)

  • 29

    .

    , . , . , - , . (2.3.6) , .

    E .

    . , . , .

    , , = (2.3.4) :

    (2.3.7)

    .

    1 ,

    (2.3.6). 2.3.1 . (2.3.7), .

    2.4

    To (2.3.7) , . Lagrange. (2.3.7) :

    (2.4.1)

    C (2.3.7). C= .

    H Lagrange ()

    (2.4.2)

  • 30

    , . , :

    = , (2.4.3)

    0= (2.4.4)

    (2.4.5)

    . (2.4.3)-(2.4.5) (2.4.2), (Wolfe) :

    (2.4.6)

    (2.4.1)

    . .

    (2.4.3)-(2.4.5), Karush-Kuhn-Tucker :

    (2.4.7)

    (2.4.8)

    (2.4.9)

    . (2.4.7)-(2.4.9) .

    (2.4.3) :

    (2.4.10)

    o i (2.4.9) ( (2.4.7)).

    (support vectors-SVs), . ,

    ( ) (2.4.8) (2.4.5)

    ( SVs). ( ) ( SVs). (2.4.7)

    .

    (2.4.6) (2.4.2) .

    , :

    . (2.4.11)

    C.

  • 31

    2.5 SVM (KERNEL FUNCTIONS)

    (2.4.2) . . h, .

    Lagrange (2.4.6) :

    (2.5.1)

    (2.4.3) :

    (2.5.2)

    , , (2.5.2)

    (2.5.1) (2.5.2) h(x) . , h(x) , (kernel function)

    , (2.5.3)

    . (-) .

    Mercer: . : ,

    (2.5.4)

    ,

    (2.5.5)

    . ( ) ,

    ,

    (2.5.6)

  • 32

    Kernel Matrix: :

    (1,1) (1,2) (1,3) ... (1,m)

    (2,1) (2,2) (2,3) (2,m)

    ... ... ... ... ...

    K(m,1) K(m,2) K(m,3) ... K(m,m)

    .

    1. : ( GRBF )

    (2.5.7)

    >0, .

    2. : GRBF

    (2.5.8)

    i-

    3. :

    (2.5.9)

    4. : (o )

    (2.5.10) b>0 . , ,c,d b, , . Mercer. Mercer b c. (2.4.2)

    (2.5.11)

    C , . C ;

  • 33

    C , . , SVMs . SVMs

    (2.5.12)

    (2.5.13)

    2.6 (KERNEL TRICK)

    (2.4.10) (2.4.11) :

    (2.6.1)

    (2.6.1) :

    (2.6.2)

    (kernel trick). , , (mappings) , . . .

    , x, .

    2.7 SVMs

    , :

    (2.7.1)

  • 34

    + . , . (2.7.1), (2.4.1).

    hinge , . (2.7.1) . () SVM , , . , . hinge

    , . . uberized hinge , yf=-1.

    2.7.1: (hinge loss), (binomial deviance) , Huberized hinge . yf f, y=+1 y=-1. H Huberized asymptotes SVM , . -1.

    . . 2.7.2 . hinge G(x),

  • 35

    . Huberized ( , ), SVM hinge ( ).

    2.7.2: 2.7.1. . SVM hinge , .

    (2.7.1) SVM ,

    , ( ). h(x) ( (roughness)), hj h o .

    2.7.2, .

    (2.7.1) 0 .

  • 36

  • 37

    3

    (MULTI-CLASS SVMS)

    3.1

    . . , , .. 0 9. SVMs . , .

    3.2 ONE-AGAINST-ALL MULTI-CLASS SVM

    SVM , . .

    4 SVMs :

    1 2,3,4

    2 1,3,4

    3 1,2,4

    4 1,2,3

    3.2.1

    x, i- , :

    (3.2.1)

  • 38

    H 4

    . , :

    (3.2.3)

    3.3 ONE-AGAINST-ONE MULTICLASS SVM

    SVMs . :

    1 2

    1 3

    1 4

    2 3

    2 4

    3 4

    3.3.1

    x, 6 . i j x i, i , :

    1 2 1

    1 3 1

    1 4 1

    2 3 2

    2 4 4

    3 4 3

    3.3.2

    , :

    1 2 3 4

    # 3 1 1 1

    3.3.3

    , x . k , SVMs .

  • 39

    , , , .

    3.4 MULTI-CLASS SVM

    one-against-one , k , . , .

    , . (3.4.1). one-against-one .

    One-against-all One-against-one

    SVM

    # SVMs

    3.4.1

    , , k SVMs k , .

    3.5

    SVM . : error-correcting-output-codes multiobjective . one-against-all . SVMs , multiobjective . , , .

  • 40

    3.6 ERROR-CORRECTING OUTPUT CODES

    ECOC Dietterich & Bakiri 1995. n, n bit n x, S n bit S.

    3.6.1:

    :

    3.6.2:

    3.6.3: 15-bit 10

    Hamming: Hamming .

  • 41

    error-correcting , Hamming, . Hamming d bit.

    ECOC :

    1. : , Hamming .

    2. : bit- .

    3.6.4: 3 .

    k , .

  • 42

  • 43

    4

    4.1

    .

    4.2

    ,

    .

    .

    :

    (4.2.1)

    ,

    , (4.2.2)

    (). . :

  • 44

    (4.2.3)

    , . .

    4.2.1 : 100 , 50, . Err ( ).

    :

    (4.2.4)

    , . . .

    . . :

    1) :

  • 45

    2) : , - .

    , : , , . . 50% 25% .

    4.3 -

    , () ,

    X=x0 , :

    (4.3.1)

    .

    . , () .

    k--, :

    (4.3.2)

    xi . k

    . k

  • 46

    . k, .

    ,

    p , :

    , -

    . ,

    .

    (4.3.3)

    . p.

    4.4

    :

    (4.4.1)

    , . . . . :

    (4.4.2)

    O .

    (4.4.3)

    , , .

    0-1 :

    (4.4.4)

  • 47

    Cov . , . , , . :

    (4.4.5)

    , d . :

    (4.4.6)

    ,

    (4.4.7)

    d , . .

    . , . ( ) .

    4.5 - H AKAIKE

    H - :

    (4.5.1)

    Akaike , ( ). (4.4.7) ,

    (4.5.2)

    (

    ), :

  • 48

    (4.5.3)

    AIC , AIC . d .

    , d() . :

    (4.5.4)

    . . (4.4.6) . p d

  • 49

    (4.6.2)

    S .

    4.7 BIC

    (BIC), AIC, . BIC :

    (4.7.1)

    BIC ( ) Schwarz.

    , ,

    ( ) ,

    .

    :

    (4.7.2)

    BIC AIC, 2 . N> , BIC , , . AIC,

    . BIC, . AIC, BIC . , .

    , . , :

    (4.7.3)

    .

    odds:

  • 50

    (4.7.4)

    odds , m, l. H

    (4.7.5)

    Bayes, odds.

    . Laplace , :

    (4.7.6)

    . :

    BIC (4.7.1). , BIC, ( ) . BIC , :

    (4.7.7)

    . C BIC. BIC . , , BIC . , , BIC .

    4.8

    4.8.1

    (cross- validation) : . , . k-

  • 51

    (k-fold cross- validation). k- k- .

    k- k . k k-1 .

    : k-1 . . , k . , , .

    :

    (i) . () .

    (ii) , .

    , , . N , , N , , . , , , .

    4.8.2 HOLD-OUT

    , . : . . hold-out , . -.

  • 52

    . , . hold-out . , k- .

    4.8.3 -

    k- (k-fold cross- validation) k ( ) . k , k-1 . k . . , 50% , , .

    4.8.1 :

    4.8.4 10-

    . . : (i) , (ii) . .

  • 53

    , . . , . , , 30+ .

    , 30 -. , , . , , . , , , , . , . . .

    : . , . . k- .

    k. k k (i) (ii) . , . k , , . , 5- , 4 , 10- , 9 . , k , . , 10 , 10%, 20 5%. k=10

  • 54

    . k 90% , .

    4.8.5 LEAVE-ONE-OUT (LOOCV)

    leave-one-out k- , k . , . LOOCV , , . , , .

    4.9 (CROSS VALIDATED ERROR RATE)

    4.9.1

    (. 4.9 ) k :

    4.9.1

    k=1,..,K, k-1 ,

    k- :

    (4.9.1.1)

    :

    (4.9.1.2)

    ( k=5 k=10).

  • 55

    4.9.2

    . . , , , . CV1 . , , 100 1000 , (0,1). . , 50% . CV1 0,025, , . , . , , . CV2.

    4.9.3 SVM

    SVM ;

    (4.9.3.1)

    i=1,,n , . , ,

    b w . i b . SVs, i f(x) SVs . H SVs .

    . , :

    (4.9.3.2)

    n1 n2 1 2. S, . S,

  • 56

    (4.9.3.1) j

    , :

    (4.9.3.3)

    d j-

    w. S , j S

    (4.9.3.4)

    To j.

    , SVM-RFE .

    SVMs , :

    ,

    (4.9.3.5)

    O :

    =

    (4.9.3.6)

    :

    (4.9.3.7)

    , w .

    outliers.

  • 57

    4.9.3 : R-SVM

    4.9.4

    , , (4.9.3.4)

    . :

    0 : . ( ).

    1 : i, SVM .

  • 58

    2 : ,

    SVM .

    3: i=i+1. 1 i=k

    CV2 . , ( ). . , , . , ( ).

    , CV2 U ( ). CV2 .

    4.10 GACV ( )

    GACV Kullback-Leibler (GCKL) f, .

    4.10.1 KULLBACK-LEIBLER

    ullback-Leibler , . K-L , . K-L P Q :

    (4.10.1.1)

    K-L P Q bits P Q. K-L P Q. .

    4.10.2 KULLBACK-LEIBLER (CKL DISTANCE)

  • 59

    Bernoulli,

    , :

    (4.10.2.1)

    f , CKL :

    (4.10.2.2)

    . CKL

    . , . , CKL , proxy .

    4.10.3 KULLBACK-LEIBLER (GCKL DISTANCE)

    , t , . y tj. , Kullback-Leibler (GCKL), g :

    (4.3.10.1)

    = 0

  • 60

    (4.10.4.1)

    f .

    leaving-out-one :

    , (4.10.4.2)

    GCKL().

    (4.10.4.3)

    =

    (4.10.4.4)

    (4.10.4.5)

    (4.10.4.6)

    (4.10.4.7)

    (4.10.4.8)

    T .

    , leaving-out-one .

    4.10.5 LEAVING-OUT-ONE

    (4.10.4.1),

    , i- ,

    .

  • 61

    4.10.6

    (4.10.6.1)

    =

    (4.10.6.2)

    Taylor Y

    Leaving-Out-One,

    , :

    (4.10.6.3)

    (4.10.6.4)

    (4.10.6.5)

    (4.10.4.2) ACV() (4.10.4.1) :

    (4.10.6.5)

    4.10.7 GACV

    (4.10.4.5)

    (4.10.7.1)

    (4.10.7.2)

    ,

    ,

    .

    . Y . Taylor :

    (4.10.7.3)

  • 62

    (4.10.7.4)

    (4.10.7.5)

    H (4.10.7.5) (4.10.7.2) D() (4.10.7.1). GACV :

    (4.10.7.6)

    (4.10.7.6) GCKL().

    4.10.1 :

    4.10.8 ranGACV

    , leaving-out-one

    ( ,

    ). ACV ( ) :

    D :

  • 63

    =

    proxy CKL.

    4.11 LAPLACE

    4.11.1

    , . , . , , . , . . , .

    , , . . SVMs M .

    SVMs, Smola. SVMs ,

    . >0

    .

    4.11.2 SVM

    Mckay . .

    , . w , k- f, , D :

  • 64

    .

    . , w. , :

    , , . SVM, 1 ( =# ). SVMs , , .

    4.11.3 , SVMs, . n 100. :

    #

    #

    1 500 104

    2 210 2100

    3 150 299

    . , . . . SVM SVM .

    1

    ,

  • 65

    . , . d . C ( d ) . , d . .

    N

    4.11.1: C ( )

  • 66

    4.11.2: C ( )

    4.11.3: 1 2 ( )

  • 67

    4.11.4: 1 2 ( )

    4.12 -

    4.12.1

    SVMs. . (-estimators) (.. ) Vapnik (1998) Jaakola Haussler (1999) SVMs .

    4.12.2

    . :

    n, , . . L (learner), , . , (precision)2 (recall)3 . , Rec(h), h, y=1 . , Prec(h), h, =1, .

    2 Precision=

    3 Recall

  • 68

    . , , .

    4.12.3 SVMS

    - leave-one-out . leave-one-out .

    . ,

    ,

    , , leave-one-out . . learner n n, leave-one-out .

    - leave-one-out , , , learner . SVM

    SVM .

    4.12.4

    -

    :

    1: SVMs , - :

    ,

    (4.12.4.1)

    =2, , .

    2 ( =1 ), d

    .

    SVM

    , (4.12.4.1) SVM

    1: leave-one-out

    SVMs

    , , :

    (4.12.4.2)

    , .

  • 69

    2: SVM c, .

    1: - :

    (4.12.4.3)

    n, n-1, - . , .

    4.12.5 ,

    , , - , .

    2 : SVMs , - , :

    (4.12.5.1)

    (4.12.5.2)

    (4.12.5.3)

    = (4.12.5.4)

    = (4.12.5.5)

    = (4.12.5.5)

    ( ) () (4.12.4.1) () leave-one-out . .

    4.12.6

    - . Reuters-21578 ( ) Mod Apte ( 12.902 ) 10 . -

  • 70

    . 10 . . . C=0,5 =1. - , , , , hold-out =1 =2. - , . =2, - =1. - hold-out . hold-out . , hold-out . . , =1, =2. , , hold-out .

  • 71

    4.12.1: - (=1) , , hold-out 10 Reuters.

  • 72

    4.12.2: - 10 Reuters . =1, =2. hold-out .

  • 73

    10 / .

    4.13 (GUARANTEED RISK MINIMIZATION)

    4.13.1

    . . GRM penalty-based .

    4.13.2

    , :

    1. (h) h f D. :

    n>0:

    c, , 1 n 0 1-n.

    2. To h S. : , m S.

    3.

    .:

    4. To

    , , .

  • 74

    4.13.2: (d) ( ), ( ) ( ) d 100 , 2000 n=0,2. 10

    4.13.3 (STRUCTURAL RISK MINIMIZATION) (TRUE RISK)

    (regularizer), . Vapnik-Chervonenkis guaranteed risk , n, . To (4.13.1)

    4.13.1

  • 75

    (4.13.3.1)

    L , h VC- q . .

    4.13.4 VAPNIK- CHERVONENKIS

    V-C , ( ) Vapnik-Chervonenkis.

    : . To h , . . (shattering). V-C h, . V-C n- , n , . , , h=n+1 , V-C h=n+1.

    4.13.5 (STRUCTURAL RISK MINIMIZATION)

    SRM (4.13.3.1) . i :

    (i) V-C (ii) , (iii)

    . = .

    4.13.6 (EMPIRICAL RISK MINIMIZATION)

    Q ERM V-C . ERM . Q, ERM ( ) , .

  • 76

    4.13.7 GRM

    GRM . (d) d, ,

    (d), . GRM, , :

    (4.13.7.1)

    d V-C . GRM Vapnik arg , :

    (4.13.7.2)

    , GRM (4.13.7.1) .

    4.13.8

    GRM . . , () , d=100 ( d=40 m=500 ) d=100 m. GRM, GRM ( arg (4.13.7.1) ) . GRM 0,5. , m , . GRM , . , m.

  • 77

    4.13.3: , 1.0 ( ) 0.5 ( )

    m

    100 n=0.2. 10 .

    4.13.4: , 1.0 ( ) 0.5 (

    )

    m 100 n=0.2. 10 .

  • 78

    4.13.5: GRM

    d 2000 100 0,2.

    4.14 (UM DESCRIPTION LENGTH)

    (MDL) . . z ( ). , , .

    , : . . , ( ) . . , , , .

    , Shannon, :

    (4.14.1)

    (4.14.1), .

    . , Huffman. , . :

  • 79

    z, , bits . = , .

    . z= . ( ) . :

    (4.14.2)

    (log-probability) -, . , -.

    MDL (4.14.2). (4.14.2) .

    . . , z , , . = , . MDL . .

    4.15 BOOTSTRAP

    bootstrap . bootstrap . bootstrap , Err.

    Z= , . , . , bootstrap . bootstrap . bootstrap

  • 80

    bootstrap ,

    . , , b- bootstrap, :

    (4.15.1)

    , . bootstrap , . .

    , bootstrap . bootstrap . H leave-one-out bootstrap :

    (4.15.2)

    , bootstrap b

    i .

    ,

    0 ,

    (4.15.2) 0.

    - : , . predictors

    . (4.15.3)

    (4.15.4)

    0 , ( ) 1, .

  • 81

    5

    5.1 T

    5.1.1

    (SVMs) (KLR). , KLR SVMs. KLR SVMs , ( C) .

    , C. C SVMs KLR .

    , SVMs. , Sollich. , (hyperparameter) . , (KRIC) .

    KRIC (RIC) . RIC . RIC Kullback-Leibler

  • 82

    . RIC . RIC SVMs KRIC. KRIC (KLR). KLR KRIC. SVMs, KLR SVMs .

    5.1.2 (KLR)

    KLR :

    KLR :

    i=1,,l

    5.1.1 : SVMs KLR

    H KLR

    To 1 . . Wolfe KLR .

    KLR

  • 83

    :

    5.1.3 SVMS KLR . p( ) w . :

    =

    (5.1.3.1)

    n . K . . , :

    , (>0)

    . ayes

    :

    5.1.4 SOLLICH SVMS Sollich. . , . .

    ( 5.1.4.1)

  • 84

    ( 5.1.4.1) .

    0 .

    5.1.5 (RIC)

    RIC

    (5.1.5.1)

    ,

  • 85

    ,

    D

    . I J, po .

    (5.1.5.2)

    KL(p||q) Kullback-Leibler p q. . RIC , Kullback-Leibler (5.1.5.1) .

    5.1.6 (KRIC)

    KLR SVMs , (KRIC). Hilbert . SVMs. ,

    (5.1.6.1)

    . , RIC .

    RIC.

    , . I J RIC .

    (5.1.6.2)

    (5.1.6.3)

    (5.1.6.4)

  • 86

    (5.1.6.5)

    (5.1.6.6.)

    (5.1.6.2) (5.1.6.5) . I J p0 , J I

    (5.1.6.7)

    SVMs , p( | , ,) (5.1.3.1). ,

    :=

    , i =1,,l (5.1.6.8)

    :

    O .

    ,

    ( ti (5.1.6.8) ). ,

    , . { } { } ,

    , .

    (5.1.6.9)

    (5.1.6.2) (5.1.6.5) (5.1.6.9)

    .

  • 87

    , 0 , (5.1.6.10)

    (5.1.6.10)

    , .

    . (5.1.6.10)

    .

    = ).

    .

    . 4,

    . (5.1.6.11)

    (5.1.6.11) ,

    (5.1.6.12)

    (5.1.6.12) (5.1.6.1) , SVM, (KRIC).

    KRIC SVMs

    4

    .

  • 88

    (5.1.6.13)

    ti mi (5.1.6.4) (5.1.6.6). KRIC KLR (5.1.6.13) n=1 (5.1.6.4) (5.1.6.6).

    , KRIC Sollich SVMs. To :

    . , KRIC Sollich .

    p0(x) ,

    (). , KRIC :

    KRIC

    1 2 1

    ti mi (5.1.6.4) (5.1.6.6).

    5.1.7 NYSTROM KRIC

    , KRIC Nystrom. KRIC O(l3) , l l . SMO, SVMs O(l3) . , - O(l3) . , KRIC -

  • 89

    . Nystrom KRIC .

    Nystrom , Gram

    . , m

    1 1 . Gram m

    , .

    n- .

    .

    , :

    (5.1.7.1)

    , (5.1.7.1)

    :

    1 1 2 1 (5.1.7.2)

    KRIC Nystrom. , KRIC Nystrom .

    KRIC SVMs Nystrom:

  • 90

    KRIC Nystrom . , KRIC m- l .

    Data property

    Name Dim # train. # test

    Bld 6 230 115

    Cra 6 133 67

    Hea 13 180 90

    Ion 33 234 117

    Rsy 2 250 1000

    Snr 60 138 70

    5.1.2:

    5.1.8

    KRIC , 10- (10-fold cross validation), Laplace (evidence calculated by Laplaces method (Kwok, 2000)) - (-estimation (Joachims, 2000)) .

    SVMs.

    SVMs KLR t KRIC .

    -

  • 91

    .

    :

    C . 5.1.2 . 5.1.2 . dim ,# train. ,# test , .

    ( ), rsy, 250 1000 . 100 100 .

    j=1,,n L

    .

    5.1.3: KRIC. rsy . KRIC , . , 10-1.

    .

    C . , 10.0 .

    (5.1.4) (5.1.5) SVMs KLR, . KRIC ( ) , GACV, 10- , KRIC ( Sollich ), Laplace -. KRIC Nystrom. Nystrom, m=50. p=30. To ERROR () . To TIME () CPU

  • 92

    .

    SVM 10- OSU-SVM KLR SMO .

    p-value t-tests KRIC Nystrom . p-value CPU. ( ) p-value 0,05 . (5.1.4) (5.1.5) * ( @) p-value ( KRIC ).

    , KRIC. KRIC GACV -. KRIC 10- Laplace. , - GACV KRIC. KRIC Sollich .

    KRIC KLR SVMs . KRIC 10- t-test. 5.1.3 KRIC.

  • 93

  • 94

    5.1.4 : SVMs ( , =10.0 ). KRIC ( ), GACV, 10- , KRIC ( Sollich) Laplace -. Nystrom KRIC .

  • 95

  • 96

    5.1.5 : KLR ( , =10.0 ). KRIC ( ), GACV, 10- , KRIC ( Sollich), Laplace -. KRIC Nystrom. p-value t-test CPU KRIC ( ).

    5.1.9

    KRIC SVMs KLR. , .

    SVMs KLR. SVMs. To RIC , , Hilbert . RIC, RIC . RIC RIC (KRIC).

    KRIC Nystrom. Nystrom Gram . , KRIC Nystrom . KRIC .

    , KRIC GACV -. 10- Laplace. KRIC .

  • 97

    , KRIC . , KRIC.

    KRIC, , . , KRIC. , KRIC (5.1.5.2) ,

    .

    , . KRIC .

    KRIC. 5.1.8 , C . KRIC , KRIC . , .

    , (.. ). ( ) , . . .

    b SVMs. , SVMs . (5.1.2.4) . (5.1.7.1) :

    . KRIC

    (5.1.6.2) (5.1.6.7) (5.1.6.3). . , KRIC . KRIC. KRIC .

  • 98

    5.2

    5.2.1 , . . , , . . , , . .

    SVMs, (curse of dimensionality). . SVM , . , ( ) .

    . , . AKAIKE (AIC) (BIC) ( ) . SVMs, , . KRIC, SVM, . , AIC, BIC. SVMs. .

    , . , SVMs .

    5.2.2 SVMs

  • 99

    , SVM. . , . .

    . (CV error rate) (GRM) , n .

    5.2.3 . . , , .

    ( / (combined backward elimination/forward selection strategy) ) , (the technique of variable ranking). . p , , p . SVM / (SVM-RFE).

    SVM, ,

    j- w. . . , SVMs .

    ,

    j .

    score Fisher

    SVM,

    j. score Fisher

  • 100

    SVM . score Fisher SVM-RFE ,

    .

    SVM-RFE :

    1: S={1,,p}, r=()

    2: S=

    2a: SVM ( )

    2b: SVM .

    2c:

    S=S\{m} r=(m,r).

    r , .

    5.2.4

    . (CV error rate) KRIC, SV, . SVM, . SVM :

    i SVM, S, S {1,,p}.

    ( ) . , .

    ,

    S .

    C(n) C(n)=2, . SVMs (SVMIC) :

  • 101

    SVMIC(S) (5.2.4.1)

    To SVMIC KRIC ( ) SVM.

    , KRIC (

    ) . ,

    =

    ,

    V= . V

    ,

    V. ,

    , - .

    ,

    , :

    . ,

  • 102

    i-

    , 1. , . ,

    SVMIC. SVMIC SVMIC SVMs AKAIKE (AIC) . AIC , (, 2 ). n , , . BIC. AIC, , . BIC , , ( ) . :

    (5.2.4.2)

    , SVMICs (10 SVMs 10- ) KRIC ( ). . , SVMICs ()

    . ,

  • 103

    , SVMICs CV KRIC.

    5.2.5

    M=100 .

    , .

    , ,

    .

    ,

    p,

    .

    =0 , , >0.

    +

    Z . , =1 , . , SVM . . , score Fisher . (nested models) , SVMICa SVMICb. (CV error rate), 10- GRM Vapnik. CV2 . CV2 , 10- . KRICs. . , (CV error rate) (GRM criteria) , SVM : i ) :

    ii) :

    =

    (

    )

  • 104

    C SVM , C=1, . , , 1000 . . (1) (2) . . scores Fisher , .

    (CV GRM) . KRICs SVMICs . KRICs (n=25), SVMICs . KRICs SVMICa SVMICb. , SVMICa SVMICb . . SVMICa SVMICb ( ) , KRICs . n , 15,9 . , 3 . , 3 (C ). (U, ) ( ,). SVMICa SVMICb, . SVMICa AIC . SVIMCb BIC, . CV error rate GRM , . , . KRICs SVMICa, KRICs SVMIC. , , . . , +1, (1,1) -1, (2,2) . , ( ,p) +1 (-2, 4p), -1.

  • 105

    4, 5 6. , . CV error rate GRM , . SVMICs KRICs . , SVMICs KRICs ( n ) . KRICs (n=25). KRICs SVMICa . 6 . , , , SVM. , , SVM. , , SVMICs KRICs, . . . ,

    ij , i=1,,n,kj ,

    1 4 ,

    ,

    4 . . =0,8 .

  • 106

    5.2.1 :

    . SVMs . p , n. , 6 , SVMICa , SVMICb , CV error rate, GRM KRICs Sollich SVMs.

  • 107

    5.2.2: 1 , scores Fisher Sj.

  • 108

    5.2.3 : SVMICa , SVMICb , CV error rate, GRM, KRIC KRICs .

    . C (), , . U () , , , () . R, 3 .

  • 109

    5.2.4 : , ,

    . SVMs . p , n. , 6 , SVMICa , SVMICb , CV error rate, GRM KRICs Sollich SVMs.

  • 110

    5.2.5 : 5.2.4, , scores Fisher Sj.

  • 111

    5.2.6 : SVMICa , SVMICb , CV error rate, GRM, KRIC KRICs .

    . C (), , . U () , , , () . R, 3 .

    5.2.6

    , .

    : Pina Indians Diabetes (768 , 8 ), Statlog Cleveland Heart Disease (303 , 14 ) , Leo Breiman riingnorm twonorm ( 7400 20 ). 100

  • 112

    ,

    , n . . , SVM KRIC, .

    , . , p . , .

    5.2.7 . KRICs two norm heart .

    ringnorm diabetes KRICs SVMICs . CV error rate GRM , . , SVMICa SVMICb, . , SVMs KRIC, .

  • 113

    5.2.7: . . , : SVMICa , SVMICb , CV error rate, GRM KRICs Sollich SVMs.

  • 114

  • 115

    6

    MULTICLASS SVMS

    6.1

    SVMs multiclass . , . . multiclass SVMs . , multiclass SVMs (exhaustive grid-based search method). k- leave-one-out .

    SVMs multiclass . SVMs, . (radius-margin bound) SVM. , SVMs : 1) multiclass SVMs ( SVMs ) 2) , . .

  • 116

    6.2

    6.2.1 (SINGLE-MACHINE APPROACH)

    . .

    1. Weston, Watkins and Vapnik

    (6.2.1.1)

    : + (6.2.1.2)

    i . j :

    2. Lee, Lin and Wahba

    (6.2.1.3)

    x (6.2.1.4)

    3. Crammer and Singer

    (6.2.1.5)

    (6.2.1.6)

    (1) , , N-1 (1). , ,

    6.2.2 (GRID-BASED SEARCH METHODS)

    , . ,

  • 117

    . . Lipschitz - D , , . , , . , n- .

    6.2.3 (GRADIENT-BASED OPTIMIZATION TECHNIQUES)

    :

    ,

    ,

    .

    6.2.4 BROYDEN-FLETCHER-GOLDFARB-SHANNO (BFGS) QUASI-NEWTON

    H BFGS

    (6.2.4.1)

    H :

    (6.2.4.2)

    f. .

    BFGS quasi-Newton , (n2) , , -.

  • 118

    6.2.1: BFGS quasi-Newton

    6.3 SVMS.

    leave-one-out D. . SVM , :

    (6.3.1)

    R l F, , w . , L2-norm SVM .

    ,

    . , :

    : (6.3.2)

    i- Lagrange

    .

    , SVM.

    : (6.3.3)

  • 119

    i- Lagrange. , t-

    (6.3.4)

    (6.3.2).

    (6.3.5)

    (6.3.3).

    ,

    (6.3.6)

    :

    1. 2. , (6.3.3) (6.3.2) ,

    .

    3.

    , (2.3.6) . , . 4. . , 2.

    6.4

    D , . , multiclass SVM . c-:

    (6.4.1)

    To , i j, :

    (6.4.2)

    . x ,

    , , multiclass SVM .

  • 120

    , . ,

    .

    one-versus-all max wins , multiclass SVM . , , SVM . SVMij SVM i j.

    i j SVMij

    i j. , SVMij i j +1 -1 .

    :

    (6.4.3)

    SVM ,

    SVMij. SVM :

    (6.4.4)

    To .

    max wins, x :

    (6.4.5)

    x i. To sign() , +1 >0 , 0 =0 -1 . To v (c-1) score, , i. To x score. :

    1) , SVM SVMij x i.

    2) , .

    1), SVM x i.

    3) , . SVMij x i j .

    multiclass SVM . ,

    j. , (6.4.2).

    (6.4.5) , score, ,

    . 2) 3), ,

    SVM

    ; k

  • 121

    j . ,

    multiclass SVM, SVM .

    i j ( k )

    (6.4.6)

    . ,

    .

    , .

    .

    leave-one-out SVMij , ,

    (6.4.7)

    , , :

    (6.4.8)

    , .

    ,

    multiclass SVMs. SVMs. To :

    (6.4.9)

    t- :

    (6.4.10)

    (6.4.4) (6.4.5).

    , .

    , multiclass SVMs,

    (6.4.11)

  • 122

    c, . R . (6.4.9) ,

    i j,

    .

    (6.4.9)

    .

    6.5

    (scatter-matrix-based measure) , multiclass . :

    (6.5.1)

    C , i, . m , . , . , , .

    multiclass . , F . F , . ,

    F . , i- . D (i=1,,c),

    . O

    . To

    . :

    (6.5.2)

    (6.5.3)

    (6.5.4)

  • 123

    , F

    .

    ,

    .

    , . H

    :

    (6.5.5)

    . (6.3.3),

    ,

    , ,

    , . , , . (6.5.5) , , . ,

    ,

    . ,

    , ,

    . ( )

    ( ) , , .

    :

    (6.5.6)

    ,

    . , (6.3.2), , , . ,

    , , . ,

    (6.5.6),

    . ,

    .

    ,

    ,

    . , multiclass , . , , R .

    multiclass ,

    F.

    ,

    c :

  • 124

    (6.5.7)

    , c :

    (6.5.8)

    , :

    (6.5.9)

    , SVM

    i j, , I . SVM , o , . . , . , ,

    (6.5.10)

    t-

    (6.5.11)

    (6.5.12)

    , . , .

    , . , , . , .

  • 125

    . , . , .

    6.6

    . , Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton . , . QP(n) n . s . || , k . , e . (6.6.1). (6.6.1), multiclass SVM one-versus-all , . one-versus-one ,

    . O

    (

    )

    , .

    , s, || k. || . , , , . , , || .