Nhom02 Chuong05 SVM BaoCao


  • 1

    UNIVERSITY OF SCIENCE

    MASTER'S PROGRAM IN INFORMATION TECHNOLOGY, COHORT 22

    MACHINE LEARNING

    REPORT:

    SUPPORT VECTOR MACHINE

    Advisor (GVHD):

    TS. Trn Thi Sn

    Students (HVTH):

    12 11 027 Nguyn Thanh Hng

    12 11 011 Bi Th Danh

    12 11 075 V Quang Trng

    12 11 069 Bnh Tr Thnh

    12 11 024 Phm Minh Hong

    - Ho Chi Minh City, 2/2013 -

  • 2

    Contents

    1 Introduction ... 3

    2 Support Vector Classifier - SVC ... 3

    2.1 Binary classification with SVC ... 3

    2.2 The problem of data that is not linearly separable ... 6

    2.2.1 Soft margin ... 7

    2.2.2 The kernel trick ... 9

    2.3 Training methods for SVC ... 13

    2.3.1 Chunking ... 13

    2.3.2 Osuna's method ... 14

    2.3.3 SMO - Sequential minimal optimization ... 14

    2.4 Directions of development ... 15

    2.4.1 Computational efficiency ... 15

    2.4.2 Kernel selection ... 15

    2.4.3 Generalization analysis ... 16

    2.4.4 Structured SVM learning ... 17

    3 Support Vector Regressor - SVR ... 18

    3.1 Introduction to the regression problem ... 18

    3.2 Regression with SVR ... 21

    3.3 Support Vector Regression ... 27

    4 Appendix ... 30

    5 References ... 30

  • 3

    1 INTRODUCTION

    Support Vector Machine (SVM) is among the most robust and accurate of the well-known data mining algorithms. SVM covers two main topics: the support vector classifier (SVC) and the support vector regressor (SVR). First developed by Vapnik in the 1990s, SVM rests on a sound theoretical foundation in statistical learning theory. It needs relatively few training samples and is usually insensitive to the dimensionality of the data. Over the past decades, SVM has developed rapidly in both theory and practice.

    The following sections present the Support Vector Classifier and the Support Vector Regressor in detail.

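    2 SUPPORT VECTOR CLASSIFIER - SVC

    2.1 BINARY CLASSIFICATION WITH SVC

    Consider the classification problem shown in the figure: we must find a straight line such that every point to its left is red and every point to its right is blue. A problem in which a straight line is used to separate the classes in this way is called linear classification.

    Figure 1: Illustration of linear classification

    The linear function that discriminates between the two classes is:

    $$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b \tag{1}$$

    where:

    $\mathbf{w}$ is the weight vector, that is, the normal vector of the separating hyperplane, and $T$ denotes transposition.

    $b$ is the bias.

    Note that in a 2-dimensional space the separating boundary is a straight line, whereas in a higher-dimensional space it is called a hyperplane.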
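To make the role of the weight vector and the bias concrete, the short sketch below evaluates a linear discriminant of the form y(x) = w^T x + b and assigns a class from its sign. The values of `w`, `b` and the sample points are made up for illustration and do not come from the report.

```python
def decision_function(w, x, b):
    """Return y(x) = w^T x + b for plain Python sequences w and x."""
    return sum(w_k * x_k for w_k, x_k in zip(w, x)) + b


def classify(w, x, b):
    """Assign +1 or -1 according to the sign of y(x)."""
    return 1 if decision_function(w, x, b) > 0 else -1


# Hypothetical separating line x1 + x2 - 3 = 0 in two dimensions.
w = [1.0, 1.0]
b = -3.0
points = [[0.0, 1.0], [1.0, 1.0], [3.0, 2.0], [4.0, 4.0]]
labels = [classify(w, x, b) for x in points]
print(labels)
```

Points on one side of the line receive the label -1 and points on the other side receive +1; a point lying exactly on the line would need a tie-breaking rule, which this sketch resolves arbitrarily in favour of -1.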

  • 4

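    The training set consists of $n$ input vectors $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ with corresponding labels $\{t_1, \dots, t_n\}$, where $t_i \in \{-1, +1\}$. Assume the data set is completely linearly separable, meaning that every sample is assigned to its correct class by the separating boundary. Then parameters $\mathbf{w}$ and $b$ in (1) always exist such that $y(\mathbf{x}_i) > 0$ for points with label $t_i = +1$ and $y(\mathbf{x}_i) < 0$ for points with $t_i = -1$, so that $t_i\, y(\mathbf{x}_i) > 0$ for every training point.

    To find the separating boundary, SVC relies on the concept of the margin. The margin is the smallest distance from the separating boundary to any training point, that is, the distance from the boundary to the data point closest to it (see Figure 2).

    Figure 2: Illustration of the margin

    According to SVC, the best separating boundary is the one with the largest margin. In general many separating boundaries exist, oriented in different directions; among them the method selects the one whose margin is largest.

    Figure 3: Illustration of the optimal separating boundary

    The distance from a data point $\mathbf{x}$ to the separating boundary is:

    $$\frac{|y(\mathbf{x})|}{\|\mathbf{w}\|} \tag{2}$$

    Without loss of generality, Vapnik rescales $\mathbf{w}$ and $b$ so that the problem takes the canonical form: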

  • 5

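    $$\begin{cases} \mathbf{w}^T \mathbf{x}_i + b \ge +1 & \text{if } t_i = +1 \\ \mathbf{w}^T \mathbf{x}_i + b \le -1 & \text{if } t_i = -1 \end{cases} \tag{3}$$

    The data points for which equality holds in the expression above are called support vectors. They are also the data points closest to the optimal separating boundary. Accordingly, the signed distance from a support vector $\mathbf{x}_s$ with label $t_s$ to the optimal separating boundary is:

    $$r = \frac{y(\mathbf{x}_s)}{\|\mathbf{w}\|} = \begin{cases} \dfrac{1}{\|\mathbf{w}\|} & \text{if } t_s = +1 \\ -\dfrac{1}{\|\mathbf{w}\|} & \text{if } t_s = -1 \end{cases} \tag{4}$$

    The margin separating the two classes is then:

    $$\rho = \frac{2}{\|\mathbf{w}\|} \tag{5}$$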
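As a numerical check of the distance and margin formulas, the sketch below assumes the distance from a point to the boundary is |y(x)| / ||w|| and the margin is 2 / ||w|| once w and b are scaled so that the closest points satisfy |y(x)| = 1. The two sample points and the values of `w` and `b` are hypothetical.

```python
import math


def norm(w):
    """Euclidean norm ||w||."""
    return math.sqrt(sum(w_k * w_k for w_k in w))


def distance_to_boundary(w, b, x):
    """Distance |y(x)| / ||w|| from x to the hyperplane w^T x + b = 0."""
    y = sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
    return abs(y) / norm(w)


def margin_width(w):
    """Margin 2 / ||w||, valid when w and b are in canonical form."""
    return 2.0 / norm(w)


# Hypothetical canonical solution for the points (2, 2) [t = +1]
# and (0, 0) [t = -1]: both satisfy |y(x)| = 1.
w = [0.5, 0.5]
b = -1.0
d_pos = distance_to_boundary(w, b, [2.0, 2.0])
d_neg = distance_to_boundary(w, b, [0.0, 0.0])
rho = margin_width(w)
print(d_pos, d_neg, rho)
```

In this toy case the two distances are equal and add up to the margin, which is also the straight-line distance between the two points.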

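    To find the optimal separating boundary, SVC maximizes the margin $2 / \|\mathbf{w}\|$ over $\mathbf{w}$ and $b$:

    $$\max_{\mathbf{w}, b} \frac{2}{\|\mathbf{w}\|} \quad \text{subject to } t_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1,\ i = 1, \dots, n \tag{6}$$

    This is equivalent to:

    $$\min_{\mathbf{w}, b} \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to } t_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1,\ i = 1, \dots, n \tag{7}$$

    This is regarded as the primal problem. To solve it, the method of Lagrange multipliers is used. The Lagrangian corresponding to (7) is:

    $$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} - \sum_{i=1}^{n} \alpha_i \left[ t_i(\mathbf{w}^T \mathbf{x}_i + b) - 1 \right] \tag{8}$$

    where $\alpha_i \ge 0$ are the Lagrange multipliers.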

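    Taking the derivatives of $L$ with respect to $\mathbf{w}$ and $b$ and setting them to zero gives:

    $$\begin{cases} \dfrac{\partial L(\mathbf{w}, b, \boldsymbol{\alpha})}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{n} \alpha_i t_i \mathbf{x}_i = 0 \\ \dfrac{\partial L(\mathbf{w}, b, \boldsymbol{\alpha})}{\partial b} = -\sum_{i=1}^{n} \alpha_i t_i = 0 \end{cases} \tag{9}$$

    It follows that: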

  • 6

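    $$\begin{cases} \mathbf{w} = \sum_{i=1}^{n} \alpha_i t_i \mathbf{x}_i \\ \sum_{i=1}^{n} \alpha_i t_i = 0 \end{cases} \tag{10}$$

    Substituting into the Lagrangian, we obtain:

    $$\max_{\boldsymbol{\alpha}} \tilde{L}(\boldsymbol{\alpha}) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j t_i t_j \mathbf{x}_i^T \mathbf{x}_j \quad \text{subject to } \alpha_i \ge 0,\ \sum_{i=1}^{n} \alpha_i t_i = 0 \tag{11}$$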
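The substitution that leads from the Lagrangian (8) to the dual objective (11) can be written out step by step. The derivation below is a sketch that assumes the standard hard-margin Lagrangian with multipliers $\alpha_i \ge 0$ and uses the two stationarity results of (10), namely $\mathbf{w} = \sum_i \alpha_i t_i \mathbf{x}_i$ and $\sum_i \alpha_i t_i = 0$.

```latex
\begin{aligned}
L(\mathbf{w}, b, \boldsymbol{\alpha})
  &= \tfrac{1}{2}\mathbf{w}^T\mathbf{w}
     - \sum_{i} \alpha_i t_i \mathbf{w}^T\mathbf{x}_i
     - b \sum_{i} \alpha_i t_i
     + \sum_{i} \alpha_i \\
  &= \tfrac{1}{2}\mathbf{w}^T\mathbf{w}
     - \mathbf{w}^T\mathbf{w}
     - 0
     + \sum_{i} \alpha_i
     && \text{since } \textstyle\sum_i \alpha_i t_i \mathbf{x}_i = \mathbf{w},\ \sum_i \alpha_i t_i = 0 \\
  &= \sum_{i} \alpha_i - \tfrac{1}{2}\mathbf{w}^T\mathbf{w} \\
  &= \sum_{i} \alpha_i
     - \tfrac{1}{2}\sum_{i}\sum_{j} \alpha_i \alpha_j t_i t_j \mathbf{x}_i^T\mathbf{x}_j
\end{aligned}
```

The result depends on the training inputs only through the inner products $\mathbf{x}_i^T\mathbf{x}_j$.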

    The Karush-Kuhn-Tucker (KKT) complementarity condition is:

    $$\alpha_i \left[ t_i(\mathbf{w}^T \mathbf{x}_i + b) - 1 \right] = 0 \tag{12}$$

    Accordingly, only the support vectors $(\mathbf{x}_i, t_i)$ have a nonzero $\alpha_i$; every other data point has $\alpha_i = 0$. The support vectors are what matters in SVM training: the classification of a new data point depends only on the support vectors.
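The claim that prediction depends only on the support vectors can be illustrated with the dual form of the decision function, y(x) = sum_i alpha_i t_i x_i^T x + b. The sketch below uses a made-up four-point data set whose multipliers were chosen by hand so that the hard-margin optimality conditions hold; the function names are illustrative, not taken from the report.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(u_k * v_k for u_k, v_k in zip(u, v))


def predict_dual(alphas, X, t, b, x_new):
    """Evaluate y(x_new) = sum_i alpha_i t_i x_i^T x_new + b.

    Points with alpha_i == 0 are skipped: they contribute nothing.
    """
    y = b
    for a_i, x_i, t_i in zip(alphas, X, t):
        if a_i == 0.0:
            continue
        y += a_i * t_i * dot(x_i, x_new)
    return y


# Hypothetical toy problem: the first two points are the support vectors.
X = [[2.0, 2.0], [0.0, 0.0], [3.0, 3.0], [-1.0, -1.0]]
t = [1, -1, 1, -1]
alphas = [0.25, 0.25, 0.0, 0.0]
b = -1.0

x_new = [1.0, 2.0]
y_all = predict_dual(alphas, X, t, b, x_new)
# Same prediction after discarding the two non-support points.
y_sv_only = predict_dual(alphas[:2], X[:2], t[:2], b, x_new)
print(y_all, y_sv_only)
```

Dropping the points with zero multipliers leaves the prediction unchanged, which is why a trained SVC only needs to store its support vectors.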

    The dual problem (11) is a typical convex quadratic programming problem. In many cases it can be solved to its global optimum by a suitable optimization algorithm, for example SMO (sequential minimal optimization). SMO is described in detail in a later section.

    Once the optimal Lagrange multipliers $\alpha_i^*$ have been found, the optimal $\mathbf{w}$ and $b$ can be computed with the formula below. Note that for $b$ a single positive support vector (one with $t = +1$) is sufficient, but to make $b$ numerically stable it can instead be computed as the average over the support vectors.

    $$\mathbf{w}^* = \sum_{i=1}^{n} \alpha_i^* t_i \mathbf{x}_i, \qquad b^* = 1 - \mathbf{w}^{*T}\mathbf{x}_s \ \text{ for a positive support vector } \mathbf{x}_s \tag{13}$$
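The recovery of w and b from the multipliers can be sketched as follows. The code assumes w = sum_i alpha_i t_i x_i and, for each support vector, b = t_s - w^T x_s (which reduces to 1 - w^T x_s when t_s = +1), averaged over all support vectors as suggested above. The data and multipliers are a hand-built toy example, and the helper names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(u_k * v_k for u_k, v_k in zip(u, v))


def recover_w(alphas, X, t):
    """w = sum_i alpha_i * t_i * x_i."""
    w = [0.0] * len(X[0])
    for a_i, x_i, t_i in zip(alphas, X, t):
        for k, x_ik in enumerate(x_i):
            w[k] += a_i * t_i * x_ik
    return w


def recover_b(alphas, X, t, w, tol=1e-12):
    """Average t_s - w^T x_s over the support vectors (alpha_s > tol)."""
    values = [t_i - dot(w, x_i)
              for a_i, x_i, t_i in zip(alphas, X, t) if a_i > tol]
    return sum(values) / len(values)


# Hypothetical toy problem: support vectors (1, 0) and (-1, 0).
X = [[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0]]
t = [1, -1, 1]
alphas = [0.5, 0.5, 0.0]

w = recover_w(alphas, X, t)
b = recover_b(alphas, X, t, w)
margins = [t_i * (dot(w, x_i) + b) for x_i, t_i in zip(X, t)]
print(w, b, margins)
```

The list `margins` holds t_i (w^T x_i + b) for each point: it equals 1 for the two support vectors and is larger than 1 for the remaining point, as the constraints require.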

    2.2 THE PROBLEM OF DATA THAT IS NOT LINEARLY SEPARABLE

    Requiring the data to be completely linearly separable is a strict condition that does not suit real-world problems, especially complex nonlinear classification tasks. Yet when the samples are not completely linearly separable, the optimization problem for $\mathbf{w}$ and $b$ described above cannot be solved. There are two main approaches to this issue:

  • 7

    - Soft margin

    - The kernel trick

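    2.2.1 SOFT MARGIN

    For various reasons, whether inherent in the problem or caused by errors during data collection, some points of one class end up mixed in with the other class, which breaks linear separability. Forcing a complete separation anyway makes the predictive model overfit. To counter overfitting, SVC is extended so that it tolerates a few misclassified points. This technique is called soft margin.

    Figure 4: Illustration of noisy data

    To do this, a variable $\xi_i$ (called a slack variable) is added to the objective so that the classifier is allowed to misclassify to an acceptable degree:

    $$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n} \xi_i \quad \text{subject to } t_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1 - \xi_i,\ \xi_i \ge 0,\ i = 1, \dots, n \tag{14}$$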

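    The parameter $C$ balances the complexity of the model against the number of points that cannot be separated. It is called the regularization parameter. The value of $C$ can be estimated experimentally or by analysing the data.

    A variable $\xi_i$ is added for each data point and measures how far its classification deviates from the true label. Specifically:

    - $\xi_i = 0$ for points that lie on the margin or on the correct side of it.

    - $\xi_i = |t_i - y(\mathbf{x}_i)|$ for the remaining points.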
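The two cases for the slack variable can be checked numerically. The sketch below assumes labels in {-1, +1} and the definition just given: zero slack when t_i y(x_i) >= 1, and |t_i - y(x_i)| otherwise (which equals 1 - t_i y(x_i) for such labels). The sample values of y(x_i) are invented for illustration.

```python
def slack(t_i, y_i):
    """Slack xi_i for a point with label t_i and discriminant value y_i.

    xi_i = 0 when the point is on or beyond its own margin boundary
    (t_i * y_i >= 1); otherwise xi_i = |t_i - y_i|.
    """
    return 0.0 if t_i * y_i >= 1.0 else abs(t_i - y_i)


# Hypothetical (label, y(x)) pairs covering the different situations.
cases = {
    "beyond margin, correct": (1, 2.0),
    "on margin": (1, 1.0),
    "inside margin, correct": (1, 0.4),
    "on separating boundary": (-1, 0.0),
    "misclassified": (-1, 0.5),
}
slacks = {name: slack(t_i, y_i) for name, (t_i, y_i) in cases.items()}
for name, xi in slacks.items():
    print(name, xi)
```

A point exactly on the separating boundary gets a slack of 1, correctly classified points inside the margin get a slack between 0 and 1, and misclassified points get a slack above 1.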

  • 8

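    Accordingly, points lying on the separating boundary ($y(\mathbf{x}_i) = 0$) have $\xi_i = 1$, and misclassified points have $\xi_i > 1$. The Lagrangian is rewritten as:

    $$L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu}) = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left\{ t_i\, y(\mathbf{x}_i) - 1 + \xi_i \right\} - \sum_{i=1}^{n} \mu_i \xi_i \tag{15}$$

    where $\{\alpha_i \ge 0\}$ and $\{\mu_i \ge 0\}$ are the Lagrange multipliers.

    The KKT conditions that must hold are:

    $$\alpha_i \ge 0, \qquad t_i\, y(\mathbf{x}_i) - 1 + \xi_i \ge 0, \qquad \alpha_i \left( t_i\, y(\mathbf{x}_i) - 1 + \xi_i \right) = 0$$

    $$\mu_i \ge 0, \qquad \xi_i \ge 0, \qquad \mu_i \xi_i = 0$$

    for $i = 1, \dots, n$.

    Setting the derivatives of (15) with respect to $\mathbf{w}$, $b$ and $\{\xi_i\}$ to zero:

    $$\frac{\partial L}{\partial \mathbf{w}} = 0 \Rightarrow \mathbf{w} = \sum_{i=1}^{n} \alpha_i t_i \mathbf{x}_i, \qquad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} \alpha_i t_i = 0, \qquad \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow \alpha_i = C - \mu_i$$

    Substituting all of these into (15) gives:

    $$\tilde{L}(\boldsymbol{\alpha}) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j t_i t_j \mathbf{x}_i^T \mathbf{x}_j \tag{16}$$

    This yields the following dual problem with soft margin: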

  • 9

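    $$\max_{\boldsymbol{\alpha}} \tilde{L}(\boldsymbol{\alpha}) \quad \text{subject to } 0 \le \alpha_i \le C,\ \sum_{i=1}^{n} \alpha_i t_i = 0 \tag{17}$$

    where $\tilde{L}(\boldsymbol{\alpha})$ is the dual objective of (16).

    The corresponding KKT condition is:

    $$\alpha_i \left[ t_i\, y(\mathbf{x}_i) - 1 + \xi_i \right] = 0 \tag{18}$$

    As before, the points with $\alpha_i = 0$ contribute nothing to the prediction for a new data point. The remaining points form the support vectors. These points have $\alpha_i > 0$ and, by (18), satisfy:

    $$t_i\, y(\mathbf{x}_i) = 1 - \xi_i \tag{19}$$

    If $\alpha_i < C$ then $\mu_i > 0$, which implies $\xi_i = 0$. These are the points lying on the margin.

    Points with $\alpha_i = C$ have $\mu_i = 0$; they may be correctly classified points lying between the margin and the separating boundary if $\xi_i \le 1$, or misclassified points if $\xi_i > 1$.

    To determine the parameter $b$, we use the support vectors with $0 < \alpha_i < C$, which correspond to $\xi_i = 0$. Once again, to make $b$ numerically stable it should be computed as an average.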
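The three cases for a multiplier (zero, strictly between 0 and C, equal to C) and the averaged computation of b can be sketched as below. The data, the multipliers and the value of C form a hand-built toy example chosen so that the soft-margin optimality conditions hold; the category labels and function names are illustrative only.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(u_k * v_k for u_k, v_k in zip(u, v))


def categorize(alpha_i, C, tol=1e-9):
    """Classify a training point by the value of its multiplier."""
    if alpha_i <= tol:
        return "none"      # not a support vector
    if alpha_i >= C - tol:
        return "bound"     # alpha_i = C: slack may be positive
    return "margin"        # 0 < alpha_i < C: lies on the margin


def weights(alphas, X, t):
    """w = sum_i alpha_i * t_i * x_i."""
    w = [0.0] * len(X[0])
    for a_i, x_i, t_i in zip(alphas, X, t):
        for k, x_ik in enumerate(x_i):
            w[k] += a_i * t_i * x_ik
    return w


def bias(alphas, X, t, C, tol=1e-9):
    """Average t_i - w^T x_i over support vectors with 0 < alpha_i < C."""
    w = weights(alphas, X, t)
    values = [t_i - dot(w, x_i)
              for a_i, x_i, t_i in zip(alphas, X, t)
              if tol < a_i < C - tol]
    return sum(values) / len(values)


# Hypothetical toy problem; the fourth point is misclassified.
C = 1.0
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
t = [1, 1, -1, -1, 1]
alphas = [0.625, 0.625, 0.25, 1.0, 0.0]

kinds = [categorize(a_i, C) for a_i in alphas]
w = weights(alphas, X, t)
b = bias(alphas, X, t, C)
print(kinds, w, b)
```

Only the three "margin" points enter the average for b; the "bound" point is excluded because its slack is unknown in advance, and the "none" point plays no role at all.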

    2.2.2 THE KERNEL TRICK

    According to Cover's theorem on the separability of patterns, a complex pattern classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space. We can therefore address the lack of linear separability by applying a nonlinear transformation that maps the input data into a space of higher (even infinite) dimension.

    Tuy nhin, s