Bao Cao Tot Nghiep KPDL


  • 8/8/2019 Bao Cao Tot Nghiep KPDL

    1/67

    SUPERVISOR'S COMMENTS


    PREFACE

    Science and technology are advancing rapidly today. Computer science in particular has grown strongly and is applied in many areas of life, such as education, healthcare, economics, science and construction. It has become an indispensable part of everyday life.

    The use of computing tools to organise and exploit databases has been developing since the 1960s. In recent years especially, the role of computers in storing and processing information has become increasingly important. Alongside this, fairly mature automatic data-collection devices have produced enormous data stores. With the strong development of electronics, which has produced large-capacity memory, high-speed processors and telecommunication networks, organisations have built information systems to automate all of their business activities. This has created a continuously growing stream of data, because even the simplest transactions, such as a phone call, a health check or a credit card payment, are recorded by computer. By now the volume has become enormous, covering databases, customer information, transaction histories, sales data, loan account data, capital usage data and so on. The question is how to process such a huge volume of information in order to discover the knowledge hidden in it.

    To do this, people use the process of Knowledge Discovery in Databases (KDD). The task of KDD is to find, from the available data, valuable hidden information that was not previously known, as well as development trends and the factors that influence them. The techniques that allow us to extract this knowledge from an existing database are called Data Mining.

    For these reasons we chose to study the topic "Data mining with association rules". The aim is to analyse data and apply techniques to find the regular patterns of information and activity in a data set that the user is interested in, and at the same time to apply them to the problem "Sales management in a supermarket".

    While working on this project we received dedicated help and guidance from the lecturers of the Faculty of Information Technology and from our classmates, and especially from our lecturer Trn Hng Cng. However, because time was limited and our abilities are still modest, mistakes are unavoidable, and we look forward to further comments from our lecturers and classmates.

    We would also like to sincerely thank the lecturers of the Faculty of Information Technology for supporting us throughout the project and our studies at the university.


    We sincerely thank our classmates for helping us complete this thesis.

    Thank you very much!

    Student team:

    Phm Th Hon

    Trn Vit Phng ng

    Class C-H-KHMT3-K1


    PROJECT SUMMARY

    The project presents the fundamentals of data mining with association rules, the classical algorithms used when working with association rules, and the application of the Apriori algorithm to a small part of the problem "Sales management in a supermarket". The goals of the project are:

    - To analyse data and apply techniques to find the regular patterns of information and activity in a data set that the user is interested in.

    - To present basic algorithms such as Apriori and the algorithm that finds association rules without generating candidates, based on the FP-Tree structure, as used to analyse a database with association rules.

    - To analyse a database and implement the Apriori algorithm, applying it to a small part of the problem "Sales management in a supermarket".

    The project consists of 3 chapters, with the following content:

    Chapter I: Overview of data mining. This chapter covers data mining and knowledge discovery, the process of knowledge discovery from databases, the benefits of data mining, data mining techniques, the main tasks of data mining, data mining methods, applications of data mining, and some challenges facing data mining.

    Chapter II: Frequent itemsets and association rules. This chapter covers the basic concepts and properties of frequent itemsets and association rules, finding frequent itemsets, some basic association rule algorithms, and examples illustrating the algorithms.

    Chapter III: Implementing and testing the algorithm for finding frequent itemsets and association rules. This chapter analyses a database and describes the implementation of a program that mines association rules for supermarket sales management. From the results, the sales manager can see which groups of goods are related to one another, which supports management and the choice of goods to sell.


    SUMMARY OF THE PROJECT

    This project covers data mining with association rules, the classical algorithms used with association rules, and how to apply the Apriori algorithm to a small part of the supermarket sales management problem.

    The purposes of this project are:

    - Analysing data and using techniques to find the regular patterns of information and activity in a data set that users are interested in.

    - Presenting the classical algorithms, such as Apriori and the FP-Tree-based algorithm that finds association rules without generating candidates, for analysing a database with association rules.

    - Analysing a database and implementing the Apriori algorithm to apply it to part of the supermarket sales management task.

    The project has 3 chapters, with the following main content:

    Chapter I: Overview of data mining. This chapter presents data mining and knowledge discovery in databases, the advantages of data mining, data mining techniques, the main tasks of data mining, data mining methods, applications of data mining, and some challenges facing data mining.

    Chapter II: Frequent itemsets and association rules. This chapter includes some concepts, the basic properties of frequent itemsets and association rules, the search for frequent itemsets, some basic association rule algorithms, and examples that illustrate the algorithms.

    Chapter III: How to implement and test the algorithm for finding frequent itemsets and association rules. This chapter analyses a database and presents the implementation of a program that mines association rules for supermarket sales management. The sales manager can use the results to identify groups of related products, which supports management and the choice of products to sell.


    TABLE OF CONTENTS

    SUPERVISOR'S COMMENTS
    PREFACE
    PROJECT SUMMARY
    SUMMARY OF THE PROJECT
    LIST OF TABLES
    LIST OF ABBREVIATIONS
    INTRODUCTION

    Chapter I: OVERVIEW OF DATA MINING
    1.1. Problem statement
    1.2. Data mining and knowledge discovery
    1.3. The process of knowledge discovery from databases
    1.3.1. Problem definition
    1.3.2. Data collection and preprocessing
    1.3.2.1. Gathering data
    1.3.2.2. Selecting data
    1.3.2.3. Cleaning
    1.3.2.4. Enriching data
    1.3.2.5. Encoding data
    1.3.2.6. Evaluation and presentation
    1.3.3. Data mining
    1.3.4. Stating and evaluating the results
    1.3.5. Using the discovered knowledge
    1.4. The benefits of data mining
    1.5. Data mining techniques
    1.5.1. Descriptive data mining techniques
    1.5.2. Predictive data mining techniques
    1.6. The main tasks of data mining
    1.6.1. Classification
    1.6.2. Regression
    1.6.3. Clustering
    1.6.4. Summarization
    1.6.5. Dependency modeling
    1.6.6. Change and deviation detection
    1.7. Data mining methods
    1.7.1. Components of a data mining algorithm
    1.7.2. Some common data mining methods
    1.7.2.1. Induction
    1.7.2.2. Decision trees and rules
    1.7.2.3. Discovering association rules
    1.7.2.4. Neural networks
    1.7.2.5. Genetic algorithms
    1.8. Applications of data mining
    1.9. Some challenges facing data mining

    Chapter II: FREQUENT ITEMSETS AND ASSOCIATION RULES
    2.1. Introduction
    2.2. Basic concepts
    2.2.1. Definition 2.2.1: Data mining context
    2.2.2. Definition 2.2.2: Galois connections
    2.2.3. Definition 2.2.3: Support
    2.2.4. Definition 2.2.4: Confidence
    2.2.4.1. Property 2.2.4.1: Support of a subset
    2.2.4.2. Property 2.2.4.2
    2.2.4.3. Property 2.2.4.3
    2.2.4.4. Property 2.2.4.4
    2.2.5. Definition 2.2.5: Frequent itemset
    2.2.6. Definition 2.2.6: Association rule
    2.2.6.1. Property 2.2.6.1: Association rules are not composable
    2.2.6.2. Property 2.2.6.2: Association rules are not decomposable
    2.2.6.3. Property 2.2.6.3: Association rules are not transitive
    2.2.6.4. Property 2.2.6.4
    2.3. Finding frequent itemsets
    2.3.1. Some concepts
    2.3.2. The Apriori algorithm
    2.3.2.1. Description of the algorithm
    2.3.2.2. An example illustrating the Apriori algorithm
    2.3.2.3. Procedure-Code
    2.3.2.4. Generating candidate (k+1)-itemsets
    2.4. Finding association rules
    2.4.1. Statement of the association rule mining problem
    2.4.2. Developing efficient solutions for association rule mining
    2.5. The association rule mining process
    2.6. Some other algorithms
    2.6.1. A parallel mining algorithm for fuzzy association rules
    2.6.2. The FP-Growth algorithm
    2.6.2.1. Nature
    2.6.2.2. Procedure
    2.6.2.3. The FP_Growth algorithm

    Chapter III: IMPLEMENTING AND TESTING THE ALGORITHM FOR FINDING FREQUENT ITEMSETS AND ASSOCIATION RULES
    3.1. Problem statement
    3.2. Choosing the algorithm to implement in software
    3.3. Requirements for implementing the algorithm
    3.4. The database
    3.4.1. Main interface of the database
    3.4.2. Supplier catalog table
    3.4.3. Goods catalog table
    3.4.4. Customer catalog table
    3.4.5. Invoice catalog table
    3.4.6. Invoice detail catalog table
    3.4.7. Writing XML
    3.5. Main interface of the program
    3.6. Connecting to the data
    3.7. Adding XML data
    3.8. Analysis results
    3.9. Results filtered with MinSup = 10
    3.10. Results filtered with MinCon = 40%

    GENERAL CONCLUSION
    DIRECTIONS FOR FURTHER DEVELOPMENT
    REFERENCES
    VIETNAMESE-ENGLISH GLOSSARY OF TERMS

    LIST OF FIGURES

    Figure 1.1. The process of knowledge discovery from databases
    Figure 1.2. The knowledge discovery process
    Figure 1.3. Model of the benefits of data mining
    Figure 1.4. Diagram of data mining with a neural network
    Figure 2.5. Illustration that association rules are not decomposable
    Figure 3.1. Main interface of the database
    Figure 3.2. Supplier catalog
    Figure 3.3. Goods catalog
    Figure 3.4. Customer catalog
    Figure 3.5. Invoice catalog
    Figure 3.6. Invoice detail catalog
    Figure 3.7. Writing XML
    Figure 3.8. Main interface of the program
    Figure 3.9. Connecting to the data
    Figure 3.10. Adding XML data
    Figure 3.11. Analysis results
    Figure 3.12. Results filtered by minimum support
    Figure 3.13. Results filtered by confidence

    LIST OF TABLES

    Table 2.1. Database used to illustrate the Apriori algorithm
    Table 2.2. Results of running the Apriori algorithm on database D
    Table 2.3. Example of a transaction database D
    Table 2.4. Frequent itemsets with Minsup = 50%
    Table 2.5. Association rules generated from the frequent itemset ABE
    Table 2.6. FP-tree
    Table 2.7. FP-tree
    Table 2.8. FP-tree
    Table 2.9. FP-tree
    Table 2.10. FP-tree
    Table 2.11. FP-tree
    Table 2.12. FP-tree
    Table 2.13. FP-tree
    Table 2.14. Database

    LIST OF ABBREVIATIONS

    Abbreviation    Meaning
    KDD             Knowledge discovery in databases
    DL              Data (dữ liệu)
    CSDL            Database (cơ sở dữ liệu)
    KPDL            Data mining (khai phá dữ liệu)
    NCKPDL          Data mining context (ngữ cảnh khai phá dữ liệu)
    LKH             Association rule (luật kết hợp)


    INTRODUCTION

    The development of information technology and its application in many areas of social and economic life over the years has meant that organisations collect and store more and more data. They keep this data because they believe it holds some value. According to statistics, however, only a small portion of it (roughly 5% to 10%) is ever analysed. For the rest, they do not know what they should or could do with it, yet they continue the costly collection out of fear that something important would otherwise be missed and needed later. Traditional database management and exploitation methods cannot meet this expectation, which led to the emergence of Knowledge Discovery and Data Mining (KDD).

    Knowledge discovery and data mining techniques have been, and are being, studied and applied in many fields around the world. In Vietnam the technique is still relatively new, but it is also being studied and gradually put into use.

    There are currently many ways of doing business and many software packages for managing it, for example supermarket sales management software written in Fox, C#, VB and so on. In this project, however, we do not build a complete supermarket sales management system. We only study and implement one small aspect of the problem "Sales management in a supermarket": analysing data with association rules to find out how goods are related to one another. This helps managers study and analyse the data so they can choose better goods to sell.

    Within the scope of this research project, we present:

    - The fundamentals of data mining with association rules. This form of association rule is relatively simple but highly effective, and helps to find rare and valuable rules.

    - The definitions, properties and some basic algorithms commonly applied when finding the association rules of a database.

    - An analysis and implementation of the Apriori algorithm, applied to a small part of the problem "Sales management in a supermarket".
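As a rough illustration of what mining association rules with Apriori involves, the following is a minimal Python sketch, not the program described in this thesis. The function names `apriori` and `rules`, the toy baskets, and the use of an absolute count for the minimum support are all assumptions made for the example; the thresholds play the role of the MinSup and MinCon values mentioned in the table of contents.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    min_support transactions (absolute count, an assumption here)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}

    result = dict(frequent)
    k = 1
    while frequent:
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune step: every k-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k)):
                candidates.add(union)
        # Count each surviving candidate against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def rules(frequent, min_confidence):
    """List (antecedent, consequent, confidence) for rules X -> Y with
    confidence = count(X and Y together) / count(X)."""
    found = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


# Hypothetical supermarket baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
strong_rules = rules(frequent_itemsets, min_confidence=0.7)
```

On these toy baskets the pair milk and butter occurs only once, so it is dropped, and the three-item candidate is pruned without being counted because one of its subsets is not frequent.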


    Chapter I: OVERVIEW OF DATA MINING

    1.1. Problem statement

    The era of the Internet, intranets and data warehouses has opened up many opportunities for businesses to collect and process information. Moreover, data storage and retrieval technologies are developing rapidly, so the databases of agencies, businesses and other organisations contain more and more rich and varied hidden information.

    Within an enterprise's databases, transaction data plays a very important role in planning business strategy for the following years. At present, although the use of this data has achieved some results, several problems remain:

    - The analysis relies entirely on the data and does not use existing domain knowledge, so the results are hard to interpret.

    - The user must provide guidance on how and where to analyse the data.

    Given today's conditions and demands, fast, suitable, automatic, accurate and effective methods are needed to obtain valuable information. The knowledge extracted from such databases becomes a resource that supports managers in planning activities and in making production and business decisions. For this reason, the practical application of data mining with association rules to transaction databases is receiving particular attention today.

    The aim of this study is to build an effective solution for applying association rules to decision-making in organisations and businesses, based on transaction databases.

    The rapid spread of information technology and Internet applications into many areas of social life, economic management, science and engineering has created many huge databases, for example a supermarket sales database containing thousands of sales transactions, or the database of a bank's customer information system. To mine the information in these large databases effectively in support of decision-making, researchers have developed new methods, techniques and software, alongside traditional information exploitation methods, to support the mining, analysis and synthesis of information.

    There are many different data mining techniques, which follow the steps of the knowledge discovery process and address the tasks of data mining. In what follows we present each of these issues in turn.


    1.2. Data mining and knowledge discovery

    The key to success in any business activity today is knowing how to use information effectively. This means finding, from the available data, hidden information that was not previously known, and identifying development trends and the factors that influence them. Carrying out this work is the process of knowledge discovery in databases, and within that process data mining is the technique that actually extracts the knowledge.

    If knowledge is viewed as the relationships among patterns between data elements, then knowledge discovery refers to the entire process of extracting knowledge from a database. It passes through several stages: understanding and identifying the problem, collecting and preprocessing data, discovering knowledge, illustrating and evaluating the discovered knowledge, and putting the results into practice.

    Data mining differs semantically from knowledge discovery in databases, but in practice data mining is just one stage in the chain of stages that make up the knowledge discovery process. It is, however, the key stage, and the main one that gives knowledge discovery in databases its multidisciplinary character.

    1.3. The knowledge discovery process from databases

    Knowledge discovery from databases uses many computing methods and tools, but it remains a human-centered process. It is therefore not an automatic analysis system but a system of frequent interactions between people and the database, supported by computing tools.

    Figure 1.1. The knowledge discovery process from databases. Stages shown: (1) problem definition; (2) data collection and preprocessing; (3) data mining and knowledge extraction; (4) presentation and evaluation of results; (5) use of the discovered knowledge.

    Although there are five stages as shown above (Figure 1.1), knowledge discovery from databases is an interactive process that repeats in a spiral, where each iteration is more complete than the previous one. In addition, each stage builds on the results of the preceding stage in a waterfall manner. This dialectical, learning-oriented character is the methodology of knowledge discovery. The stages are described in detail below:

    1.3.1. Problem definition

    This is a shaping stage whose purpose is to identify the domain in which knowledge is to be discovered and to formulate the overall problem. In practice, databases are specialized and divided by domain, such as products, business, finance, and so on. A piece of discovered knowledge may be valuable in one domain but carry little meaning in another. Identifying the domain and defining the problem therefore guides the next stage, data collection and preprocessing.

    1.3.2. Collection and preprocessing

    The databases obtained usually contain a great many attributes, yet they are incomplete, heterogeneous, and full of errors and special values. The collection and preprocessing stage is therefore very important in the knowledge discovery process. It can be said that this stage accounts for 70% to 80% of the cost of the whole problem.

    The stage is divided into the following steps: data gathering, data selection, cleaning, enrichment, encoding, and evaluation and presentation. These steps are carried out in a fixed order, as follows:

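    Figure 1.2. The knowledge discovery process. Steps shown: data gathering (from sources such as the Internet), data selection, data cleaning, data enrichment, data encoding, evaluation and presentation. Intermediate products shown: data, target data, prepared (cleansed, preprocessed) data, transformed data, pattern discovery, knowledge.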


    1.3.2.1. Data gathering

    Gathering data is the first step in the data mining process. The data may be drawn from a database, a data warehouse, or even from Web application sources.

    1.3.2.2. Data selection

    At this stage, data is selected or partitioned according to certain criteria. The necessary data is filtered and extracted from the operational database into a separate database, and only the data needed for the later stages is kept. Collecting data into a single database is usually difficult, because the data is scattered throughout the organization, and the same kind of information may be created in different forms. For example, one department may use a string type and another a numeric type to declare the same customer attribute. Data quality also varies from place to place. Careful selection is therefore needed before moving on to the next stage.

    1.3.2.3. Cleaning

    This third step is often neglected, yet it is a very important step in the data mining process. Common errors made while gathering data are a lack of rigor and of logical consistency, so the data often contains meaningless values and cannot be linked together. This step processes such inconsistent data, which is regarded as redundant information of no value. If the data is not cleaned, preprocessed, and prepared beforehand, it will lead to seriously distorted results.

    This step performs the following functions:

    - Data reconciliation: this reduces the inconsistency of data taken from many different sources. The usual approach is to eliminate duplicate data and unify notation. For example, one customer may have several records because the name was entered incorrectly or because some personal details changed, which creates the false impression that there are several customers.

    - Handling missing values: incomplete data may contain missing values, which is a fairly common situation. Various methods are used to handle them: ignoring the tuples that contain missing values, filling them in by hand, filling them with a global constant, using the mean of the attribute over all records, using the mean of the attribute over all records of the same class, or using the most frequent value.


    - Handling noise and outliers: data noise may be random noise or abnormal values. To clean it, one can apply smoothing methods or use algorithms that detect outliers so that they can be treated.
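
    Two of the missing-value strategies listed above (attribute mean and most frequent value) can be sketched as follows. This is a minimal illustration only: the function names and the sample columns are hypothetical, and missing entries are assumed to be represented by None.

```python
from collections import Counter
from statistics import mean


def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def fill_with_mode(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    most_common_value, _ = Counter(observed).most_common(1)[0]
    return [most_common_value if v is None else v for v in values]


# Hypothetical attribute columns with missing entries
ages = [20, None, 30, 40, None]
cities = ["Hanoi", "Hue", None, "Hanoi"]

filled_ages = fill_with_mean(ages)
filled_cities = fill_with_mode(cities)
print(filled_ages)
print(filled_cities)
```

    The mean suits numeric attributes, while the most frequent value also works for categorical ones; the class-wise mean mentioned above would apply the same function separately to the records of each class.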

    1.3.2.4. Data enrichment

    The purpose of this step is to add further kinds of relevant information to the original database. To do this, external databases related to the original one must be available. The necessary information is added, which increases the potential for discovering knowledge.

    This is a step that requires careful thought. Many different algorithms are used at this stage to extract patterns from the data; the usual ones are classification rules, association rules, sequential data models, and so on.

    Enrichment includes integrating and transforming data. Data from many different sources is integrated into a single unified store. Different data formats are also converted and recomputed into one uniform type, which is convenient for analysis.

    1.3.2.5. Data encoding

    Next comes the data transformation step, in which the data is reorganized so that it can be used and controlled, and is converted to suit the mining purpose. The goal is to convert data types into forms that are convenient for the discovery algorithms. There are many ways to encode data, such as:

    - Partitioning: the data is a string value that belongs to a fixed set of strings.

    - Converting a year value into an integer, namely the number of years elapsed relative to the current year.

    - Dividing numeric values by a factor so that the values fall into a smaller range.

    - Converting Yes-No into 0-1.

    1.3.2.6. Evaluation and presentation

    This is the final step of the data mining process. Here the data patterns are extracted by the data mining software. Not every pattern is useful, and some may even be misleading, so evaluation criteria must be prioritized in order to retain only the knowledge that is actually needed.

    These are the six steps of the data mining process.
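
    The four encoding schemes listed in 1.3.2.5 can be illustrated with a short sketch. The record layout, the field names, the fixed set of region strings, the scaling factor, and the reference year are all hypothetical choices made for the example.

```python
REGIONS = ["north", "central", "south"]  # hypothetical fixed set of strings


def encode_record(record, current_year, income_factor=1000):
    """Encode one customer record with the four schemes from 1.3.2.5."""
    return {
        # Partitioning: a string from a fixed set becomes its index in that set
        "region": REGIONS.index(record["region"]),
        # A year value becomes the number of years elapsed up to current_year
        "years_since_joined": current_year - record["joined_year"],
        # A numeric value is divided by a factor so it falls in a smaller range
        "income_k": record["income"] / income_factor,
        # Yes/No becomes 1/0
        "has_card": 1 if record["has_card"] == "Yes" else 0,
    }


customer = {"region": "central", "joined_year": 2005,
            "income": 45000, "has_card": "Yes"}
encoded = encode_record(customer, current_year=2010)
print(encoded)
```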


    1.3.3. Data mining

    The data mining stage begins after the data has been collected and preprocessed. The main work at this stage is to define the data mining problem, select mining methods suited to the available data, and extract the required knowledge.

    This is the essential stage, in which intelligent methods are applied to extract data patterns.

    1.3.4. Presentation and evaluation of results

    The knowledge discovered from the database needs to be summarized in reports that serve different decision-support purposes.

    Because many mining methods can be applied, the results vary in quality. Evaluating the results is therefore necessary and provides a basis for strategic decisions. The results are usually summarized and compared using charts, and are then verified and tested.

    1.3.5. Using the discovered knowledge

    The discovered knowledge is consolidated and refined, combined into a system, and checked for potential conflicts. After that, the knowledge is ready for application.

    The results of the knowledge discovery process can be applied in different domains. Because the results may be predictions or descriptions, they can be fed into decision support systems in order to automate the decision process.

    1.4. Benefits of data mining

    - Providing knowledge that supports decision-making.

    - Forecasting.

    - Generalizing data.

    Figure 1.3 is a model showing the benefit of data mining in analysis and decision-making for the marketing of a product.

    Figure 1.3: Model of the benefits of data mining. Components shown: marketing, marketing database, data warehouse, KDD & Data Mining.

    1.5. Data mining techniques

    Data mining techniques are usually divided into two main groups:

    1.5.1. Descriptive data mining techniques

    These describe the general properties or characteristics of the data in an existing database. They include clustering, summarization, visualization, evolution and deviation analysis, association rule analysis, and so on.

    1.5.2. Predictive data mining techniques

    These make predictions based on inference over the current data. They include classification and regression.

    1.6. Main tasks of data mining

    The purpose of data mining is clearly that the extracted knowledge will be used for competitive advantage in the market and for the benefit of scientific research.

    The main goals of data mining can therefore be regarded as description and prediction, and the patterns that data mining discovers serve these goals.

    Prediction involves using variables or fields in the database to extract patterns that predict unknown or future values of the variables of interest.

    Description focuses on finding patterns that describe the data in a form people can understand.

    To achieve these two goals, the main tasks of data mining are:

    - Classification.

    - Regression.

    - Clustering.

    - Summarization.

    - Dependency modeling.

    - Change and deviation detection.

    1.6.1. Classification

    Classification assigns a data sample to one of a set of predefined classes.

    The goal of a classification algorithm is to find relationships between the predictor attributes and the class attribute, and then to use these relationships to predict the class of new tuples that have the same format.

    1.6.2. Regression

    Regression learns a function that maps a data sample to a real-valued prediction variable. There are many data mining applications of regression, for example estimation from remotely sensed microwave measurements, estimating a patient's risk of death from the results of diagnostic tests, and predicting demand for a new product as a function of advertising expenditure.
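
    As a small illustration of learning such a mapping, the sketch below fits a straight line by ordinary least squares. The choice of a linear model and the advertising and demand figures are assumptions made purely for the example; they are not data from this report.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising expenditure versus units sold
ad_spend = [1.0, 2.0, 3.0, 4.0]
demand = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, demand)
predicted = slope * 5.0 + intercept  # predicted demand for a spend of 5.0
print(slope, intercept, predicted)
```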

    1.6.3. Clustering

    Clustering is a general descriptive task that identifies a set of groups or categories that describe the data. The groups may be mutually exclusive, hierarchical, or overlapping, which means that a data item may belong to more than one group. Data mining applications of clustering include discovering sets of customers who respond similarly in a marketing database and identifying classes of spectra from infrared measurements.

    1.6.4. Summarization

    Summarization produces characteristic descriptions of a class. Such a description is a kind of summary of the common properties of all market-basket tuples that belong to the class.

    Characteristic descriptions are expressed as rules, usually of the following form:

    If a tuple belongs to the class named in the antecedent, then the tuple has all the attributes named in the conclusion. These rules differ from classification rules: a characteristic rule for a class is generated only from tuples that already belong to that class.

    1.6.5. Dependency modeling

    Dependency modeling searches for a model that describes significant dependencies between variables. Dependency models exist at two levels: the structural level of the model specifies which variables are locally dependent on each other, and the quantitative level specifies the strength of the dependencies on some scale.


    1.6.6. Change and deviation detection

    This task focuses on discovering the most significant changes in the data relative to normative or previously measured values.

    Because these tasks require very different amounts and kinds of information, they usually influence the design and choice of data mining algorithm. For example, a decision tree algorithm produces a description that discriminates between samples of different classes, but it does not give the properties and characteristics of each class.

    1.7. Data mining methods

    Data mining is a process of pattern discovery in which the mining algorithm searches for interesting patterns in a specified form, such as rules, classification trees, regression, clustering, and so on.

    1.7.1. Components of a data mining algorithm

    A data mining algorithm has three main components: model representation, model evaluation, and model search.

    Model representation: the model is represented in a language L used to describe the patterns that can be discovered. The data analyst must fully understand the representational assumptions and must be able to state which assumptions a given algorithm makes. The model is evaluated by feeding test data into it and adjusting the parameters where necessary.

    Model evaluation: this assesses whether a pattern meets the criteria of the knowledge discovery process. Predictive accuracy is assessed by cross-validation. Descriptive quality is assessed by predictive accuracy, novelty, utility, and understandability of the model. Both statistical and logical criteria can be used to evaluate a model.

    Search method: the search method has two components, parameter search and model search.

    - Parameter search: optimizes the model evaluation criteria, given the observed data and a fixed model description.

    - Model search: runs as a loop over the parameter search method, in which the model description is changed to form a family of models.

    For each model description, the parameter search method is applied to assess the quality of the model. Model search methods usually rely on heuristic search techniques, because the size of the model space often rules out exhaustive search and simple solutions are not easy to obtain.


    For example, only a few people who buy English books also buy CDs. The number of association rules in some large databases is almost unlimited, so an algorithm cannot discover all of them and cannot tell which rules carry truly valuable and interesting information.

    The question, then, is which association rules are really valuable. Suppose we have the rule: music, foreign language, sports => CD, meaning that people who buy music, foreign language, and sports books also buy CDs. We are then interested in the number of customers in the database who satisfy this rule, that is, the support of the rule. The support of the rule is the percentage of records that contain music, foreign language, and sports books together with a CD.

    Support alone is not enough, however. There may be a fairly large group of people who read all three kinds of book, yet an even larger group who like sports, music, and foreign language books but do not buy CDs. In that case the association is very weak even though the support is fairly high. A second measure is therefore needed, the confidence. Confidence is the percentage of records containing a CD among the records that contain music, sports, and foreign language books.

    The task of association rule discovery is to find all rules of the form X => B such that the support of the rule is not less than a given threshold Minsup and the confidence of the rule is not less than a given threshold Minconf. From a single database, thousands or even hundreds of thousands of association rules may be found.
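
    The two measures can be computed directly from a list of baskets. The sketch below uses a hypothetical ten-basket database built around the book example above; the basket contents and function names are illustrative assumptions, not data from this report.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, relative to the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical baskets: five customers buy all three kinds of book,
# but only two of those five also buy a CD.
baskets = [
    {"music", "language", "sports", "CD"},
    {"music", "language", "sports", "CD"},
    {"music", "language", "sports"},
    {"music", "language", "sports"},
    {"music", "language", "sports"},
    {"language", "CD"},
    {"sports"},
    {"music", "CD"},
    {"language"},
    {"music"},
]

books = {"music", "language", "sports"}
rule_support = support(books | {"CD"}, baskets)       # 2 of 10 baskets
rule_confidence = confidence(books, {"CD"}, baskets)  # 2 of the 5 book baskets
print(rule_support, rule_confidence)
```

    With thresholds such as Minsup = 0.2 and Minconf = 0.5 (both chosen arbitrarily here), this rule would pass the support test but fail the confidence test, which is the situation described in the text.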

    1.7.2.4. Neural networks

    Neural networks are a computational approach concerned with developing mathematical structures that are able to learn. The methods result from research into models of learning in the human nervous system.

    Neural networks can derive meaning from complex or imprecise data, and can be used to extract patterns and detect trends that are too complex for people or other computing techniques to notice. Neural networks are frequently mentioned in discussions of data mining. Although they have some limitations that make them hard to apply and develop, they also have considerable advantages.

    Figure 1.4. Data mining with a neural network. Elements shown: data, neural network model, extracted patterns.

    One notable advantage of neural networks is their ability to produce highly accurate predictive models that can be applied to many different kinds of problem and that meet data mining tasks such as classification, clustering, modeling, and forecasting of time-dependent events.

    1.7.2.5. Genetic algorithms

    A genetic algorithm, in the broad sense, simulates the process of evolution in nature. More precisely, it specifies how a population of individuals is formed, evaluated, and modified, for example how individuals are selected for reproduction and which individuals are discarded. The algorithm also simulates on a computer the role of genes in biological chromosomes, so that many different practical problems can be solved.

    A genetic algorithm is an optimization algorithm. It is widely used to optimize data mining techniques, including neural networks. Its connection with data mining can be seen, for example, in decision tree and rule induction techniques. As mentioned earlier, the rules that model the data contain parameters determined by the knowledge discovery algorithms. An optimization stage is needed to determine which parameter values yield the best rules, and this is why genetic algorithms are used in data mining tools.

    1.8. Applications of data mining

    Data mining is a field related to many other disciplines, such as database systems, statistics, and visualization. Depending on the approach taken, data mining may also apply techniques such as neural networks, rough set theory, fuzzy sets, and knowledge representation. Compared with these methods, data mining has several clear advantages.

    Compared with machine learning, data mining has the advantage that it can be used on databases containing a lot of noise, incomplete data, or continuously changing data, whereas machine learning methods are mainly applied to databases that are complete, change little, and are not too large.

    Expert systems: this approach differs from data mining in that the examples provided by experts are usually at a much higher level than the data in a database, and they usually cover only the important cases. In addition, experts confirm the value and usefulness of the discovered patterns.

    Statistics is one of the theoretical foundations of data mining, but a comparison of the two shows that statistical methods still have some weaknesses that data mining has overcome:

    - Standard statistical methods do not suit the structured data types found in many databases.

    - Statistical methods are entirely data-driven and do not use available domain knowledge.

    - The results of the analysis can be very numerous and hard to interpret.

    - Statistical methods require user guidance to determine how and where to analyze the data.

    With these advantages, data mining is now widely applied in many areas of business and daily life, such as marketing, finance, banking and insurance, science, health care, security, and the Internet. Many large organizations and companies around the world have applied data mining techniques to their production and business activities and have gained great benefits.

    Some applications of data mining in business:

    - Brandaid: a flexible marketing model focused on consumer goods.

    - Callplan: helps salespeople determine the number of visits to make to prospective and existing customers.

    - Detailer: determines which customers to visit and which products to present on each visit.

    - Geoline: a model for designing sales and service territories.

    - Mediac: helps advertisers buy media for a year and plan media use, including outlining market segments and estimating potential.

    1.9. Some challenges for data mining

    - Large databases.

    - Changes in data and knowledge may make previously discovered patterns no longer valid.

    - Missing or noisy data.


    Chapter II: FREQUENT ITEMSETS AND ASSOCIATION RULES

    2.1. Introduction

    Companies and enterprises today store large amounts of sales information. A record in such a database contains information such as the date of sale and the quantity sold. From the sales database we can find relationships between attribute and attribute-value pairs. A typical association rule is, for example: 80% of customers who buy foreign language books also buy a CD or VCD.

    2.2. Basic concepts

    2.2.1. Definition 2.2.1: Data mining context

    Let $O$ be a finite non-empty set of transactions and $I$ a finite non-empty set of items. Let $R$ be a binary relation between $O$ and $I$ such that, for $o \in O$ and $i \in I$, $(o, i) \in R$ if and only if transaction $o$ contains item $i$. The data mining context (DMC below) is the triple $(O, I, R)$.

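    2.2.2. Definition 2.2.2: Galois connections

    Given a DMC $(O, I, R)$, consider the two maps $\rho$ and $\lambda$ defined as follows:

    $\rho: P(I) \to P(O)$ and $\lambda: P(O) \to P(I)$:

    For $S \subseteq I$, $\rho(S) = \{o \in O \mid \forall i \in S,\ (o, i) \in R\}$

    For $X \subseteq O$, $\lambda(X) = \{i \in I \mid \forall o \in X,\ (o, i) \in R\}$

    Here $P(X)$ is the set of all subsets of $X$.

    The pair of functions $(\rho, \lambda)$ is called a Galois connection. The value $\rho(S)$ is the set of transactions that share all the items in $S$. The value $\lambda(X)$ is the set of items that occur in all the transactions of $X$.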
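
    The two maps of the Galois connection can be written almost verbatim as set comprehensions. In the sketch below the names rho and lam are an assumed naming for the two maps, and the small context (O, I, R) is hypothetical.

```python
def rho(S, O, R):
    """Transactions in O that contain every item of S."""
    return {o for o in O if all((o, i) in R for i in S)}


def lam(X, I, R):
    """Items of I that occur in every transaction of X."""
    return {i for i in I if all((o, i) in R for o in X)}


# Hypothetical context: three transactions over three items
O = {1, 2, 3}
I = {"a", "b", "c"}
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c"), (3, "b")}

shared_by = rho({"a", "b"}, O, R)   # transactions containing both a and b
common_items = lam({1, 2}, I, R)    # items common to transactions 1 and 2
print(shared_by, common_items)
```

    Applying rho to the empty itemset returns all of O, since every transaction trivially contains all of its items.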

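    2.2.3. Definition 2.2.3: Support

    2.2.3.1. The support of an itemset $X$ in a database $D$ is the ratio of the number of transactions $T \in D$ that contain $X$ to the total number of transactions in $D$ (that is, the percentage of transactions in $D$ that contain $X$), denoted $\mathrm{Supp}(X)$.

    $$\mathrm{Supp}(X) = \frac{|\{T \in D : X \subseteq T\}|}{|D|}$$

    We have $0 \le \mathrm{Supp}(X) \le 1$ for every itemset $X$.

    In other words, support indicates how frequently a pattern occurs.

    2.2.3.2. The support of a rule $X \to Y$ is the ratio of the number of transactions that contain $X \cup Y$ to the number of transactions in the database $D$, denoted $\mathrm{Supp}(X \to Y)$.

    $$\mathrm{Supp}(X \to Y) = \frac{|\{T \in D : X \cup Y \subseteq T\}|}{|D|}$$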


    Thus a rule with support 50% means that 50% of the transactions contain the itemset $X \cup Y$. Support expresses the statistical significance of an association rule.

    2.2.4. Definition 2.2.4: Confidence

    2.2.4.1. Property 2.2.4.1: Support of a subset.

    Let $A, B \subseteq I$ be itemsets with $A \subseteq B$. Then $\mathrm{Supp}(A) \ge \mathrm{Supp}(B)$.

    This follows directly from the definition of support: every transaction that supports $B$ also supports $A$, because any transaction that contains the itemset $B$ also contains the itemset $A$.

    2.2.4.2. Property 2.2.4.2

    Let $A, B \subseteq I$ be two itemsets. If $B$ is a frequent itemset and $A \subseteq B$, then $A$ is also a frequent itemset.

    Indeed, if $B$ is frequent then $\mathrm{Supp}(B) \ge \mathit{Minsup}$, and every itemset $A$ that is a subset of $B$ is frequent in the database $D$ because $\mathrm{Supp}(A) \ge \mathrm{Supp}(B)$ (by Property 2.2.4.1).

    2.2.4.3. Property 2.2.4.3

    Let $A, B$ be two itemsets with $A \subseteq B$. If $A$ is not frequent, then $B$ is not frequent either.

    Indeed, $A$ is not frequent, so $\mathrm{Supp}(A) < \mathit{Minsup}$, and $A \subseteq B$ gives $\mathrm{Supp}(A) \ge \mathrm{Supp}(B)$.

    Hence $\mathrm{Supp}(B) < \mathit{Minsup}$, so $B$ is not a frequent itemset.
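
    Property 2.2.4.1, from which the other two follow, can be checked by brute force on a small database. The four transactions below are hypothetical; the check enumerates every pair of itemsets A, B with A a subset of B and looks for a violation of Supp(A) >= Supp(B).

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Hypothetical database of four transactions
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
items = sorted(set().union(*transactions))

# All non-empty itemsets over the items that occur in the database
itemsets = [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]

# Pairs (A, B) with A a subset of B but Supp(A) < Supp(B) would contradict the property
violations = [(A, B) for A in itemsets for B in itemsets
              if A <= B and support(A, transactions) < support(B, transactions)]
print(violations)
```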

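    2.2.4.4. Property 2.2.4.4

    Let $X, Y, Z \subseteq I$ be itemsets such that $X \cap Y = \emptyset$. Then:

    $$\mathrm{Conf}(X \to Y) \ge \mathrm{Conf}(X \setminus Z \to Y \cup Z)$$

    Indeed, from $X \cup Y \subseteq X \cup Y \cup Z$ and $X \setminus Z \subseteq X$ we obtain:

    $$\frac{\mathrm{Supp}(X \cup Y \cup Z)}{\mathrm{Supp}(X \setminus Z)} \le \frac{\mathrm{Supp}(X \cup Y)}{\mathrm{Supp}(X)}$$

    The confidence of a rule $r = X \to Y$ is the ratio (percentage) of the number of transactions in $D$ that contain $X \cup Y$ to the number of transactions in $D$ that contain the itemset $X$. The confidence of a rule is denoted $\mathrm{Conf}(r)$, and $0 \le \mathrm{Conf}(r) \le 1$.

    Remark: support and confidence are the following probabilities: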


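    $\mathrm{Supp}(X \to Y) = P(X \cup Y)$.

    $\mathrm{Conf}(X \to Y) = P(Y \mid X) = \mathrm{Supp}(X \cup Y) / \mathrm{Supp}(X)$.

    A rule with confidence 85% means that 85% of the transactions that contain $X$ also contain $Y$. The confidence of a rule expresses the degree of correlation in the data between the two sets $X$ and $Y$, and measures how reliable the rule is.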

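    2.2.5. Definition 2.2.5: Frequent itemset

    Let $(O, I, R)$ be a DMC and let $\mathit{Minsup} \in (0, 1]$ be the minimum support threshold. For $S \subseteq I$, the support of $S$, denoted $SP(S)$, is the ratio of the number of transactions that contain $S$ to the number of transactions in $O$. In other words, $SP(S) = |\rho(S)| / |O|$.

    For $S \subseteq I$, $S$ is a frequent itemset with respect to the threshold Minsup if and only if $SP(S) \ge \mathit{Minsup}$. We write $FS(O, I, R, \mathit{Minsup}) = \{S \in P(I) \mid SP(S) \ge \mathit{Minsup}\}$.

    2.2.6. Definition 2.2.6: Association rule

    Let $(O, I, R)$ be a DMC with threshold $\mathit{Minsup} \in (0, 1]$. For $S \in FS(O, I, R, \mathit{Minsup})$, let $X$ and $Y$ be non-empty subsets of $S$ such that $S = X \cup Y$ and $X \cap Y = \emptyset$. The association rule of $X$ with $Y$ has the form $X \to Y$ and reflects how likely a customer is to buy the itemset $Y$ when buying the itemset $X$. The support of the association rule $X \to Y$ with $S = X \cup Y$ is $SP(S)$.

    The confidence of the association rule $X \to Y$ is denoted $CF(X \to Y)$ and is computed as $CF(X \to Y) = SP(X \cup Y) / SP(X)$.

    The Apriori principle:

    For $S \in FS(O, I, R, \mathit{Minsup})$, if $T \subseteq S$ then $T \in FS(O, I, R, \mathit{Minsup})$.

    For $T \notin FS(O, I, R, \mathit{Minsup})$, if $T \subseteq S$ then $S \notin FS(O, I, R, \mathit{Minsup})$.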

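    2.2.6.1. Property 2.2.6.1: Association rules do not compose.

    If $X \to Z$ and $Y \to Z$ hold on $D$, it does not necessarily follow that $X \cup Y \to Z$ holds.

    Indeed, consider the case where $X \cap Y = \emptyset$ and the transactions in $D$ support $Z$ if and only if they support either $X$ or $Y$ but not both. Then $\mathrm{Supp}(X \cup Y) = 0$, so the rule $X \cup Y \to Z$ has support 0 and does not hold.

    Similarly, $X \to Y$ and $X \to Z$ together do not imply $X \to Y \cup Z$.

    2.2.6.2. Property 2.2.6.2: Association rules do not decompose.

    If $X \cup Y \to Z$ holds, then $X \to Z$ and $Y \to Z$ do not necessarily hold.

    For example, consider the case where $Z$ is present in a transaction only when both $X$ and $Y$ are present, that is, $\mathrm{Supp}(X \cup Y) = \mathrm{Supp}(Z)$. If the supports of $X$ and $Y$ are sufficiently larger than $\mathrm{Supp}(X \cup Y)$, that is, $\mathrm{Supp}(X) > \mathrm{Supp}(X \cup Y)$ and $\mathrm{Supp}(Y) > \mathrm{Supp}(X \cup Y)$, then the two individual rules will not reach the required confidence.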


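    In the converse case, however, $X \to Y \cup Z$ does imply $X \to Y$ and $X \to Z$.

    The following example explains this property:

    Figure 2.5. Illustration that association rules do not decompose

    When $Z$ appears in a transaction only if both $X$ and $Y$ appear in that transaction, we have $\mathrm{Supp}(X \cup Y) = \mathrm{Supp}(Z)$. If the supports of $X$ and $Y$ are larger than $\mathrm{Supp}(X \cup Y)$, the two rules $X \to Z$ and $Y \to Z$ will not have the required confidence. But if $X \to Y \cup Z$ holds on $D$, then $X \to Y$ and $X \to Z$ also hold on $D$, because $\mathrm{Supp}(X \cup Y) \ge \mathrm{Supp}(X \cup Y \cup Z)$ and $\mathrm{Supp}(X \cup Z) \ge \mathrm{Supp}(X \cup Y \cup Z)$.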

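    2.2.6.3. Property 2.2.6.3: Association rules are not transitive.

    If $X \to Y$ and $Y \to Z$ hold on $D$, we cannot conclude that $X \to Z$ holds on $D$.

    Let $T(S)$ denote the set of transactions that contain the itemset $S$. Suppose $T(X) \supseteq T(Y) \supseteq T(Z)$ and $\mathrm{Conf}(X \to Y) = \mathrm{Conf}(Y \to Z) = \mathit{Minconf}$.

    Then $\mathrm{Conf}(X \to Z) = \mathit{Minconf}^2 < \mathit{Minconf}$ (for $0 < \mathit{Minconf} < 1$), so the rule $X \to Z$ does not reach the minimum confidence, that is, $X \to Z$ does not hold on $D$.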
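
    A small numeric check makes the argument concrete. The four transactions below are hypothetical and are chosen so that every transaction containing z also contains y, and every transaction containing y also contains x; each single-step rule then has confidence 0.5, while the chained rule only reaches 0.25.

```python
def conf(antecedent, consequent, transactions):
    """Share of the transactions containing the antecedent that also contain the consequent."""
    has_antecedent = [t for t in transactions if antecedent <= t]
    has_both = [t for t in has_antecedent if consequent <= t]
    return len(has_both) / len(has_antecedent)


# Hypothetical database with nested transaction sets for x, y, z
D = [{"x", "y", "z"}, {"x", "y"}, {"x"}, {"x"}]

c_xy = conf({"x"}, {"y"}, D)  # 2 of the 4 transactions with x contain y
c_yz = conf({"y"}, {"z"}, D)  # 1 of the 2 transactions with y contains z
c_xz = conf({"x"}, {"z"}, D)  # 1 of the 4 transactions with x contains z
print(c_xy, c_yz, c_xz)
```

    With Minconf = 0.5, the rules x -> y and y -> z hold but x -> z does not.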

    2.2.6.4. Property 2.2.6.4

    If the rule $X \to (L - X)$ does not satisfy the minimum confidence, then the rule $Y \to (L - Y)$ does not satisfy it either, for all itemsets $Y \subseteq X \subseteq L$.

    2.3. Finding frequent itemsets

    2.3.1. Some concepts

    Given a DMC $(O, I, R)$ and $\mathit{Minsup} \in (0, 1]$, find $FS(O, I, R, \mathit{Minsup})$. The algorithm is built on the Apriori principle. It first finds the frequent itemsets with one element. Candidates for frequent itemsets with two elements are then formed by taking unions of the one-element frequent itemsets. In general, the candidate sets for frequent itemsets with $k$ elements are formed from the frequent itemsets with $k-1$ elements. Let $F_k = \{S \in P(I) \mid SP(S) \ge \mathit{Minsup} \text{ and } |S| = k\}$. The algorithm scans each candidate to build $F_k$, which consists of the candidates whose support is greater than or equal to the threshold Minsup.

    - Set of items (itemset) $I = \{i_1, i_2, \ldots, i_m\}$.

    Example: I = {milk, bread, cereal, yogurt}

    A set of k items is called a k-itemset.

    - Transaction $t$: a set of items such that $t \subseteq I$.

    Example: t = {bread, yogurt, cereal}

    - Transaction database $D = \{t_1, t_2, \ldots, t_n\}$, where $t_i = \{i_{i1}, i_{i2}, \ldots, i_{ik}\}$ with $i_{ij} \in I$.

    - A transaction $t$ contains $X$ if $X$ is a set of items in $I$ and $X \subseteq t$.

    Example: X = {bread, yogurt}

    - The support (supp) of an itemset $X$ in the database $D$ is the ratio of the number of transactions that contain $X$ to the total number of transactions in $D$.

    $\mathrm{Supp}(X) = \mathrm{count}(X) / |D|$

    - A frequent itemset $S$ is an itemset whose support satisfies the minimum support: if $\mathrm{Supp}(S) \ge \mathit{Minsup}$ then $S$ is a frequent itemset.

    - Property of frequent itemsets (Apriori): all subsets of a frequent itemset are frequent itemsets.

    2.3.2. The Apriori algorithm

    2.3.2.1. Description of the algorithm

    First, the database is scanned to find the individual items and their supports. The resulting set is $C_1$. Scanning $C_1$ and discarding the items with support < Minsup leaves the frequent 1-itemsets ($L_1$). Next, $L_1$ is joined with $L_1$ to obtain the set of candidate 2-itemsets $C_2$. The database is scanned to determine the support of each itemset in $C_2$. Scanning $C_2$ and discarding the itemsets with support < Minsup leaves the frequent 2-itemsets ($L_2$). $L_2$ is then used to generate $L_3$, and so on, until a $k$ is reached for which $L_k = \emptyset$ (no frequent itemset is found), at which point the algorithm stops.

    The set of all frequent itemsets of the database is $\bigcup_{i=1}^{k-1} L_i$.

    To make candidate generation more efficient, a property of frequent itemsets is used to reduce the number of generated candidates that are not frequent. The property is: every non-empty subset of a frequent itemset is also a frequent itemset.


    Join step:

    Input: $L_{k-1}$, the set of frequent (k-1)-itemsets.

    Output: $C_k$, the set of candidates for the frequent k-itemsets.

    The candidate k-itemsets are generated by joining $L_{k-1}$ with itself. Let $l_1, l_2$ be itemsets in $L_{k-1}$, and let $l_j[i]$ denote the $i$-th item of the itemset $l_j$. The join of $L_{k-1}$ with $L_{k-1}$ is performed as follows: two itemsets of $L_{k-1}$ are joined if their first $k-2$ items are identical and $l_1[k-1] < l_2[k-1]$.
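
    The join step can be sketched as below, assuming the standard Apriori formulation in which itemsets are kept as sorted tuples and two (k-1)-itemsets are merged when they agree on everything except their last item. The function name is hypothetical, and the input is the set of frequent 2-itemsets from the worked example in section 2.3.2.2.

```python
def apriori_join(prev_frequent):
    """Join L_{k-1} with itself.

    Each itemset is a sorted tuple. Two itemsets are merged when their
    first k-2 items are identical and the last item of the first is
    smaller than the last item of the second.
    """
    prev = sorted(prev_frequent)
    candidates = []
    for a in prev:
        for b in prev:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.append(a + (b[-1],))
    return candidates


L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
C3 = apriori_join(L2)
for candidate in C3:
    print(candidate)
```

    The ordering condition on the last items ensures that each candidate is produced exactly once. The result still contains candidates with an infrequent subset; removing those is the job of the pruning step.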


    2.3.2.2. A worked example of the Apriori algorithm

    Consider the transaction database D given in the following table:

    Table 2.1. Database used to illustrate the Apriori algorithm

    TID   List of items
    1     I1, I2, I5
    2     I2, I4
    3     I2, I3
    4     I1, I2, I4
    5     I1, I3
    6     I2, I3
    7     I1, I3
    8     I1, I2, I3, I5
    9     I1, I2, I3

    In the first iteration of the algorithm, each item is a member of the candidate set $C_1$. The algorithm scans all the transactions of D to count the number of occurrences of each item.

    Suppose the minimum support count is 2 transactions (that is, Minsup = 2/9 * 100% = 22%). The frequent 1-itemsets ($L_1$) are then determined as follows: $L_1$ consists of all candidate 1-itemsets that satisfy the minimum support.

    To find the frequent 2-itemsets ($L_2$), the algorithm joins $L_1$ with $L_1$ to generate the candidate 2-itemsets ($C_2$). $C_2$ consists of the 2-combinations of the elements of $L_1$, so the number of elements of $C_2$ is:

    $$|C_2| = \binom{|L_1|}{2} = \binom{5}{2} = \frac{5!}{2!\,3!} = 10$$

    Next, the transactions in D are scanned and the support of each candidate in $C_2$ is computed.

    The frequent 2-itemsets $L_2$ are then determined: they are the candidate 2-itemsets in $C_2$ whose support is greater than or equal to the minimum support Minsup.

    The candidate 3-itemsets $C_3$ are generated by joining $L_2$ with itself, which gives:

    C3 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}


    The Apriori property is used to prune the candidates: all subsets of a frequent itemset are frequent. Four candidates in $C_3$ therefore cannot be frequent, because they contain infrequent subsets, and these four candidates are pruned from $C_3$. Specifically:

    + The sets {I1, I3, I5} and {I2, I3, I5} are not frequent because their subset {I3, I5} is not frequent (it is not in L2).

    + The set {I2, I3, I4} is not frequent because its subset {I3, I4} is not frequent (it is not in L2).

    + The set {I2, I4, I5} is not frequent because its subset {I4, I5} is not frequent (it is not in L2).

    Pruning these candidates reduces the database scanning needed to compute supports when determining $L_3$. Note that for a candidate k-itemset we only need to check whether its (k-1)-subsets are frequent, because the Apriori algorithm uses a breadth-first search strategy.

    After joining and pruning, the resulting set $C_3$ is:

    C3 = {{I1, I2, I3}, {I1, I2, I5}}

    The transactions in the database are scanned to determine $L_3$, which consists of the candidate 3-itemsets in $C_3$ that satisfy the minimum support. We obtain:

    L3 = {{I1, I2, I3}, {I1, I2, I5}}

    The candidate 4-itemsets $C_4$ are generated by joining $L_3$ with itself, which gives the single itemset {I1, I2, I3, I5}. In the pruning step this itemset is removed because it contains the subset {I2, I3, I5}, which is not frequent (it is not in L3).

    Thus $C_4 = \emptyset$ and the algorithm terminates. All frequent itemsets have been found.

    The frequent itemsets found in the transaction database D with minimum support Minsup = 22% (a minimum support count of 2 transactions) are listed below.

    Table 2.2. Result of running the Apriori algorithm on database D

    Type of frequent itemset   Frequent itemsets
    1-itemset                  {I1} {I2} {I3} {I4} {I5}
    2-itemset                  {I1, I2} {I1, I3} {I1, I5} {I2, I3} {I2, I4} {I2, I5}
    3-itemset                  {I1, I2, I3} {I1, I2, I5}
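
    The whole worked example can be reproduced with a compact level-wise implementation. This is a minimal sketch, not the report's own procedure: the function name, the use of sorted tuples for itemsets, and the absolute count threshold min_count are choices made for the illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {k: {itemset: count}} for the frequent k-itemsets, level by level."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [(i,) for i in items]  # C1: every single item
    frequent = {}
    k = 1
    while candidates:
        # Scan the database once to count every candidate of size k
        counts = {c: sum(1 for t in transactions if set(c) <= t)
                  for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        if not level:
            break
        frequent[k] = level
        # Join step: merge itemsets that differ only in their last item
        prev = sorted(level)
        joined = [a + (b[-1],) for a in prev for b in prev
                  if a[:-1] == b[:-1] and a[-1] < b[-1]]
        # Prune step: keep a candidate only if all its k-subsets are frequent
        candidates = [c for c in joined
                      if all(s in level for s in combinations(c, k))]
        k += 1
    return frequent


# Table 2.1
D = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]

result = apriori(D, min_count=2)
for size, itemsets in result.items():
    print(size, sorted(itemsets))
```

    Run on Table 2.1 with a minimum count of 2, the sketch yields the same three levels as Table 2.2 and stops when the single 4-item candidate is pruned.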

    2.3.2.3. Procedure code

    Input: database D, Minsup

    Output: L, the frequent itemsets in D

    Ck: the set of candidates of size k


    2.4. Finding association rules

    Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of $m$ distinct attributes, each of which is called an item. Let $D$ be a database in which each record $T$ is a transaction that contains a set of items, $T \subseteq I$.

    Definition 2.4.1: An association rule is a relation of the form $X \to Y$, where $X, Y \subset I$ are sets of items called itemsets and $X \cap Y = \emptyset$. Here $X$ is called the antecedent and $Y$ the consequent.

    The two most important parameters of an association rule are its support (s) and its confidence (c).

    Definition 2.4.2: The support of the association rule $X \to Y$ is the percentage of records that contain $X \cup Y$ relative to the total number of transactions in the database.

    Definition 2.4.3: For a given set of transactions, the confidence is the ratio of the number of transactions that contain $X \cup Y$ to the number of transactions that contain $X$, expressed as a percentage.

    Mining association rules from a database means finding all rules whose support and confidence are at least the support and confidence thresholds specified in advance by the user. These thresholds are denoted Minsup and Minconf.

    Mining association rules breaks down into two problems:

    - Find all itemsets whose support is greater than or equal to Minsup.

    - Use the frequent itemsets to generate the desired rules whose confidence is greater than or equal to Minconf.

    2.4.1. Statement of the association rule mining problem

    $I = \{i_1, i_2, \ldots, i_n\}$ is a set of $n$ items (an item is also called an attribute). $X \subseteq I$ is called an itemset.

    $T = \{t_1, t_2, \ldots, t_m\}$ is a set of $m$ transactions (a transaction is also called a record), each identified by a TID (Transaction Identification).

    $R$ is a binary relation on $I$ and $T$. If transaction $t$ contains item $i$, we write $(i, t) \in R$. $(T, I, R)$ is the data mining context. Formally, a database $D$ is such a binary relation $R$.

    In terms of meaning, a database is a set of transactions, and each transaction $t$ is an itemset, $t \in 2^I$ ($2^I$ is the set of all subsets of $I$).

    Example of a transaction database: I = {A, B, C, D, E}, T = {1, 2, 3, 4, 5, 6}

    Information about the transactions is given in the following table:


    The threshold Minconf (minimum confidence) reflects how often Y must occur given X: a rule qualifies when c ≥ Minconf.

    The association rules sought are those that satisfy the given Minsup and Minconf, i.e. rules whose support is at least the minimum support and whose confidence is at least the minimum confidence.

    Most association rule mining algorithms are divided into two phases:

    Phase 1: Find all frequent itemsets in the database, i.e. all itemsets X satisfying s(X) ≥ Minsup.

    Phase 2: Generate the confident rules from the frequent itemsets found in phase 1.

    If X is a frequent itemset, the association rules generated from X have the form:

    X' → X \ X' (with confidence c), where:

    X' is a non-empty proper subset of X.

    X \ X' is the set difference of X and X'.

    c is the confidence of the rule, which must satisfy c ≥ Minconf.

    From the frequent itemsets in Table 4 we can generate the following association rules:

    Table 2.5. Association rules generated from the frequent itemset ABE

    Association rule    Confidence    c ≥ Minconf?
    A → BE              100%          Yes
    B → AE              67%           No
    E → AB              80%           Yes
    AB → E              100%          Yes
    AE → B              100%          Yes
    BE → A              80%           Yes

    Maximal frequent itemset: Let M ∈ FX(T, I, R, Minsup). M is called a maximal frequent itemset if there is no X ∈ FX(T, I, R, Minsup) such that M ⊂ X and M ≠ X.

    2.4.2. Developing an efficient solution for association rule mining

    The association rule problem.

    Given a set of items I, a transaction database D, a minimum support threshold Minsup and a minimum confidence threshold Minconf, find the association rules of the form X → Y on D that satisfy Support(X → Y) >= Minsup and Confidence(X → Y) >= Minconf.

    The association rule mining process.

    Determining the large itemsets. This consists of two main steps:

    Determine the candidate sets (Ck).


    Determine the large itemsets (L) from the candidate sets.

    To determine the candidate sets, we perform the following steps:

    1. Find the candidate 1-itemsets.

    2. Scan database D to determine the support of the candidate sets. In the first pass, the candidates are simply all the items in the database. In pass k (k > 1), the candidates are derived from the large itemsets found in pass k - 1 using the function Apriori-gen(). Once the candidates have been determined, the algorithm scans every transaction in the database to compute their support. The process ends when no further large itemsets can be found.

    3. The function Apriori-gen() performs two steps:

    1. First, Lk-1 is joined with itself to obtain Ck.

    2. Second, Apriori-gen() deletes from the join result every itemset that has some (k - 1)-subset not in Lk-1. It then returns the remaining candidate itemsets of size k.
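
    The two steps of Apriori-gen() can be sketched as follows. This is an illustrative Python version under stated assumptions: itemsets are represented as sorted tuples, the function name `apriori_gen` is hypothetical, and the join is the usual "same first k - 2 items" join, which the description above does not spell out. The input L2 is taken from Table 2.2.

```python
from itertools import combinations

def apriori_gen(prev_level):
    """Build the size-k candidates from the frequent (k-1)-itemsets.

    `prev_level` is a collection of sorted tuples, all of the same length.
    """
    prev = sorted(prev_level)
    prev_set = set(prev)
    k = len(prev[0]) + 1 if prev else 0
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Join: a and b must share their first k-2 items. Because `prev`
            # is sorted, itemsets with the same prefix are contiguous.
            if a[:-1] != b[:-1]:
                break
            c = a + (b[-1],)
            # Prune: drop c if any of its (k-1)-subsets is not frequent.
            if all(s in prev_set for s in combinations(c, k - 1)):
                candidates.append(c)
    return candidates

L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
C3 = apriori_gen(L2)
C4 = apriori_gen(C3)   # both 3-itemsets are frequent in the example, so C3 = L3
print(C3)
print(C4)
```

    Keeping the items of every itemset sorted is what lets the join compare prefixes only, instead of testing every pair of itemsets for a size-k union.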

    Generating association rules from the large itemsets.

    Finding the large itemsets is computationally very expensive. However, once all large itemsets (l ∈ L) have been found, the possible association rules can easily be generated as follows:

    1. Find all non-empty subsets x of each large itemset l ∈ L.

    2. For each subset x found, output the rule x → (l - x) if the ratio Support(l)/Support(x) >= Minconf (%).

    Procedure for generating the subsets.

    Input:

    The large itemsets Lk

    Output:

    The set of rules satisfying confidence >= Minconf and support >= Minsup

    Method:

    Forall Lk, k >= 2 do
        Call Genrules (Lk, Lk);

    Procedure Genrules (Lk: Large k-Itemset, am: Large m-Itemset)
        A = { (m-1)-Itemsets am-1 | am-1 ⊂ am }
        Forall am-1 ∈ A do begin
            Conf = Support (Lk) / Support (am-1)
            If (Conf >= Minconf) then begin
                Output the rule am-1 → (Lk - am-1)
                    with Confidence = Conf and Support = Support (Lk)
                If (m-1 > 1) then Call Genrules (Lk, am-1);
            End;
        End;

    Efficient solution

    The preceding parts presented the basic process of mining association rules from a database. The issue that still needs attention is improving the efficiency of the algorithm when the number of candidate sets found is very large.

    Within the scope of this study, a solution to this problem is presented.

    Pruning candidates: pruning discards the candidate sets that are not needed and thus reduces the size of the collection of candidate sets. The technique is described below.

    The technique relies on one property: the items in each candidate set are sorted in order.

    The technique:

    Forall Itemsets c ∈ Ck do
        Forall (k - 1)-subsets s of c do
            If (s ∉ Lk-1) then
                Delete c from Ck

    In this way the candidate sets can be pruned, which restricts the search space over all itemsets.

    2.5. The association rule mining process

    Step 1: Find all frequent itemsets (according to the threshold Minsup).

    Step 2: Generate the rules from the frequent itemsets.

    For each frequent itemset S, generate all non-empty subsets of S.

    For each non-empty subset A of S:

    The rule A → (S - A) is one of the association rules sought if:

    conf (A → (S - A)) = supp (S) / supp (A) ≥ Minconf
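
    Step 2 can be sketched in a few lines of Python. The function name `generate_rules` is hypothetical. The support counts and the threshold of 70% are assumptions, chosen only because they are consistent with the confidence values listed in Table 2.5 for the itemset ABE; they are not taken from the transaction table itself.

```python
from itertools import combinations

def generate_rules(itemset, support_count, min_conf):
    """Return (antecedent, consequent, confidence) for rules from one frequent itemset.

    `support_count` maps frozensets to support counts and must contain
    `itemset` and all of its non-empty proper subsets.
    """
    s = frozenset(itemset)
    rules = []
    for size in range(1, len(s)):                 # proper subsets only
        for a in combinations(sorted(s), size):
            a = frozenset(a)
            conf = support_count[s] / support_count[a]
            if conf >= min_conf:
                rules.append((a, s - a, conf))
    return rules

# Assumed support counts (illustration only).
counts = {
    frozenset("A"): 4, frozenset("B"): 6, frozenset("E"): 5,
    frozenset("AB"): 4, frozenset("AE"): 4, frozenset("BE"): 5,
    frozenset("ABE"): 4,
}
rules = generate_rules("ABE", counts, min_conf=0.7)
for a, c, conf in rules:
    print("".join(sorted(a)), "->", "".join(sorted(c)), f"{conf:.0%}")
```

    Under these assumptions five rules pass the threshold, and B → AE is rejected because 4/6 is below 70%.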

    2.6. Some other algorithms

    2.6.1. A parallel algorithm for mining fuzzy association rules

    Following the sequential fuzzy association rule mining problem in the preceding part, each attribute i_u in I is associated with a set of fuzzy sets F_{i_u} as follows:

    F_{i_u} = { f^1_{i_u}, f^2_{i_u}, ..., f^k_{i_u} }

    Let FN = {k1} ∪ {k2} ∪ ... ∪ {kn} = {s1, s2, ..., sv} (v ≤ n, because some pairs ki and kj may be equal), and let N be the number of processors in the system. The problem of partitioning the set of fuzzy attributes among the processors is then:

    Find a non-empty subset Fn of FN such that the product of the elements of Fn equals the number of processors in the system. If no exact solution is found, the algorithm returns an acceptable solution, i.e. one for which the product of the elements of Fn is the closest value below N.

    Suppose s = {k1, k2, ..., km} is a solution of the partitioning algorithm (that is, k1 * k2 * ... * km = N). Then the number of fuzzy attributes saved at each processor, compared with the sequential algorithm, is (k1 - 1) + (k2 - 1) + ... + (km - 1) = (k1 + k2 + ... + km - m). The optimal solution is the one that maximizes (k1 + k2 + ... + km - m): the more attributes saved, the better.

    To make the optimal solution easier to find, FN must first be sorted in descending order. This strategy is very important because the processing time decreases exponentially as the number of fuzzy attributes decreases. A second strategy is for the partitioning algorithm to consult, throughout the search, the supports of the fuzzy attributes (updated after the Counting function has run) when deciding which attribute to partition on. The chosen attribute should have fuzzy attributes with relatively balanced supports. This helps balance the load among the processors in the following steps.

    The problem can be solved by backtracking (recursive or not). The listing below is written non-recursively.

    Algorithm

    Boolean Subset (FN, N, Idx)
        k = 1;
        Idx[1] = 0;
        S = 0;
        While (k > 0) {
            Idx[k]++;
            If (Idx[k]
                        k++;
                    }
                }
            } else {
                k--;
                S = FN[Idx[k]];
            }
        }
        Return False;

    FindSubset (FN, N, Idx, Fn)
        for (n = N; n > 0; n--)
            If (Subset (FN, n, Idx)) {
                Fn = {FN[i] | i ∈ Idx}
                Return;
            }
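
    The partitioning search can also be written recursively. The sketch below is a recursive backtracking variant, not a transcription of the non-recursive listing: the function names and the sample values are hypothetical, and it assumes every element of FN is a positive integer, so that a partial product never shrinks when another element is multiplied in.

```python
def subset_with_product(fn, target):
    """Return indices of a non-empty subset of `fn` whose product is `target`, or None."""
    def search(start, product, chosen):
        if chosen and product == target:
            return list(chosen)
        for i in range(start, len(fn)):
            # Positive integers only: once the product exceeds the target,
            # no extension of this branch can succeed.
            if fn[i] >= 1 and product * fn[i] <= target:
                chosen.append(i)
                found = search(i + 1, product * fn[i], chosen)
                if found is not None:
                    return found
                chosen.pop()              # backtrack
        return None
    return search(0, 1, [])

def find_subset(fn, n_processors):
    """Try N, N-1, ... until some subset of `fn` multiplies to that value."""
    fn = sorted(fn, reverse=True)         # descending order, as the text recommends
    for n in range(n_processors, 0, -1):
        idx = subset_with_product(fn, n)
        if idx is not None:
            return [fn[i] for i in idx]
    return []

# Hypothetical numbers of fuzzy sets per attribute.
print(find_subset([3, 2, 3, 2], 6))   # exact solution for 6 processors
print(find_subset([3, 2, 3, 2], 7))   # no exact solution, falls back to a smaller product
```

    This sketch returns the first solution found and does not try to maximize k1 + k2 + ... + km - m; selecting the optimal solution, or balancing by support, would need an additional comparison step.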

    2.6.2. The FP-Growth algorithm

    2.6.2.1. Key ideas.

    - Mine the frequent itemsets without using a candidate generation function.

    - Compress the database into an FP-tree (Frequent Pattern tree) structure.

    - Traverse the FP-tree recursively to generate the frequent itemsets.

    2.6.2.2. Procedure.

    Step 1: Build the FP-tree.

    Step 2: Build the conditional pattern base for each frequent item (each node in the FP-tree).

    Step 3: Build the conditional FP-tree from each conditional pattern base.

    Step 4: Mine the conditional FP-trees recursively, growing the frequent patterns until a conditional FP-tree contains only a single path; then generate all combinations of the frequent patterns.

    a- Building the FP-tree (Step 1)

    TID    Items bought                (Ordered) frequent items
    100    {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
    200    {a, b, c, f, l, m, o}       {f, c, a, b, m}
    300    {b, f, h, j, o, w}          {f, b}
    400    {b, c, k, s, p}             {c, b, p}
    500    {a, f, c, e, l, p, m, n}    {f, c, a, m, p}

    Minsupp = 60%

    - Find the frequent 1-itemsets (one scan of the database).

    - Sort the frequent items in descending order of frequency into the F-List.

    F-List = f-c-a-b-m-p

    - Scan the database once more and build the FP-tree.
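
    The two database scans can be sketched in Python as follows. This is a simplified illustration: the `Node` class and the function names are hypothetical, the header table's node-links are left out, and the order of items with equal frequency is taken from the F-List given in the text rather than computed.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, and its children keyed by item."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, f_list):
    """Insert every transaction's frequent items, in F-List order, into a tree."""
    rank = {item: r for r, item in enumerate(f_list)}
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1          # shared prefixes only increase counts
    return root

def dump(node, depth=0):
    """Print the tree with one 'item:count' per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1)

db = [set("facdgimp"), set("abcflmo"), set("bfhjow"), set("bcksp"), set("afcelpmn")]
counts = Counter(item for t in db for item in t)      # first scan
f_list = [i for i in "fcabmp" if counts[i] >= 3]      # tie order taken from the text
root = build_fp_tree(db, f_list)                      # second scan
dump(root)
```

    Because transactions that share a prefix in F-List order share a path in the tree, the five transactions collapse into two branches below the root.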

    Table 2.6. FP-tree (the transaction table above, with Minsupp = 3 transactions)

    Table 2.7. FP-tree (header table)

    Item    Frequency
    f       4
    c       4
    a       3
    b       3
    m       3
    p       3

    Table 2.8. FP-tree (after inserting transaction 100)

    {}
      f:1
        c:1
          a:1
            m:1
              p:1

    Table 2.9. FP-tree (after inserting transaction 200)

    {}
      f:2
        c:2
          a:2
            m:1
              p:1
            b:1
              m:1

    Table 2.10. FP-tree (after inserting transaction 300)

    {}
      f:3
        c:2
          a:2
            m:1
              p:1
            b:1
              m:1
        b:1

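    b- The FP-Growth algorithm (Step 2)

    - Start from the frequent item at the bottom of the header table of the FP-tree.

    - Traverse the FP-tree by following the node-links of each frequent item p.

    Header table (item, frequency): f 4, c 4, a 3, b 3, m 3, p 3

    The complete FP-tree:

    {}
      f:4
        c:3
          a:3
            m:2
              p:2
            b:1
              m:1
        b:1
      c:1
        b:1
          p:1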

    Example FP-tree

    Transactions:

    A B
    B A C
    A B D
    E B A
    A C
    A B C
    B C
    B C D
    B E
    E A
    A C E
    A D E

    Item frequencies: A 9, B 8, C 6, E 5, D 3

    Minsupp = 25%

    Null
      A:9
        B:5
          C:2
          E:1
          D:1
        C:2
          E:1
        E:2
          D:1
      B:3
        C:2
          D:1
        E:1

    If Minsupp = 40%, what would the FP-tree look like?


    - Collect all the transformed prefix paths of item p to form the conditional pattern base of p.

    Table 2.11. FP-tree (conditional pattern bases)

    Item    Conditional pattern base
    c       f:3
    a       fc:3
    b       fca:1, f:1, c:1
    m       fca:2, fcab:1
    p       fcam:2, cb:1
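
    The bases in the table above can be checked with a short Python sketch. As a simplifying assumption, it reads the prefixes straight from the ordered frequent-item lists of the five transactions instead of following node-links in the FP-tree; the two give the same counts, because a node's count is the number of transactions that share its prefix path. The function name is hypothetical.

```python
from collections import Counter, defaultdict

def conditional_pattern_bases(ordered_transactions):
    """Map each item to a Counter of the prefix paths that precede it."""
    bases = defaultdict(Counter)
    for items in ordered_transactions:
        for pos in range(1, len(items)):          # the first item has no prefix
            bases[items[pos]][items[:pos]] += 1
    return bases

# Ordered frequent items of transactions 100 to 500.
ordered = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
bases = conditional_pattern_bases(ordered)
for item in "cabmp":
    print(item, dict(bases[item]))
```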

    c- The FP-Growth algorithm (Step 3)

    For each conditional pattern base:

    - Count the occurrences of each item in the pattern base.

    - Build an FP-tree over the frequent items of the pattern base.

    Example: suppose the conditional pattern base of p is {fcam:2, cb:1}

    Table 2.12. FP-tree (p-conditional FP-tree)

    Header table (item, frequency): c 3

    {}
      c:3

    All frequent patterns involving p are: p, cp

    Example: m-conditional pattern base: fca:2, fcab:1

    Table 2.13. FP-tree (m-conditional FP-tree)

    {}
      f:3
        c:3
          a:3

    All frequent patterns involving m are: m, fm, cm, am, fcm, fam, cam, fcam

    Example

    Table 2.14. Database

    d- The FP-Growth algorithm (Step 4)

    - Suppose the FP-tree T has a single path P.

    - The complete set of frequent patterns of T is generated by enumerating all combinations of the sub-paths of P.

    Item    Conditional pattern base       Conditional FP-tree
    p       {(fcam:2), (cb:1)}             {(c:3)} | p
    m       {(fca:2), (fcab:1)}            {(f:3, c:3, a:3)} | m
    b       {(fca:1), (f:1), (c:1)}        {}
    a       {(fc:3)}                       {(f:3, c:3)} | a
    c       {(f:3)}                        {(f:3)} | c
    f       {}                             {}

    Recursive mining of the m-conditional FP-tree:

    m-conditional FP-tree: {} - f:3 - c:3 - a:3

    Conditional pattern base of "am": (fc:3); am-conditional FP-tree: {} - f:3 - c:3

    Conditional pattern base of "cm": (f:3); cm-conditional FP-tree: {} - f:3

    Conditional pattern base of "cam": (f:3); cam-conditional FP-tree: {} - f:3
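
    For a single-path tree, Step 4 reduces to enumerating combinations. The sketch below does this for the m-conditional FP-tree; the function name and the representation of the path as (item, count) pairs are hypothetical choices made for this illustration.

```python
from itertools import combinations

def patterns_from_single_path(path, suffix):
    """Enumerate the frequent patterns of a single-path conditional FP-tree.

    `path` is a list of (item, count) pairs from the root downwards and
    `suffix` is a (pattern, count) pair for the item the tree is conditioned on.
    """
    suffix_items, suffix_count = suffix
    patterns = {suffix_items: suffix_count}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = "".join(item for item, _ in combo) + suffix_items
            # The support of a combination is the smallest count among its nodes.
            patterns[items] = min(count for _, count in combo)
    return patterns

m_tree = [("f", 3), ("c", 3), ("a", 3)]     # m-conditional FP-tree
found = patterns_from_single_path(m_tree, ("m", 3))
print(found)
```

    The eight patterns produced are the ones listed for m: m, fm, cm, am, fcm, fam, cam and fcam, each with support 3.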


    2.6.2.3. The FP_Growth algorithm

    Procedure FP_Growth (Tree, α)

    If Tree contains a single path P then

        For each combination β of the nodes in P

            Generate the pattern β ∪ α with support = the minimum support of the nodes in β;

    Else for each item ai in the header table of Tree

        Generate the pattern β = ai ∪ α with support = ai.support;

        Construct β's conditional pattern base and β's conditional FP-tree Treeβ;

        If Treeβ ≠ ∅ then call FP_Growth (Treeβ, β).

    * Conclusion of Chapter II:

    Chapter II has shown how these algorithms are applied in various areas of social life, where they play a very important role in building decision support systems. Association rule mining is a direction that is still being refined. To apply association rules, the existing database must first be encoded; this is an important step that determines whether good association rules can be generated.

    The Apriori algorithm finds the frequent itemsets by generating candidates.

    The FP_Growth algorithm finds the frequent itemsets without generating candidates.

    Based on the frequent itemsets found, we apply the rule generation algorithm to produce the set of confident association rules.


    Chapter III: IMPLEMENTATION AND TESTING OF THE ALGORITHM FOR FINDING FREQUENT ITEMSETS AND ASSOCIATION RULES

    3.1. Problem statement.

    With today's economic development, doing business is a matter of interest to many people. As society develops, people's level of education keeps rising. The development of education is therefore of great concern to society, so trading in documents, textbooks, reference books, school supplies and the like is a sound direction. To do business well, however, the trader must know how to manage it properly and sensibly.

    From these considerations comes the need for bookstore management software that supports the manager in choosing which titles to sell. For example, when selling textbooks, which reference books and school supplies should be sold alongside them? How are they related to one another?

    Association rules tell us which kinds of books to choose for sale, helping the manager make decisions quickly, accurately and effectively.

    3.2. Choosing the algorithm to implement.

    Many algorithms can support the choice of titles in bookstore management; we chose the Apriori algorithm for the implementation.

    The purpose of this algorithm is to produce association rules for choosing the titles to sell. For example, when selling Mathematics books, also sell Physics and Chemistry books.

    3.3. Requirements for installing the algorithm.

    - Hardware:

    + Minimum configuration: 256 MB RAM.

    + 2 GB of free hard disk space.

    + CPU: Pentium 4, 1.7 GHz.

    - Software:

    + Visual Studio 2005 installed.

    + .NET Framework 2.0.


    3.4. Database.

    3.4.1. Main interface of the database.

    Figure 3.1. Main interface of the database

    Description of some functions in the interface:

    + Hệ Thống (System): exits the program.

    + DM khách hàng (Customer list): adds, saves, edits and deletes customer data.

    + DM hàng (Goods list): adds, saves, edits and deletes goods data.

    + DM hóa đơn (Invoice list): adds, saves, edits and deletes invoice data.

    + DM Nhà CC (Supplier list): adds, saves, edits and deletes supplier data.

    + Apriori: writes the data to an XML file.


    3.4.2. Supplier list table.

    The structure and data of the table are as follows:

    Figure 3.2. Supplier list

    Some attributes of the table:

    + MaNCC: supplier code.

    + TenNCC: supplier name.

    + DiaChi: supplier address.

    + DienThoai: supplier telephone number.

    + MaSoThue: supplier tax code.

    + Email: supplier email address.


    3.4.3. Goods list table.

    The structure and data of the goods table are as follows:

    Figure 3.3. Goods list

    Some attributes of the table:

    + MaH: goods code.

    + MaNCC: code of the supplier of the goods.

    + TenHang: goods name.

    + MoTa: goods description.

    + ChungLoai: goods category.


    3.4.4. Customer list table.

    The structure and data of the customer table are as follows:

    Figure 3.4. Customer list

    Some attributes of the table:

    + MaKH: customer code.

    + TenKH: customer name.

    + SoCMND: identity card number.

    + DiaChi: customer address.

    + DienThoai: customer telephone number.

    + Email: customer email address.


    3.4.5. Invoice list table.

    The structure and data of the invoice table are as follows:

    Figure 3.5. Invoice list

    Some attributes of the table:

    + MaHD: invoice code.

    + MaKH: customer code.

    + NgayHD: date the invoice was entered.

    + Ghichu: invoice notes.


    3.4.6. Invoice detail table.

    The structure and data of the invoice detail table are as follows:

    Figure 3.6. Invoice detail list

    Some attributes of the table:

    + MaHD: invoice code.

    + MaH: goods code.

    + SoLuong: quantity of goods.


    3.4.7. Writing XML.

    Figure 3.7. Writing XML

    3.5. Main interface of the program.

    Figure 3.8. Main interface of the program


    3.6. Data connection.

    Figure 3.9. Data connection

    3.7. Adding XML data

    Figure 3.10. Adding XML data


    3.8. Analysis results

    Figure 3.11. Analysis results

    3.9. Results filtered with MinSup = 10

    Figure 3.12. Results filtered by minimum support


    3.10. Results filtered with MinCon = 40%

    Figure 3.13. Results filtered by confidence

    * Conclusion of Chapter III:

    The implementation uses the Apriori algorithm, applied to sales management at a supermarket. From its results the manager learns which groups of items are related to one another, which serves both management and the choice of items to stock.


    FUTURE DEVELOPMENT OF THE TOPIC

    One of the important tasks in association rule mining is finding all frequent itemsets in the database. In the near future we will therefore extend the topic in the following direction: applying a parallel algorithm to the problem of mining fuzzy association rules, i.e. association rules over sets of fuzzy attributes.

    The parallel algorithm divides the database and the candidate set evenly among the processors, and the candidate sets assigned to the individual processors are completely independent of one another. The aim is to reduce the cost of finding fuzzy association rules and the time needed to encode the data.

    A weakness of the Apriori algorithm is that the analysis takes a great deal of time when the data is large. To overcome this weakness, other algorithms need to be used as well, for example FP_Growth or parallel algorithms.

    We will continue to refine the supermarket sales management system, which could also be applied in other areas such as other retail outlets or computer sales.

    As the volume of collected and stored data keeps growing, together with the need to grasp information, the task set for data mining becomes ever more important. Its applicability to many fields of the economy, society, security and defense is also an advantage of data mining. We hope that the knowledge gained from this project will soon be put into practice and serve people's lives.




    VIETNAMESE - ENGLISH TERMINOLOGY TABLE

    English                                    Vietnamese
    Data Mining                                Khai phá dữ liệu
    Data                                       Dữ liệu
    Knowledge Discovery in Database (KDD)      Phát hiện tri thức trong cơ sở dữ liệu
    Target                                     Mục đích, mục tiêu
    Cleansed - Preprocessed - Prepared         Làm sạch - Tiền xử lý - Chuẩn bị trước
    Transform                                  Chuyển đổi
    Pattern Discovery                          Khám phá mô hình
    Knowledge                                  Tri thức
    Clustering                                 Phân cụm
    Summarization                              Tóm tắt
    Visualization                              Trực quan hoá
    Evolution and deviation analysis           Phân tích sự phát triển và độ lệch
    Association rules                          Phân tích luật kết hợp
    Classification                             Phân lớp
    Regression                                 Hồi quy
    Clustering                                 Gom nhóm
    Summarization                              Tổng hợp
    Dependency modeling                        Mô hình ràng buộc
    Change and Deviation Detection             Dò tìm biến đổi và độ lệch
    Regression                                 Hồi qui
    Cross validation                           Đánh giá chéo
    Support                                    Độ phổ biến
    Minimum Support                            Độ phổ biến tối thiểu
    Confidence                                 Độ tin cậy
    Minimum Confidence                         Độ tin cậy tối thiểu
    Itemset                                    Hạng mục

    Procedur