Knowledge Engineering Essay (Report of Group 4)

Authors: Thái Thị Bích Thủy, Nguyễn Thị Kim Ngân, Nguyễn Thị Diễm Thúy


A. THEORY

I. OVERVIEW OF DATA MINING

II. MACHINE LEARNING METHODS APPLIED IN DATA MINING
1. Mining Association Rules
2. Classification
3. Clustering

III. ASSOCIATION RULE MINING ALGORITHMS APPLIED IN DATA MINING
1. Introduction and related definitions
   Issues in association rules
   Support
   Confidence
2. Overview of association rule mining algorithms
2.1 Basic algorithm
2.2 Sequential algorithms
   AIS algorithm
   SETM algorithm
   Apriori algorithm
   Apriori-TID algorithm
   Apriori-Hybrid algorithm
   Other algorithms:
   - Off-line Candidate Determination (OCD)
   - Partitioning
   - Sampling
   - Dynamic Itemset Counting (Brin1997a)
   - CARMA (Continuous Association Rule Mining Algorithm)
2.3 Parallel and distributed algorithms
   Data parallelism algorithms
   - CD
   - PDM
   - DMA
   - CCPD
   Task parallelism algorithms
   - DD
   - IDD
   - HPA
   - PAR
   Other algorithms
   - Candidate Distribution
   - SH
   - HD
3. Comparison of the algorithms

IV. SUMMARY

B. EXPERT SYSTEM EXERCISE

BUILDING AN EXPERT SYSTEM TO SUPPORT THE DIAGNOSIS OF SOME COMMON CHILDHOOD DISEASES
I. Building the knowledge base for the problem
II. Implementing the DEMO program

References

ACKNOWLEDGMENTS

The student group would like to sincerely thank our teacher, Mr. Phan Huy Khánh, who devotedly passed on the knowledge we needed throughout the course. Although the subject is highly abstract, his concrete lectures and vivid practical examples helped us grasp the course content well and gave us a clearer direction for the future.

Because of our limited time and ability, and because of the large number of algorithms to be presented, this essay certainly still contains shortcomings. We look forward to comments and encouragement from our teacher, as well as from all senior students and classmates, so that the essay can be improved further.

Thank you very much!

A. THEORY

I. OVERVIEW OF DATA MINING

1. What is Data Mining?

Data mining is the use of sophisticated data analysis tools to discover previously unknown knowledge, valid patterns, and relationships in large databases. In essence, data mining involves analyzing data and applying techniques to find regularities in a data set. Data mining therefore not only collects and manages data but also analyzes it and makes predictions from it.

In 1989, Fayyad, Piatetsky-Shapiro and Smyth used the term Knowledge Discovery in Databases (KDD) to refer to the whole process of discovering useful knowledge from large data sets. Within that process, data mining is one particular step, which applies specific algorithms to extract patterns (or models) from the data.

Data mining can operate on quantitative, structured, or multimedia data. Data mining applications can use different methods to examine the data, such as:

- Association: one event is connected to another event, for example buying a pen and buying paper.

- Sequence (path) analysis: one event leads to another event, for example the birth of a child leading to the purchase of diapers.

- Classification: identifying new patterns.

- Clustering: finding and recording groups of previously undiscovered facts, such as geographic location or preference level.

- Forecasting: discovering patterns from which people can make sound predictions about future events.

Data mining is regarded as one stage of the knowledge discovery process in databases.

2. Limitations of Data Mining

Data mining tools are very powerful, but they are not self-sufficient applications. Data mining requires analysts and technical specialists who have the skills to structure the analysis and interpret the output. The limitations of data mining therefore relate more to data and people than to technology.

Although data mining can reveal patterns and relationships in a database, it does not tell the user the value or significance of those patterns; users must determine that themselves. Likewise, the validity of the discovered patterns depends on how well they compare with the real world.

Another limitation is that when data mining identifies a connection between behaviors and variables, it does not necessarily identify the cause of that relationship.

3. Applications of Data Mining

Data mining is used for many purposes in both the public and the private sector. Industries such as banking, insurance, health care, and retail use data mining to reduce costs, improve market research, and increase sales. For example, the insurance and banking industries use data mining to detect fraud and to help assess risk. Using customer data collected over many years, a company can build models that predict whether a customer is creditworthy, or whether an accident report may be fraudulent and should be investigated further.

In the public sector, data mining applications are used not only as a means to detect fraud and waste but also for purposes such as measuring and improving program performance. Data mining has also helped the federal government recover millions of dollars lost to fraud in funds for elderly care, helped the Department of Justice build crime patterns and allocate resources accordingly, and assisted in predicting demographic change and in estimating budget needs more accurately.

Recently, data mining has come to be seen as an important tool for national security. Some people believe that data mining should be used to identify terrorist activities, such as transfers of money and information, and to identify and track individual terrorists through travel and immigration records. The first two data mining applications to attract strong attention were the Terrorism Information Awareness (TIA) project and the Computer-Assisted Passenger Prescreening System II (CAPPS II). Both systems were created after the events of September 11, 2001, when the United States was attacked by terrorists, in order to keep flights safe from the threat of terrorism. The TIA project has since been discontinued, and CAPPS II has been replaced by the Secure Flight system.

4. Some issues in Data Mining

a. Data quality

Data quality is a major challenge for data mining. Data quality refers to the accuracy and completeness of the data. It can also be affected by the structure and consistency of the data being analyzed.

The presence of duplicate records, the lack of data standards, the timeliness of updates, and human error can significantly affect the effectiveness of data mining techniques, particularly where subtle differences exist in the data.

To improve data quality, it is sometimes necessary to clean the data, for example by removing duplicate records, normalizing the values used to represent information in the database (for example, making sure that "no" is represented as 0 or N), accounting for missing data points, and removing unneeded data fields.

b. Interoperability

This is the interoperability between database components and data mining software. Interoperability refers to the ability of a computer system and/or data to work with other systems or data using common standards or processes.

For data mining, interoperability between databases and data mining software is important because it allows several databases to be searched and analyzed at the same time, and it ensures that data mining activities are compatible across different workstations.

c. Mission creep

Mission creep is one of the leading risks of data mining. It refers to the use of data for purposes other than the one for which the data was originally collected, regardless of whether the data was provided voluntarily or collected by other means.

d. Privacy

Privacy concerns relate to the actual purpose of a project and to the potential for a data mining application to grow beyond its original purpose. For example, some experts suggest that data mining applications built to fight terrorism could also be applied to other types of crime.

II. MACHINE LEARNING METHODS APPLIED IN DATA MINING

1. Mining Association Rules

The task of association rule mining is to find relationships among sets of objects (also called items) in a database. These relationships are expressed as association rules, and each rule has two measures: support and confidence. Association rule mining is well suited to applications such as cross-marketing and attached mailing. It is also applied in catalog design, add-on sales, store layout, and customer segmentation based on purchase orders. Beyond business, association rule mining is applied in other fields such as medical diagnosis.

2. Classification

Classification in data mining is recognized as an effective machine learning method and is currently applied in many areas of science: statistics, pattern recognition, decision theory, machine learning, neural networks, and others.

The three main steps of classification:

Step 1: build a model from a set of known data, called the training data or samples.

Step 2: evaluate the predictive accuracy of the model using test data.

Step 3: use the model to predict unknown data (if the accuracy is acceptable).

Preparing data for classification:

Data cleaning: remove noise and handle missing values.

Relevance analysis: remove redundant or irrelevant attributes.

Data transformation: generalize the data to higher-level concepts, or normalize it.

3. Clustering

Clustering groups a set of objects into clusters of similar objects. A good cluster has high similarity among the objects inside it and low similarity to the objects outside it. Examples include discovering distinct groups of customers, classifying genes with similar functions, and identifying groups of car insurance holders with a high average claim rate.

Clustering differs from classification in that the classes are not defined in advance and the training samples carry no class labels.

Classification is learning by example, that is, supervised learning, whereas clustering is learning by observation, that is, unsupervised learning.

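III. ASSOCIATION RULE MINING ALGORITHMS APPLIED IN DATA MINING

III.1. Introduction and related definitions

Let $\mathcal{I}$ be a set of items.

A set $X = \{i_1, i_2, \ldots, i_k\} \subseteq \mathcal{I}$ is called an itemset, or a k-itemset if it contains k items.

A transaction over $\mathcal{I}$ is a pair $T = (tid, I)$, where tid is the transaction identifier and $I \subseteq \mathcal{I}$ is an itemset.

A transaction database D over $\mathcal{I}$ is a set of transactions over $\mathcal{I}$.

An association rule is an expression of the form $X \Rightarrow Y$, where $X, Y \subseteq \mathcal{I}$ are itemsets and $X \cap Y = \emptyset$. X is called the antecedent (body) of the rule and Y the consequent (head). The rule means that X determines Y.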

The cover of an itemset X in D is the set of identifiers of the transactions in D that support X:

$cover(X, D) := \{tid \mid (tid, I) \in D,\ X \subseteq I\}$

The support of an itemset X in D is the number of transactions in the cover of X in D:

$support(X, D) := |cover(X, D)|$

Support is also often expressed relatively, as the fraction of the transactions T in database D that contain the itemset X:

$support(X) = |\{T \in D \mid X \subseteq T\}| / |D|$

In [Agrawal1993] [Cheung1996c], the support s of an association rule is the ratio (in percent) of the records that contain $X \cup Y$ to the total number of records in the database.

Thus, saying that the support of a rule is 5% means that 5% of all records contain $X \cup Y$. The support of the rule $X \Rightarrow Y$ is defined as $support(X \Rightarrow Y) = support(X \cup Y)$.

The frequency of an itemset X in D is the probability that X occurs in a transaction $T \in D$:

$frequency(X, D) := P(X) = support(X, D) / |D|$

An itemset is called frequent if its support is no less than a given absolute minimal support threshold $\sigma_{abs}$, with $0 \le \sigma_{abs} \le |D|$.

When working with frequencies instead of supports, we use the related notion of a minimal frequency threshold $\sigma_{rel}$, with $0 \le \sigma_{rel} \le 1$. Clearly $\sigma_{abs} = \lceil \sigma_{rel} \cdot |D| \rceil$.

The confidence (or accuracy) of an association rule $X \Rightarrow Y$ in D is defined as:

$confidence(X \Rightarrow Y, D) := P(Y \mid X) = support(X \cup Y, D) / support(X, D)$

A rule is called confident if $P(Y \mid X)$ exceeds a given minimal confidence threshold $\alpha$, with $0 \le \alpha \le 1$.

In [Agrawal1993] [Cheung1996c], the confidence $\alpha$ is the ratio (in percent) of the number of records that contain $X \cup Y$ to the number of records that contain X.

That is, a confidence of 85% means that 85% of the records that contain X also contain Y.
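The definitions above translate almost directly into code. The following is a minimal sketch, assuming a database stored as a dictionary from transaction identifier to itemset; the four transactions and the item names a, b, c are hypothetical and serve only as illustration.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical representation: tid -> itemset I of that transaction.
Database = Dict[str, FrozenSet[str]]


def cover(x: FrozenSet[str], db: Database) -> Set[str]:
    """Identifiers of the transactions whose itemset contains x."""
    return {tid for tid, items in db.items() if x <= items}


def support(x: FrozenSet[str], db: Database) -> int:
    """Absolute support: the size of the cover."""
    return len(cover(x, db))


def frequency(x: FrozenSet[str], db: Database) -> float:
    """Relative support: support divided by the number of transactions."""
    return support(x, db) / len(db)


def confidence(x: FrozenSet[str], y: FrozenSet[str], db: Database) -> float:
    """support(X u Y) / support(X); undefined if X never occurs."""
    sx = support(x, db)
    if sx == 0:
        raise ValueError("antecedent does not occur in the database")
    return support(x | y, db) / sx


# Illustrative data only (not taken from the text).
DB: Database = {
    "t1": frozenset({"a", "b", "c"}),
    "t2": frozenset({"a", "b"}),
    "t3": frozenset({"b", "c"}),
    "t4": frozenset({"a"}),
}
A, B = frozenset({"a"}), frozenset({"b"})
print(support(A | B, DB), frequency(A | B, DB), confidence(A, B, DB))
```

Because every transaction that contains a and b also contains a, the support of {a, b} can never exceed the support of {a}.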

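Proposition:

Let D be a transaction database over $\mathcal{I}$, and let $X, Y \subseteq \mathcal{I}$ be two itemsets. Then:

$X \subseteq Y \Rightarrow support(Y) \le support(X)$

Proof:

This follows immediately from $cover(Y) \subseteq cover(X)$. (Q.E.D.)

III.2. Overview of association rule mining algorithms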

1. Basic algorithm

Finding association rules in a database means finding all rules that meet the support and confidence thresholds specified by the user. The problem can be decomposed into two subproblems [Agrawal1994], as presented in Algorithm 1.

Algorithm 1. Basic algorithm.

Input

I, D, s, α

Output

Association rules satisfying s and α

Algorithm

- Find all itemsets whose frequency of occurrence is greater than or equal to the user-specified support s.

- Generate the rules that satisfy the confidence α.

The first step of the algorithm finds the itemsets that occur in the database often enough to meet the minsupp threshold (the frequent, or large, itemsets). All other itemsets are called small (infrequent) itemsets.

A noteworthy observation: if an itemset X does not meet the support s, then no superset of X meets s either; conversely, if X meets s, then every subset of X also meets s.

The second step of Algorithm 1 finds the association rules from the frequent itemsets found in step 1.

Consider the following example.

Example 1:

Suppose a small database has four items, I = {Bread, Butter, Eggs, Milk}, and four transactions, as shown in Table 1. Table 2 lists all possible itemsets over I. Suppose minsupp and minconf are 40% and 60% respectively. First we find the itemsets that meet minsupp; then we check which rules have a confidence of at least the 60% minconf (Table 3). The itemsets that lead to candidate rules are {Bread, Butter} and {Butter, Eggs}. The support of each single item in them is at least 40% (see Table 2), so these items are frequent. The confidences of the rules are given in Table 3. The rule (Bread ⇒ Butter) clearly qualifies. The rule (Butter ⇒ Eggs), however, does not, because its confidence is below 60%.

Table 1: Database for Example 1

Transaction ID    Items
T1                Bread, Butter, Eggs
T2                Butter, Eggs, Milk
T3                Butter
T4                Bread, Butter

Table 2: Support of the itemsets of Table 1

Note: Large = frequent; Small = infrequent.

Itemset                       Support, s (%)    Large/Small
Bread                         50                Large
Butter                        100               Large
Eggs                          50                Large
Milk                          25                Small
Bread, Butter                 50                Large
Bread, Eggs                   25                Small
Bread, Milk                   0                 Small
Butter, Eggs                  50                Large
Butter, Milk                  25                Small
Milk, Eggs                    25                Small
Bread, Butter, Eggs           25                Small
Bread, Butter, Milk           0                 Small
Bread, Eggs, Milk             0                 Small
Butter, Eggs, Milk            25                Small
Bread, Butter, Eggs, Milk     0                 Small
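The first step of Algorithm 1 can be checked against Table 2 with a brute-force enumeration. This is only a sketch for the four-item example: it tries every possible itemset, which is exponential in the number of items, and the function names are chosen here for illustration.

```python
from itertools import combinations

# Table 1 of Example 1.
TRANSACTIONS = {
    "T1": {"Bread", "Butter", "Eggs"},
    "T2": {"Butter", "Eggs", "Milk"},
    "T3": {"Butter"},
    "T4": {"Bread", "Butter"},
}


def support_pct(itemset, transactions):
    """Percentage of transactions that contain every item of itemset."""
    hits = sum(1 for items in transactions.values() if set(itemset) <= items)
    return 100.0 * hits / len(transactions)


def frequent_itemsets(transactions, minsupp_pct):
    """Enumerate all itemsets and keep those meeting minsupp (brute force)."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            s = support_pct(candidate, transactions)
            if s >= minsupp_pct:
                result[frozenset(candidate)] = s
    return result


large = frequent_itemsets(TRANSACTIONS, 40)
for itemset, s in large.items():
    print(sorted(itemset), s)
```

With minsupp = 40% the sketch keeps exactly the five itemsets marked Large in Table 2.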

Table 3: Rules tested against minconf ≥ 60%

Rule               Confidence (%)    Rule selected
Bread ⇒ Butter     100               Yes
Butter ⇒ Bread     50                No
Butter ⇒ Eggs      50                No
Eggs ⇒ Butter      100               Yes

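Computing the frequent itemsets is very expensive [Agrawal1994]. Once they are known, however, the association rules can be found by a straightforward algorithm, given as Algorithm 2 below (this algorithm is presented in [Agrawal1994]).

Algorithm 2. Find the association rules from the given frequent itemsets.

Input

I, D, s, α, L

Output

Association rules satisfying s and α

Algorithm

- Find all non-empty subsets x (with x ≠ l) of each frequent itemset l ∈ L.

- For each subset x, output the rule x ⇒ (l − x) if the ratio of the frequency of occurrence of l to the frequency of occurrence of x is greater than or equal to the confidence threshold.

For example, suppose we want to determine whether the rule (Bread ⇒ Butter) is selected in Example 1. Here l = {Bread, Butter} and x = {Bread}, so (l − x) = {Butter}. The ratio of support(Bread, Butter) to support(Bread) is 100%, which is greater than the given confidence threshold, so the rule is selected. To make this clearer, consider the third rule (Butter ⇒ Eggs), where x = {Butter} and (l − x) = {Eggs}. The ratio of support(Butter, Eggs) to support(Butter) is 50%, which is below the minimum confidence threshold of 60%. Hence there is no basis for concluding that the rule (Butter ⇒ Eggs) reaches a confidence of 60%.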
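Algorithm 2 can be sketched in a few lines and checked against Table 3. The sketch assumes the frequent itemsets are already known and recomputes supports by scanning the transactions; a practical implementation would reuse the counts from step 1. The function names are illustrative.

```python
from itertools import combinations

# Table 1 of Example 1.
TRANSACTIONS = [
    {"Bread", "Butter", "Eggs"},
    {"Butter", "Eggs", "Milk"},
    {"Butter"},
    {"Bread", "Butter"},
]


def support(itemset, transactions):
    """Number of transactions that contain the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def generate_rules(frequent, transactions, minconf):
    """For each frequent itemset l, emit x => (l - x) when confident enough."""
    rules = []
    for l in frequent:
        for r in range(1, len(l)):  # non-empty subsets x with x != l
            for x in combinations(sorted(l), r):
                x = frozenset(x)
                conf = support(l, transactions) / support(x, transactions)
                if conf >= minconf:
                    rules.append((x, l - x, conf))
    return rules


L = [frozenset({"Bread", "Butter"}), frozenset({"Butter", "Eggs"})]
for x, y, conf in generate_rules(L, TRANSACTIONS, 0.6):
    print(sorted(x), "=>", sorted(y), conf)
```

Only the two rules marked Yes in Table 3 pass the 60% threshold; Butter ⇒ Bread and Butter ⇒ Eggs are rejected at 50%.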

Finding association rules in very large databases is expensive and inefficient. Most later improvements therefore aim at more efficient algorithms for the first step [Agrawal1994] [Cheung1996c] [Klemettinen1994]. The following sections present these algorithms.

2. Sequential algorithms

This section gives a general overview of the existing association rule mining algorithms. Most algorithms for identifying frequent itemsets fall into two classes: sequential and parallel. In most cases, the algorithms assume that the itemsets are identified and stored in lexicographic order (based on item name). This ordering provides a logical way of organizing how itemsets are generated and counted, and it is the standard approach for sequential algorithms. Parallel algorithms, on the other hand, focus on how to parallelize the task of finding frequent itemsets. The algorithms are discussed below.

2.1 AIS algorithm

AIS was the first published algorithm for generating all frequent itemsets in a transaction database [Agrawal1993]. It focused on enhancing databases with the functionality needed to process decision support queries. The algorithm was aimed at discovering qualitative rules.

AIS makes multiple passes over the input database, and in each pass it scans every transaction. In the first pass, it counts the support of the individual items and determines which of them are frequent. The frequent itemsets of each pass are extended to generate candidate itemsets. After a transaction is scanned, the itemsets common to the frequent itemsets of the previous pass and the items of the current transaction are determined. These common itemsets are extended with other items of the transaction to create new candidates. A frequent itemset l is extended only with items that are frequent and that occur later in the lexicographic ordering than every item in l. To perform this task efficiently, AIS uses an estimation tool and a pruning technique, which remove unnecessary itemsets from the candidate set. The support of each candidate is then computed. Candidates whose support is greater than or equal to minsupp are selected as frequent itemsets, and these are extended to generate the candidates of the next pass. The process stops when no more frequent itemsets are found.

Because candidates are generated from the transactions themselves, an itemset that does not occur anywhere in the database never becomes a candidate for a frequent itemset in a later pass.
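The candidate generation of one AIS pass, as described above, can be sketched as follows. This is a simplified illustration, not the published algorithm: the estimation and pruning techniques are omitted, and the function name and data layout are assumptions.

```python
def ais_pass(transactions, prev_frequent, frequent_items):
    """One AIS pass: generate and count candidates while scanning.

    prev_frequent: frequent k-itemsets of the previous pass (frozensets).
    frequent_items: the frequent single items.
    Returns a dict mapping each candidate (k+1)-itemset to its count.
    """
    counts = {}
    for t in transactions:
        for l in prev_frequent:
            if l <= t:  # l is common to the previous pass and transaction t
                last = max(l)  # lexicographically largest item of l
                for item in t:
                    # extend only with frequent items that come after l
                    if item in frequent_items and item > last:
                        candidate = l | {item}
                        counts[candidate] = counts.get(candidate, 0) + 1
    return counts


# Table 1 of Example 1; Bread, Butter and Eggs are frequent at minsupp = 40%.
TRANSACTIONS = [
    {"Bread", "Butter", "Eggs"},
    {"Butter", "Eggs", "Milk"},
    {"Butter"},
    {"Bread", "Butter"},
]
ITEMS = {"Bread", "Butter", "Eggs"}
L1 = [frozenset({i}) for i in ITEMS]
C2 = ais_pass(TRANSACTIONS, L1, ITEMS)
L2 = {c for c, n in C2.items() if n / len(TRANSACTIONS) >= 0.4}
```

Note that {Bread, Eggs} is still generated and counted even though it turns out to be infrequent, which illustrates why AIS tends to produce many candidates that are later discarded.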

Generating a huge number of candidate itemsets can overflow the memory buffer, so a suitable buffer management scheme is needed. AIS proposes that the frequent itemsets need not stay in memory during a pass over the database and can instead reside on disk. The buffer management algorithm for the candidate itemsets is given in [Agrawal1993].

The main problem of AIS is that it generates too many candidates that later fail the threshold [Agrawal1994]. If a database has m items and every item appears in every transaction, there are $2^m$ cases to consider. In the worst case, therefore, the complexity of the algorithm is exponential in m.

2.2 SETM algorithm

The SETM algorithm was proposed in [Houtsma1995], motivated by the wish to use SQL to compute frequent itemsets [Srikant1996]. In this algorithm, each member of the set of frequent itemsets, $\bar{L}_k$, has the form <TID, itemset>, where TID is the unique identifier of a transaction. Likewise, each member of the set of candidate itemsets, $\bar{C}_k$, has the form <TID, itemset>.

Like AIS, SETM makes multiple passes over the database. In the first pass it counts the support of the individual items and determines which of them meet the threshold. It then generates candidates by extending the frequent itemsets of the previous pass. In addition, SETM remembers the TIDs of the generating transactions together with the candidate itemsets. While generating candidates, SETM saves a copy of each candidate itemset together with the TID of the generating transaction, in a sequential manner. Afterwards the candidates are sorted on itemsets, and the itemsets that fail the threshold are deleted with an aggregation function. If the database is sorted by TID, the frequent itemsets contained in a transaction in the next pass are obtained by sorting $\bar{L}_k$ on TID. In this way several passes are made over the database, and the algorithm stops when no more frequent itemsets are found.

The main disadvantage of SETM is the cost of the large number of candidates in $\bar{C}_k$ [Agrawal1994]. Because every candidate carries a TID, extra space is needed to store a large number of TIDs. Moreover, when the support of the candidates is counted at the end of a pass, $\bar{C}_k$ is not in sorted order, so another sort on itemsets is required. The candidates are then pruned by discarding those that fail the support threshold. A further sort on TID is needed for the resulting set $\bar{L}_k$, which can then be used to generate the candidates of the next pass. SETM assumes that $\bar{C}_k$ fits in main memory, so it does not consider buffer management [Agrawal1994]. Furthermore, according to [Sarawagi1998], SETM is not efficient, and no results have been reported for running it against a relational DBMS.

2.3 Apriori algorithm

The Apriori algorithm, developed in [Agrawal1994], is regarded as the greatest achievement in the history of association rule mining [Cheung1996c]. The details of the algorithm are presented in [IV.1]. The fundamental difference between Apriori and AIS or SETM lies in how candidates are generated and how they are selected for counting. As described above, both AIS and SETM determine the itemsets common to the frequent itemsets of the previous pass and the items of a transaction, and extend these common itemsets with individual items of the transaction to form candidates. Those individual items, however, may be infrequent. Since the union of a frequent itemset and an infrequent itemset is itself infrequent, this technique generates too many candidates that fail the threshold. Apriori generates the candidates by joining the frequent itemsets of the previous pass and deleting those that have a subset found infrequent in the previous pass, without considering the transactions in the database. By considering only the frequent itemsets of the previous pass, the number of candidates is reduced considerably.

2.4 Apriori-TID algorithm

Apriori scans the input database in every pass to count support, but scanning the whole database is not necessarily needed in all passes. Based on this observation, [Agrawal1994] proposed another algorithm called Apriori-TID. Like Apriori, Apriori-TID uses Apriori's candidate generation function to determine the candidate itemsets before each pass. The main difference from Apriori is that it does not use the database to count support after the first pass. Instead, it uses an encoding of the candidate itemsets used in the previous pass, denoted $\bar{C}_k$. As in SETM, each member of $\bar{C}_k$ has the form <TID, $X_k$>, where $X_k$ is a potentially frequent k-itemset present in the transaction with identifier TID. In the first pass, $\bar{C}_1$ corresponds to the database, except that each item is replaced by an itemset. In later passes, the member of $\bar{C}_k$ corresponding to transaction T is <TID, c>, where c is a candidate in $C_k$ contained in T. The size of $\bar{C}_k$ may therefore be smaller than the number of transactions in the database. Moreover, for large k each entry of $\bar{C}_k$ may be smaller than the corresponding transaction, because very few candidates may be contained in the transaction. The details of the algorithm are presented in [IV.3].

The advantage of this encoding is that in later passes its size becomes smaller than the database, which saves much reading effort. Apriori-TID also outperforms AIS and SETM.

In Apriori-TID, the candidate itemsets in $C_k$ are stored in an array indexed by TID. Each $\bar{C}_k$ is stored in a sequential structure. In pass k, Apriori-TID needs memory for $L_{k-1}$ and $C_k$ during candidate generation. In the counting phase, memory is needed for $C_{k-1}$, $C_k$, $\bar{C}_k$ and $\bar{C}_{k-1}$. During candidate generation, half of the buffer is filled with candidates, which allows the relevant parts of $C_k$ and $C_{k-1}$ to be kept in memory during the counting phase. If $L_k$ does not fit in memory, the algorithm recommends sorting $L_k$ on external storage.

Because Apriori-TID uses $\bar{C}_k$ instead of the input database after the first pass, it is very effective in later passes, when $\bar{C}_k$ has become small. However, Apriori-TID has the same problem as SETM in that $\bar{C}_k$ tends to be large, although it generates significantly fewer candidates than SETM. When $\bar{C}_k$ is large, buffer management becomes an additional issue. Apriori-TID has been shown to outperform Apriori when the $\bar{C}_k$ sets are small enough to fit in memory and the distribution of the frequent itemsets has a long tail [Srikant1996b]: the number of candidates becomes small soon after reaching its peak and stays small for a long time. Apriori performs better than Apriori-TID on large data sets [Agrawal1994], whereas Apriori-TID is better than Apriori when the $\bar{C}_k$ sets are relatively small (fit in memory). This motivated the hybrid technique Apriori-Hybrid, introduced in [Agrawal1994].

2.5 Apriori-Hybrid algorithm

The idea of this algorithm is that the same algorithm need not be used in all passes over the data. As noted in [Agrawal1994], Apriori performs better in the early passes, and Apriori-TID outperforms Apriori in the later passes. Based on these experimental observations, the Apriori-Hybrid technique uses Apriori in the initial passes and switches to Apriori-TID once it expects that the set $\bar{C}_k$ at the end of the pass will fit in memory. An estimate of the size of $\bar{C}_k$ at the end of each pass is therefore needed. Switching from Apriori to Apriori-TID also has a certain cost. The performance of the technique was evaluated by experiments on large data sets. Apriori-Hybrid performs better than Apriori except when the switch from one algorithm to the other occurs at the very end of the passes [Srikant1996b].

2.6 Other algorithms

- Off-line Candidate Determination (OCD)

This technique was proposed in [Mannila1994], based on the idea that small samples are usually quite good for finding frequent itemsets. OCD uses the results of a combinatorial analysis of the information obtained in previous passes to eliminate unnecessary candidates. To establish that a set Y ⊆ I is infrequent, at least a fraction (1 − s) of the transactions must be scanned, where s is the support threshold. For small s, therefore, almost the whole relation has to be read. Clearly, for very large databases it is important to make as few passes over the data as possible.

OCD takes a different approach from AIS to determining the candidates. It uses all the information available from previous passes to prune the candidate sets between passes, keeping each pass as simple as possible. OCD produces the set $L_k$ of all frequent itemsets of size k. The candidate set $C_{k+1}$ contains the sets of size (k+1) that can possibly be in $L_{k+1}$, given $L_k$. Note that if $X \in L_{k+e}$ with $e \ge 0$, then X includes $\binom{k+e}{k}$ sets from $L_k$. This means that if e = 1, k = 2 and $X \in L_3$, then X includes $\binom{3}{2} = 3$ sets from $L_2$. Similarly, each member of $L_4$ includes 4 sets of $L_3$, and so on. For example, with $L_2$ = {{Apple, Banana}, {Banana, Cabbage}, {Apple, Cabbage}, {Apple, Eggs}, {Banana, Eggs}, {Apple, Ice cream}, {Cabbage, Syrup}}, we conclude that {Apple, Banana, Cabbage} and {Apple, Banana, Eggs} are the only possible members of $L_3$, because they are the only sets of size 3 all of whose subsets of size 2 are in $L_2$. $L_4$ is empty, because any member of $L_4$ must include 4 sets from $L_3$, and $L_3$ has only 2 members.

Thus $C_{k+1}$ is computed by the following formula:

$C_{k+1} = \{Y \subseteq I : |Y| = k+1 \text{ and } Y \text{ includes } (k+1) \text{ members of } L_k\}$    (1)

A trivial way to find $C_{k+1}$ is to inspect exhaustively every subset of size k+1. This procedure considers a very large number of unnecessary candidates and is therefore wasteful. To avoid this, OCD proposes two alternative approaches. The first computes a collection $C'_{k+1}$ by forming unions of pairs of sets from $L_k$ that have (k−1) items in common:

$C'_{k+1} = \{Y \cup Y' : Y, Y' \in L_k \text{ and } |Y \cup Y'| = k+1\}$    (2)

Then $C_{k+1} \subseteq C'_{k+1}$, and $C_{k+1}$ can be computed by checking, for each set in $C'_{k+1}$, whether the defining condition of $C_{k+1}$ holds.

The second approach forms unions of sets from $L_k$ and $L_1$, as in (3):

$C'_{k+1} = \{Y \cup Y' : Y \in L_k,\ Y' \in L_1 \text{ and } Y' \not\subseteq Y\}$    (3)

$C_{k+1}$ is then computed by checking the condition stated in formula (1).
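The first approach, formula (2) followed by the check of formula (1), can be sketched as follows and verified on the $L_2$ example above. The function name is illustrative, and the sketch assumes that all sets passed in have the same size k.

```python
from itertools import combinations


def candidates_next(L_k):
    """Compute C_{k+1} from L_k: union step (2), then the check of (1)."""
    L_k = {frozenset(s) for s in L_k}
    if not L_k:
        return set()
    k = len(next(iter(L_k)))  # assumes every set in L_k has size k
    # Formula (2): unions of two k-sets that together have k+1 items.
    unions = {a | b for a, b in combinations(L_k, 2) if len(a | b) == k + 1}
    # Formula (1): keep Y only if all k+1 of its k-subsets are in L_k.
    return {
        y for y in unions
        if all(frozenset(s) in L_k for s in combinations(y, k))
    }


L2 = [
    {"Apple", "Banana"}, {"Banana", "Cabbage"}, {"Apple", "Cabbage"},
    {"Apple", "Eggs"}, {"Banana", "Eggs"}, {"Apple", "Ice cream"},
    {"Cabbage", "Syrup"},
]
C3 = candidates_next(L2)
C4 = candidates_next(C3)  # treats both candidates as members of L3
```

No transaction is read at any point: the candidates follow from $L_k$ alone, which is what allows the determination to be done off-line.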

Here the cost of computing $C_{k+1}$ does not depend on the size of the database but only on the size of $L_k$. It is also possible to compute $C_{k+1}, C_{k+2}, \ldots, C_{k+e}$ for e > 1 directly from $L_k$.

The time complexity of computing $C_{k+1}$ from $C'_{k+1}$ is $O(k\,|L_k|^3)$. The time needed to compute $L_{k+1}$ from $C_{k+1}$, on the other hand, is linear in the size n of the database and exponential in the size of the largest frequent itemset, so the algorithm can be very slow for very large n. Frequent itemsets can be approximated by analyzing small samples of a large database [Lee1998; Mannila1994]. The theoretical analysis presented in [Mannila1994] shows that such small samples are quite good for finding frequent itemsets. According to [Mannila1994], even for a low support threshold, a sample database of 3000 rows still gives a very good approximation of the itemsets that meet the threshold.

The performance of the algorithm is evaluated in [Mannila1994] on two data sets. The first contains data on 4734 students; the second is a company's system fault management database with 30,000 records. The experiments show that the running time of OCD is typically 10% to 20% of that of AIS. The advantage of OCD is most visible when the support threshold is set low [Mannila1994]. AIS generates considerably more candidates than OCD. AIS may generate duplicate candidates during a pass, whereas OCD generates each candidate only once and checks that its subsets are frequent before evaluating it against the database.

- Partitioning

This technique [Savasere1995] reduces the number of database scans to 2. It divides the database into smaller partitions such that each partition fits in main memory.
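The two-scan idea can be sketched as follows: the first scan collects the itemsets that are frequent within at least one partition, and the second scan counts that union over the whole database. This is a hedged illustration only; the local step uses brute-force subset enumeration as a stand-in for a level-wise algorithm, and all function names are assumptions.

```python
from itertools import combinations


def local_frequent(partition, minsupp):
    """Itemsets whose count in this partition is >= minsupp * |partition|."""
    counts = {}
    for t in partition:
        for k in range(1, len(t) + 1):
            for c in combinations(sorted(t), k):
                key = frozenset(c)
                counts[key] = counts.get(key, 0) + 1
    threshold = minsupp * len(partition)
    return {x for x, n in counts.items() if n >= threshold}


def partition_mine(partitions, minsupp):
    """Scan 1: union of local frequent itemsets. Scan 2: global counting."""
    candidates = set().union(*(local_frequent(p, minsupp) for p in partitions))
    total = sum(len(p) for p in partitions)
    result = {}
    for c in candidates:
        n = sum(1 for p in partitions for t in p if c <= t)
        if n >= minsupp * total:
            result[c] = n
    return result


# Table 1 of Example 1, split into two partitions of two transactions each.
P1 = [{"Bread", "Butter", "Eggs"}, {"Butter", "Eggs", "Milk"}]
P2 = [{"Butter"}, {"Bread", "Butter"}]
frequent = partition_mine([P1, P2], 0.4)
```

The approach is sound because an itemset that is frequent in the whole database must be locally frequent in at least one partition, so the union of the local results cannot miss it.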

Suppose the database is divided into partitions $D_1, D_2, \ldots, D_p$. In the first scan, the algorithm finds the local frequent itemsets $L_i$ (local large itemsets) in each partition $D_i$ ($1 \le i \le p$), that is, $\{X \mid X.count \ge s \times |D_i|\}$. The local frequent itemsets $L_i$ can be found with a level-wise algorithm such as Apriori. Since each partition fits in main memory, no further disk I/O is needed for a partition once it has been loaded. In the second scan, the algorithm uses the property that an itemset that is frequent in the whole database must be locally frequent in at least one partition. The union of the local frequent itemsets found in the partitions is therefore used as the candidate set, which is counted over the whole database to find all frequent itemsets.

For example, suppose the database is divided into two partitions, the first containing the first two transactions and the second the remaining two. Since minsupp is 40% and each partition has only two transactions, an itemset that occurs just once already meets the threshold. The local frequent itemsets of the two partitions are then exactly all subsets of their transactions, and their union is the candidate set for the second scan.

The Partition algorithm favors a homogeneous data distribution. That is, if the occurrences of an itemset are evenly distributed over the partitions, most of the itemsets counted in the second scan are frequent. For a skewed data distribution, however, most of the itemsets in the second scan may turn out to be infrequent, so a lot of CPU time is wasted on counting them. AS-CPA (Anti-Skew Counting Partition Algorithm) [Lin1998] is a family of anti-skew algorithms that improves Partition for skewed data as follows: in the first scan, the counts of the itemsets found in earlier partitions are accumulated and incremented in later partitions. The accumulated counts are used to prune itemsets that are likely to be infrequent. This early pruning reduces the number of false candidates counted in the following scan.

- Sampling

Sampling [Toivonen1996] reduces the number of database scans to one in the best case and two in the worst case. First, a sample that fits in main memory is drawn from the database. A level-wise algorithm such as Apriori is used to find the set PL of frequent itemsets in the sample. PL is treated as a set of probably frequent itemsets and is used to generate candidates, which are then verified against the whole database.

The candidates are generated by applying the negative border function $BD^-$ to PL, so the candidates are $BD^-(PL) \cup PL$. The negative border of a set of itemsets PL is the minimal set of itemsets that are not in PL but all of whose subsets are in PL. The negative border function is a generalization of the apriori_gen function of Apriori: when all itemsets in PL have the same size, $BD^-(PL)$ = apriori_gen(PL). The difference is that the negative border can be applied to a set of itemsets of different sizes, whereas apriori_gen() applies only to itemsets of a single size.

After the candidates have been generated, the whole database is scanned once to determine their counts. If all frequent itemsets are in PL, that is, no itemset in $BD^-(PL)$ turns out to be frequent, then all frequent itemsets have been found and the algorithm stops. This is guaranteed because $BD^-(PL) \cup PL$ contains all candidate itemsets of Apriori whenever PL contains all frequent itemsets L, that is, $L \subseteq PL$.

- Dynamic Itemset Counting (DIC) [Brin1997a]

DIC generates and counts itemsets earlier, which reduces the number of database scans. It views the database as a sequence of intervals of transactions, which are scanned one after another. While the first interval is scanned, the 1-itemsets are generated and counted. At the end of the first interval, the potentially frequent 2-itemsets are generated. While the second interval is scanned, all generated 1-itemsets and 2-itemsets are counted. At the end of the second interval, the potentially frequent 3-itemsets are generated, and they are counted during the scan of the third interval together with the 1-itemsets and 2-itemsets. In general, at the end of the k-th interval, the potentially frequent (k+1)-itemsets are generated, and they are counted together with the earlier itemsets in the later intervals. On reaching the end of the database, DIC returns to the beginning and continues counting the itemsets that have not yet been fully counted. The actual number of database scans depends on the interval size. If the interval is small enough, all itemsets are generated in the first scan and fully counted in the second. Like the Partition algorithm, DIC favors a homogeneous data distribution.

- CARMA (Continuous Association Rule Mining Algorithm)

With the CARMA algorithm [Hidb1999], the frequent itemsets are computed online. CARMA shows the current association rules to the user and allows the user to change the parameters minsupp and minconf at any transaction during the first scan of the database. The algorithm needs at most two database scans. Like DIC, CARMA generates the itemsets in the first scan and finishes counting all of them in the second scan. Unlike DIC, CARMA generates the itemsets on the fly from the transactions. After reading a transaction, the algorithm first increments the counts of the itemsets that are subsets of the transaction. It then generates new itemsets from the transaction if all immediate subsets of those itemsets are currently potentially frequent with respect to the part of the database read so far. To predict more accurately whether an itemset is potentially frequent, the algorithm computes an upper bound on the count of the itemset. The upper bound is the sum of the current count and an estimate of the number of occurrences before the itemset was generated. This estimate (called the maximum misses) is computed when the itemset is first generated.

3. Parallel and distributed algorithms

The algorithms in this class are all based on the sequential Apriori algorithm. The two paradigms, data parallelism and task parallelism, differ in whether the candidate set is distributed across the processors. In the data parallelism paradigm, every node counts the same set of candidates; in the task parallelism paradigm, the candidate set is partitioned and distributed across the processors, and each node counts a different set of candidates. In theory, the database may or may not be partitioned in either paradigm, but for more efficient I/O it is usually assumed in practice that the database is partitioned and distributed across the processors.

In the data parallelism paradigm, the candidates are replicated on every processor and the database is distributed across the processors. Each processor is responsible for computing the local support counts of all candidates, that is, their support counts in its own partition. The processors then compute the global support counts of the candidates (the total support counts over the whole database) by exchanging their local support counts. After that, each processor computes the frequent itemsets independently. The paradigm is illustrated in [Figure 2] with the data of Table 1: the four transactions are distributed over three processors, with processor 3 holding the two transactions T3 and T4, processor 1 holding T1 and processor 2 holding T2. The three candidate itemsets of the second pass are replicated on every processor. The local support counts are obtained after scanning the local databases.
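The counting step of the data parallelism paradigm can be simulated sequentially as follows. This is a sketch only: the "processors" are plain lists, the exchange of counts is a simple column sum, and the three candidates are assumed to be the pairs formed from the frequent items Bread, Butter and Eggs of Table 2, since the figure itself is not reproduced here.

```python
def local_counts(partition, candidates):
    """Support counts of every candidate within one processor's partition."""
    return [sum(1 for t in partition if c <= t) for c in candidates]


def global_counts(partitions, candidates):
    """Sum the local counts of all processors (stands in for the exchange)."""
    per_processor = [local_counts(p, candidates) for p in partitions]
    return [sum(column) for column in zip(*per_processor)]


# Table 1 distributed as described: processor 1 holds T1, 2 holds T2, 3 holds T3 and T4.
PARTITIONS = [
    [{"Bread", "Butter", "Eggs"}],
    [{"Butter", "Eggs", "Milk"}],
    [{"Butter"}, {"Bread", "Butter"}],
]
# Assumed candidate 2-itemsets of the second pass, replicated on every processor.
CANDIDATES = [
    frozenset({"Bread", "Butter"}),
    frozenset({"Bread", "Eggs"}),
    frozenset({"Butter", "Eggs"}),
]
totals = global_counts(PARTITIONS, CANDIDATES)
```

Only the small vectors of counts cross processor boundaries, never the transactions, which is the main attraction of this paradigm.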

In the Task Parallelism paradigm, the candidate set is partitioned and distributed across the processors, as is the database. Each processor is responsible for the global support counts of only one subset of the candidates. This approach requires two rounds of communication in each iteration: in the first round, every processor sends its database partition to all the other processors; in the second round, every processor broadcasts the frequent itemsets it has found to all the other processors so that the candidates for the next iteration can be computed. With the data of Table 1, the paradigm works as follows: the four transactions are partitioned as in the Data Parallelism paradigm, and the three candidates are distributed across the processors, one candidate per processor. After a processor has scanned its local database and the database partitions sent by the other processors, the global count of each candidate is known.

3.1 Data-parallel algorithms (Data Parallelism)

The algorithms considered are CD [Agrawal1996], PDM [Park1995a], DMA [Cheung1996] and CCPD [Zaki1996].

- CD [Agrawal1996]

In CD, the database D is partitioned into {D_1, D_2, ..., D_p} and distributed over p processors. The algorithm has three basic steps:

  - Step 1: each processor finds the local support counts of the candidates C_k in its local database partition D_i.
  - Step 2: the processors exchange the local support counts of all candidates to obtain the global support counts of all candidates.
  - Step 3: each processor independently identifies the frequent itemsets L_k and generates the candidates of size (k+1) by applying the Apriori_gen() procedure to L_k.

CD repeats steps 1 to 3 until no more candidates are found.

- PDM [Park1995a]

PDM (Parallel Data Mining) is a modification of CD that uses the direct hashing technique proposed in [Park1995a]. The technique prunes some of the candidates of the next pass. It is especially useful for the second pass, because Apriori does no pruning at all when it generates C_2 from L_1.
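As an illustration of how a hash table filled during the first pass can prune C_2, here is a minimal single-machine sketch of the direct hashing idea. The table size `NUM_BUCKETS`, the toy hash function `bucket` and the threshold `MIN_COUNT` are assumptions made for the example. In PDM the bucket counts would additionally be local counts that the processors exchange, which is not modelled here.

```python
from itertools import combinations

NUM_BUCKETS = 7   # assumed table size, for illustration only
MIN_COUNT = 2     # assumed absolute support threshold


def bucket(pair):
    """Toy hash function: deterministic, unlike Python's salted str hash."""
    return sum(ord(ch) for item in pair for ch in item) % NUM_BUCKETS


def first_pass(transactions):
    """Count single items and, in the same scan, hash every 2-subset."""
    item_counts = {}
    bucket_counts = [0] * NUM_BUCKETS
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(sorted(t), 2):
            bucket_counts[bucket(pair)] += 1
    return item_counts, bucket_counts


def candidate_pairs(item_counts, bucket_counts):
    """Keep a pair only if both items are frequent and its bucket is heavy enough."""
    l1 = sorted(i for i, n in item_counts.items() if n >= MIN_COUNT)
    return [p for p in combinations(l1, 2)
            if bucket_counts[bucket(p)] >= MIN_COUNT]


transactions = [
    {"bread", "butter", "eggs"},
    {"butter", "eggs", "milk"},
    {"butter"},
    {"bread", "butter"},
]
item_counts, bucket_counts = first_pass(transactions)
pairs = candidate_pairs(item_counts, bucket_counts)
```

A bucket count is an upper bound on the count of every pair that hashes into it, so a truly frequent pair is never discarded; only pairs whose whole bucket falls below the threshold are pruned before the second pass.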

In the first pass, besides counting all 1-itemsets, PDM maintains a hash table that stores the counts of the 2-itemsets. In pass k, PDM has to exchange the local counts in the hash table for the (k+1)-itemsets in addition to the local counts of the candidate k-itemsets.

- DMA [Cheung1996]

DMA (Distributed Mining Algorithm) [Cheung1996] is also based on the Data Parallelism paradigm, with the addition of a candidate pruning technique and a communication message reduction technique. It uses the local counts of the frequent itemsets on each processor to decide whether an itemset is heavy (frequent both in one partition and in the whole database), and then generates the candidates from the heavy itemsets.

For example, suppose A and B are heavy on processors 1 and 2 respectively; that is, A is globally frequent but locally frequent only on processor 1, and B is globally frequent but locally frequent only on processor 2. DMA does not generate AB as a candidate 2-itemset, whereas Apriori does, because it does not take the local counts on each processor into account. For communication, instead of broadcasting the local counts of every candidate as CD does, DMA sends the local counts to a single site only, which reduces the communication from O(p^2) to O(p).

- CCPD (Common Candidate Partitioned Database)

CCPD [Zaki1996] implements CD on a shared-memory SGI Power Challenge with a few improvements. It proposes techniques for generating and counting the candidates efficiently in a shared-memory environment. It groups the candidates into equivalence classes based on their common prefixes (usually the first item) and generates the candidates from each class. The grouping does not reduce the number of candidates, but it reduces the time needed to generate them. The algorithm also introduces a short-circuited subset checking method to count the candidates for each transaction more efficiently.

3.2 Task-parallel algorithms (Task Parallelism)

The algorithms in this class differ from the Data Parallelism ones in how the candidates and the database are partitioned. Representatives of this class are DD [Agrawal1996], IDD [Han1997], HPA [Shintani1996] and PAR [Zaki1997].

- DD (Data Distribution)

In DD [Agrawal1996], the candidates are partitioned and distributed over all the processors in round-robin fashion. There are three steps. In step 1, each processor scans its local data partition to obtain the local counts of all the candidates assigned to it. In step 2, every processor broadcasts its own data partition and receives the data partitions of the other processors; it then scans the received partitions to obtain the global support counts over the whole database. In the last step, each processor computes the frequent itemsets in its candidate partition and exchanges them with all the other processors so that every processor has all the frequent itemsets. The candidates are then generated, partitioned and distributed over all the processors. These steps are repeated until no more candidates are generated. Note that the communication overhead of broadcasting the data partitions can be reduced by asynchronous communication [Agrawal1996].

- IDD (Intelligent Data Distribution)

IDD is an improvement over DD [Han1997]: it partitions the candidates across the processors based on their first item, so that candidates with the same first item fall into the same partition. Each processor therefore only needs to check the subsets of a transaction that begin with one of the items assigned to it. This removes the redundant computation of DD, in which every processor has to check all the subsets of every transaction. To balance the distribution of the candidates, the algorithm uses a bin-packing technique: IDD first computes, for each item, the number of candidates that start with that item, and then uses a bin-packing algorithm to assign the items to the candidate partitions so that the partitions hold roughly equal numbers of candidates. The algorithm also uses a ring architecture to reduce the communication overhead; that is, it uses asynchronous point-to-point communication between neighbors in the ring instead of broadcasting.
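The partitioning by first item can be sketched as follows. The text only says that a bin-packing algorithm is used; the greedy heuristic below (largest group first, always into the least-loaded partition) is an assumption chosen for brevity, and the function name `partition_by_first_item` is hypothetical.

```python
from collections import defaultdict


def partition_by_first_item(candidates, num_processors):
    """Assign all candidates that share a first item to one processor.

    Greedy bin packing (an assumed heuristic): handle the largest groups
    first and give each group to the currently least-loaded processor.
    """
    groups = defaultdict(list)
    for cand in candidates:            # candidates are sorted tuples
        groups[cand[0]].append(cand)

    bins = [[] for _ in range(num_processors)]
    for first in sorted(groups, key=lambda f: (-len(groups[f]), f)):
        target = min(range(num_processors), key=lambda i: len(bins[i]))
        bins[target].extend(groups[first])
    return bins


cands = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d")]
bins = partition_by_first_item(cands, 2)
```

With this input, the three candidates starting with "a" land on one processor and the groups starting with "b" and "c" share the other, so both partitions hold three candidates while no first item is split across processors.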

- HPA (Hash-based Parallel mining of Association rules)

HPA uses a hashing technique to distribute the candidates over the processors [Shintani1996]; that is, every processor uses the same hash function to determine which candidates are assigned to it. During counting, the algorithm moves the itemsets that are subsets of the transactions (instead of moving the data partitions) to their destination processors using the same hash function. An itemset taken from a transaction therefore goes to just one processor instead of to all n processors. HPA was further improved with a skew handling technique [Shintani1996], which duplicates some candidates when there is enough memory on each processor, so that the workload of the processors is better balanced.

- PAR (Parallel Association Rules)

PAR [Zaki1997] is a set of algorithms that use different candidate partitioning and counting schemes. They all assume a vertical database partition (a tid list, that is, a list of transaction identifiers, for each item), in contrast to the natural horizontal partition (a list of transactions). With the database organized vertically, counting an itemset reduces to intersecting the tid lists of the items in the itemset. However, the database has to be transformed into the vertical layout if it is originally horizontal. The database may be selectively replicated to reduce synchronization. Two of these algorithms (Par-Eclat and Par-MaxEclat) use equivalence classes based on the first item of the candidates, while the other two (Par-Clique and Par-MaxClique, which also belong to the PAR class) use maximal hypergraph cliques to partition the candidates.

3.3 Other algorithms

Some other parallel algorithms cannot be classified into the two paradigms above: they share the same ideas but have their own distinctive features. We briefly review the following.

- Candidate Distribution

The Candidate Distribution algorithm [Agrawal1996] tries to reduce the synchronization and communication overhead of CD and DD. In some pass l, it divides the frequent itemsets L_{l-1} among the processors in such a way that each processor can generate a unique set of candidates independently of the other processors. At the same time, the database is repartitioned so that each processor can count the candidates it generates independently. Depending on the nature of the candidate partitioning, parts of the database may have to be replicated on several processors. The itemsets are partitioned by grouping them on their prefixes.
After the candidates have been partitioned, each processor independently counts only the candidates in its own partition, using only its local database partition. No communication of counts or data is required. Because either the Count Distribution (CD) or the Data Distribution (DD) algorithm can be used in the passes before the candidates are partitioned, Candidate Distribution is a kind of hybrid of the two paradigms.

- SH

In SH [Harada1998], the candidates are not generated from the previous frequent itemsets, which at first sight looks different from the sequential Apriori algorithm. Instead, each processor generates the candidates independently while scanning its database partition: in iteration k, each processor generates and counts the k-itemsets from the transactions in its data partition. A k-itemset is generated only if all of its (k-1)-subsets are globally frequent. At the end of each iteration, all processors exchange their k-itemsets and local counts to obtain the global counts of all k-itemsets. The frequent k-itemsets are then determined, and a bitmap of the frequent itemsets is built on every processor. When the counting workload is unbalanced, transactions are migrated from the overloaded processors to less busy ones. When there is not enough memory, the current k-itemsets are written out (swapped) to disk, and new candidate k-itemsets are generated and counted for the rest of the database partition. At the end of each iteration, the local counts of all k-itemsets are combined and exchanged with the other processors to obtain the global counts.

SH appears to be based on a different algorithm from Apriori, but it is in fact very similar to it. First, it is iterative like Apriori: candidates of a new, larger size are generated only once the previous iteration has ended. The difference is that SH generates the candidates during the iteration, from the transactions, whereas Apriori generates them at the end of an iteration. Second, the candidates generated by SH are exactly those of Apriori if the database is evenly distributed; only when the database is extremely skewed do the generated candidates differ. For example, if A and B never occur together in data partition i (although A and B may each still be frequent), that is, the count of AB there is 0, SH does not generate AB as a candidate in the second pass on processor i. But if A and B occur together even once, SH generates AB as a candidate.
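The AB example can be reproduced with a short sketch of candidate generation driven by the transactions themselves. The function name `sh_local_candidates` and the tiny partitions are hypothetical; the bitmap lookup of SH is replaced here by an ordinary set of frozensets, and the exchange of counts between processors is left out.

```python
from itertools import combinations


def sh_local_candidates(local_transactions, frequent_prev, k):
    """Generate and count k-itemsets directly from the local transactions.

    A k-subset of a transaction is counted only if every one of its
    (k-1)-subsets is globally frequent, i.e. is in frequent_prev.
    """
    counts = {}
    for t in local_transactions:
        for subset in combinations(sorted(t), k):
            if all(frozenset(s) in frequent_prev
                   for s in combinations(subset, k - 1)):
                key = frozenset(subset)
                counts[key] = counts.get(key, 0) + 1
    return counts


# Globally frequent 1-itemsets (assumed to be known after the first pass).
frequent_1 = {frozenset({"A"}), frozenset({"B"}), frozenset({"C"})}

# Partition i: A and B never occur together, so AB is never generated there.
counts_i = sh_local_candidates([{"A", "C"}, {"B", "C"}], frequent_1, 2)

# Partition j: A and B occur together once, so AB becomes a candidate.
counts_j = sh_local_candidates([{"A", "B"}], frequent_1, 2)
```

Apriori would generate AB on every processor from L_1 alone, which is why the two methods differ only when the data is skewed across partitions.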
SH can therefore be classified under the Data Parallelism paradigm, with the addition of skew handling and handling of insufficient main memory.

- HD (Hybrid Distribution)

HD, proposed in [Han1997], combines both paradigms. The algorithm assumes that the p processors are arranged in a two-dimensional grid of r rows and p/r columns. The database is partitioned equally among the p processors. The candidate set C_k is partitioned across the columns of the grid (that is, into p/r partitions, one per column), and the candidate partition of each column is replicated on all the processors along each row for that column. Any data distribution algorithm can then be applied independently on each column of the grid, and the global counts of each subset of C_k are obtained by a reduction operation along each row of the grid, as in the Data Parallelism paradigm.

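This grid architecture can be seen as a generalization of both paradigms: if the grid has a single column, it reduces to the Task Parallelism paradigm; if it has a single row, it reduces to the Data Parallelism paradigm. With HD, the communication overhead of moving the database is reduced, because the database partitions only have to be moved along the columns instead of across the whole grid.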

HD can also switch automatically to CD in later passes to reduce the communication overhead even further.

CONCLUSION:

Both the Data and the Task paradigm have their own strengths and weaknesses. The Data Parallelism paradigm has simpler communication and therefore less communication overhead: it only needs to exchange the local counts of all candidates in each iteration. The basic CD algorithm can be improved further by hashing (PDM), candidate pruning (DMA) and short-circuited counting (CCPD). However, the Data Parallelism paradigm requires that all the candidates fit into the main memory of each processor. If in some iteration there are too many candidates to fit into memory, all the algorithms based on this paradigm (except SH) either stop working or suffer degraded performance. SH tries to handle this situation by swapping candidates out to disk. A further problem that can arise with SH is that too many candidates have to be written to disk, so that summing up the local counts in each iteration requires many I/O operations. In addition, SH may incur computation overhead because it generates candidates on the fly: for every k-itemset in every transaction it has to check whether all of its (k-1)-subsets are frequent by looking them up in the bitmap of the frequent (k-1)-itemsets, whereas Apriori only checks the itemsets obtained by joining two itemsets of L_{k-1}.

The Task Parallelism paradigm was originally proposed to make better use of the main memory of a parallel computer. It partitions and distributes the candidates among the processors in each iteration, so it uses the combined main memory of all the processors and may avoid running out of memory as the number of processors increases. This class of algorithms is therefore used to mine association rules with very low minsupp thresholds. However, the Task Parallelism paradigm requires the data partitions to be moved between the processors.

The databases mined for rules are usually very large, so moving the data causes a very large communication overhead. This class of algorithms therefore runs into problems with very large databases. In the basic algorithm of this paradigm, the complexity of the data movement is O(p^2), where p is the number of processors. IDD uses a ring architecture in which the communications between neighboring processors take place simultaneously, so the complexity drops to O(p). HPA uses a hashing technique to direct the data movement, so it moves the transactions (more precisely, the subsets of the transactions) only to the appropriate destination processor. Just as the candidates are distributed by a hash function, the subsets of the transactions are routed by the same hash function, so the complexity is only O(p).

The performance studies in [Agrawal1996], [Park1995a], [Cheung1996], [Cheung1998], [Zaki1996], [Han1997] and [Shintani1996] on both paradigms show that they scale linearly with the database size and the number of processors. The Task Parallelism paradigm does not scale as well as the Data Parallelism paradigm, but it can cope with low minsupp thresholds, which the Data Parallelism paradigm either cannot handle at all or handles only very slowly.

A promising direction for the future is to combine the two paradigms. Hybrid Distribution (HD) [Han1997] is more scalable than the Task Parallelism paradigm and alleviates the shortage of main memory. All the parallel algorithms for mining association rules are based on the sequential Apriori algorithm. As Apriori has since been improved by many other algorithms, especially in reducing the number of database scans, improved parallel algorithms are expected to bring even better results.

4. Comparison of the algorithms

Comparison criteria:

- Space: measured by the maximum number of candidates counted during any scan of the database.

- Time: measured by the maximum number of database scans (an estimate of the I/O cost) and the maximum number of comparison operations (an estimate of the CPU cost).

Since most transaction databases are stored on secondary storage (disk), and since I/O overhead matters more than CPU overhead, the focus is on the number of scans of the input database. The worst case occurs when every transaction in the database contains all the items. Let m be the number of items in each transaction and L_k the set of frequent k-itemsets of the database D; then the maximum number of frequent itemsets is 2^m.

In the level-wise techniques (such as AIS, SETM and Apriori), all the frequent itemsets in L_1 are obtained during the first scan of the database. Likewise, all the frequent itemsets in L_2 are identified in the second scan, and so on. Each of these algorithms stops when no more frequent itemsets are generated, so the input database is scanned at most (m+1) times.

Apriori-TID scans the database in the first pass and then uses C̄_k instead of the database in pass (k+1). This does not help in the worst case, however, because C̄_k then contains every transaction together with all of its items throughout the processing of the input. OCD scans the entire input database only once, at the start of the algorithm, to determine the frequent itemsets in L_1. After that, OCD and Sampling use only part of the input database, together with the information obtained in the first pass, to find the candidate itemsets for L_k with 1 ≤ k ≤ m. In the second scan, these algorithms compute the support of every candidate itemset, so there are only two scans in the worst case. The PARTITION technique reduces the I/O overhead by reducing the number of database scans to 2. CARMA is similar.

The quality of an algorithm depends on how accurately the candidates it generates match the itemsets that are actually frequent. As mentioned above, every algorithm uses the frequent itemsets of one (or more) previous passes to generate the candidate sets. The candidate sets are kept in main memory while the candidate itemsets are generated, and again while their supports are counted. When there is not enough memory, the algorithms propose different buffer management schemes and storage structures. AIS suggests that L_{k-1} can be kept on disk if necessary. In SETM, if C̄_k is too large to fit into main memory, it is written to disk under a FIFO policy. The Apriori family recommends keeping L_{k-1} on disk and bringing it into memory one block at a time to find C_k. However, C̄_k has to be in main memory to determine the supports in both Apriori-TID and Apriori-Hybrid. All the other techniques, by contrast, simply assume that there is enough memory to handle these issues.
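The worst-case figures for the level-wise techniques can be checked on a toy database in which every transaction contains all m items. The sketch below is a deliberately naive Apriori: `apriori_gen` enumerates all k-subsets and keeps those whose (k-1)-subsets are all frequent, which yields the same candidates as the join and prune steps but is not how an efficient implementation would do it. Two points to keep in mind when reading the result: the bound 2^m counts the empty set as well, and because this version detects an empty candidate set without touching the database, it uses m scans here, which stays within the (m+1) bound.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Brute-force equivalent of join + prune: keep every k-itemset whose
    (k-1)-subsets are all in prev_frequent."""
    items = sorted({i for s in prev_frequent for i in s})
    return [frozenset(c) for c in combinations(items, k)
            if all(frozenset(s) in prev_frequent
                   for s in combinations(c, k - 1))]


def apriori(transactions, min_count):
    """Return (all frequent itemsets, number of database scans)."""
    scans = 0
    frequent = set()
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        scans += 1                       # one full pass over the database
        level = {c for c in candidates
                 if sum(1 for t in transactions if c <= t) >= min_count}
        frequent |= level
        k += 1
        candidates = apriori_gen(level, k)
    return frequent, scans


m = 4
worst_case_db = [set("abcd")] * 3        # every transaction holds all m items
frequent, scans = apriori(worst_case_db, min_count=3)
```

Here every non-empty subset of the four items is frequent, and one scan is spent per itemset size.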
All the other sequential techniques (such as PARTITION, Sampling, DIC and CARMA) process a suitably sized portion of the data that fits into main memory. The Apriori family proposes hash-tree or array data structures for the frequent itemsets, whereas AIS and SETM do not propose any specific structure. The data structures suggested for each algorithm are listed in Table 4.

Commercially, Apriori is the technique most trusted for mining association rules. Some algorithms are better suited to particular conditions. AIS does not perform well when the number of items in the database is large, so it is suited to transaction databases with low cardinality. Apriori takes less time than Apriori-TID in the early passes, but Apriori-TID outperforms Apriori in the later passes. Apriori-Hybrid is therefore a good choice, because it can switch from Apriori to Apriori-TID. However, one has to determine when to switch from Apriori to Apriori-TID and at what cost. Although OCD is an approximate technique, it is very effective at finding frequent itemsets with low support thresholds. CARMA is an online, user-interactive technique with feedback, best suited to situations where the transaction sequence is read from a network.

Table 4: Comparison of the algorithms presented

Algorithm | Scans | Data structures | Comments
AIS | m+1 | Not specified | Suitable for transaction databases with low cardinality; the consequent contains a single item.
SETM | m+1 | Not specified | SQL compatible.
Apriori | m+1 | L_{k-1}: hash table; C_k: hash tree | Transaction databases with moderate cardinality; performs better than AIS and SETM; base algorithm for the parallel algorithms.
Apriori-TID | m+1 | L_{k-1}: hash table; C_k: array indexed by TID; C̄_k: sequential structure; ID: bitmap | Very slow when C̄_k is large; performs better than Apriori when C̄_k is small.
Apriori-Hybrid | m+1 | L_{k-1}: hash table; phase 1: C_k: hash tree; phase 2: C_k: array indexed by TID; C̄_k: sequential structure; ID: bitmap | Better than Apriori; however, switching from Apriori to Apriori-TID is expensive.
OCD | 2 | Not specified | Suitable for large databases with a low support threshold.
Partition | 2 | Hash table | Suitable for large databases with high cardinality; favors a homogeneous data distribution.
Sampling | 2 | Not specified | Applicable to very large databases with low support.
DIC | Depends on the interval size | Prefix tree | The database is viewed as intervals of transactions; candidates of increased size are generated at the end of an interval.
CARMA | 2 | Hash table | Applicable when the transaction sequence is read from a network; online: the user gets continuous feedback and can change support and/or confidence at any time during processing.
CD | m+1 | Hash table and tree | Data Parallelism.
PDM | m+1 | Hash table and tree | Data Parallelism with early candidate pruning.
DMA | m+1 | Hash table and tree | Data Parallelism with candidate pruning.
CCPD | m+1 | Hash table and tree | Data Parallelism; on shared memory.
DD | m+1 | Hash table and tree | Task Parallelism; round-robin partitioning.
IDD | m+1 | Hash table and tree | Task Parallelism; partitioning by first item.
HPA | m+1 | Hash table and tree | Task Parallelism; partitioning by hash function.
SH | m+1 | Hash table and tree | Data Parallelism; candidates generated independently by each processor.
HD | m+1 | Hash table and tree | Hybrid of Data and Task Parallelism; parallel grid architecture.

Table 4 summarizes the algorithms presented in this thesis and gives a brief comparison of them. Apart from Apriori, which is used in the demo program and is therefore presented in detail, most of the other algorithms are only outlined.

SUMMARY

Mining association rules is a hard class of problems with high complexity. Assessing the effectiveness of each algorithm requires time and concrete experiments.

In this concluding part, we do not yet summarize the entire content of the thesis or give detailed directions for developing the algorithms; we only list the parts that have been presented.

[1] Overview of Data Mining

This part briefly presents the most basic theory of data mining in order to set the direction for the part that follows.

[2] Algorithms for mining association rules

We study the algorithms for mining association rules and their strengths and limitations. Most of the algorithms presented belong to the class of breadth-first search (BFS) algorithms, such as the algorithms based on the Apriori principle, and they are compared on a number of fixed criteria.

The algorithms are presented briefly, mainly by explaining how they work.

In the future, the thesis could be extended by implementing all the algorithms in order to compare the performance of each group of algorithms more accurately.

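B - EXPERT SYSTEM EXERCISE

BUILDING AN EXPERT SYSTEM TO SUPPORT THE DIAGNOSIS OF SOME COMMON CHILDHOOD DISEASES

I. Building the knowledge base for the problem

The knowledge base used in this assignment comes mainly from analyzing two documents: Những con đường chẩn đoán trong y học lâm sàng (Diagnostic Pathways in Clinical Medicine) and Hướng dẫn xử trí lồng ghép các bệnh thường gặp ở trẻ em (Guidelines for the Integrated Management of Common Childhood Illnesses).

Based on the diagnostic flowcharts for the diseases, we build rules of the form:

Rule i: If <symptom 1, symptom 2, ... hold>

( i=) Then

Below is the group of rules for diseases related to fever in children (corresponding to the flowchart for diseases related to fever in children, Figure 4).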

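Rule 1: If measles rash and difficulty breathing
Then the child has measles with pneumonia

Rule 2: If measles rash and rales
Then the child has measles with pneumonia

Rule 3: If measles rash
Then the child has measles

Rule 4: If difficulty breathing, rales, no diarrhea
Then the child has pneumonia

Rule 5: If bulging fontanelle
Then the child has meningitis

Rule 6: If stiff neck in an older child
Then the child has meningitis

Rule 7: If abscess, in the neck, hot and painful
Then the child has a bacterial infection

Rule 8: If abscess, in the neck
Then the child is suspected of having tuberculosis

Rule 9: If abscess
Then the child has a bacterial infection

Rule 10: If pus on the tonsils
Then the child has acute tonsillitis

Rule 11: If ear discharge, pain
Then the child has otitis media

Rule 12: If painful urination, loin pain
Then the child has acute pyelonephritis

Rule 13: If facial edema
Then the child has acute nephritis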
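To show how rules of this form can be evaluated, here is a small sketch in Python. It is not the authors' program, which is written in Prolog; it only mirrors a subset of the rules above. Two things are assumptions of the sketch: the rules are tried in the order listed and the first match wins (which is why the more specific Rules 1 and 2 come before Rule 3), and a negative finding such as "no diarrhea" is modelled as an ordinary finding string. The names `RULES` and `diagnose` are hypothetical.

```python
# Each rule: (rule number, required findings, conclusion).
# Assumption: rules are tried in listed order and the first match wins.
RULES = [
    (1, {"measles rash", "difficulty breathing"}, "measles with pneumonia"),
    (2, {"measles rash", "rales"}, "measles with pneumonia"),
    (3, {"measles rash"}, "measles"),
    (4, {"difficulty breathing", "rales", "no diarrhea"}, "pneumonia"),
    (5, {"bulging fontanelle"}, "meningitis"),
    (6, {"stiff neck in an older child"}, "meningitis"),
]


def diagnose(findings):
    """Return (rule number, conclusion) for the first rule whose
    premises are all among the findings, or None if no rule applies."""
    for number, premises, conclusion in RULES:
        if premises <= findings:        # every premise is present
            return number, conclusion
    return None


result_a = diagnose({"measles rash", "rales"})
result_b = diagnose({"measles rash"})
result_c = diagnose({"difficulty breathing", "rales", "no diarrhea"})
result_d = diagnose({"cough"})
```

Keeping the rules as plain data makes the ordering explicit; reordering Rule 3 before Rules 1 and 2 would hide the pneumonia conclusions, which is the same sensitivity to clause order that a Prolog rule base has.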

II. Implementing the DEMO program

Given the initial goal, this project is limited to a few common childhood diseases, with the rule set divided into subclasses: the class of rules that identify diseases related to fever in children, to cough in children, and so on.

The program is written in Prolog using SWI-Prolog. Prolog is a logic programming language with a built-in inference mechanism that is often used in Artificial Intelligence (including expert systems).

Running the program requires Windows 98 or later with SWI-Prolog installed. To start the program, open the file menufile.pl. The main interface of the program then appears.

Based on medical knowledge, the program is divided into two main sections: one for mothers and one for primary-care doctors.

Each section has the following three main modules:

- Disease group selection module: the user chooses one of the disease groups on the interface form (Figure 4); clicking the selection button opens a window containing questions about the symptoms related to that group.

Disease group selection interface

- Question-and-answer module: after the patient has chosen a disease group, the program asks questions about the symptoms related to that group (Figure 5), following a question-and-answer mechanism.

Question-and-answer interface

- Conclusion module: after the user has answered (yes or no) for the symptoms presented, the program processes the answers (using the inference mechanism) and shows the corresponding result on the conclusion form (Figure 6).

Conclusion interface
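The interplay of the question-and-answer module and the conclusion module can be sketched as a loop that asks about a symptom only when a rule needs it and remembers each answer. This Python sketch is an illustration under assumptions, not a description of the authors' Prolog program: the names `make_asker`, `consult` and `FEVER_RULES` are hypothetical, only four fever rules are included, and the answers come from a function so that the example can run without a user interface.

```python
def make_asker(answer_fn):
    """Wrap answer_fn so that each symptom is asked at most once."""
    cache = {}

    def ask(symptom):
        if symptom not in cache:
            cache[symptom] = bool(answer_fn(symptom))
        return cache[symptom]

    ask.asked = cache                   # expose what has been asked so far
    return ask


# A few fever rules, most specific first (assumed evaluation order).
FEVER_RULES = [
    (["measles rash", "difficulty breathing"], "measles with pneumonia"),
    (["measles rash", "rales"], "measles with pneumonia"),
    (["measles rash"], "measles"),
    (["bulging fontanelle"], "meningitis"),
]


def consult(rules, answer_fn):
    """Return (conclusion or None, symptoms asked in order)."""
    ask = make_asker(answer_fn)
    for symptoms, conclusion in rules:
        # all() stops at the first "no", so later symptoms are not asked.
        if all(ask(s) for s in symptoms):
            return conclusion, list(ask.asked)
    return None, list(ask.asked)


answers = {"measles rash": True, "difficulty breathing": False, "rales": True}
conclusion, asked = consult(FEVER_RULES, lambda s: answers.get(s, False))
```

In an interactive setting, `answer_fn` could instead read a yes/no reply from the keyboard; the caching ensures that the user is never asked about the same symptom twice.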

IV. DEMO RESULTS

In this program we combined knowledge from pediatric diagnosis documents to build the program and to assess its reliability. However, the program still has some shortcomings:

- The interface is not yet user-friendly; illustrative images and sound need to be added.
- There is no support yet for adding to or changing the knowledge.
- The program file has not been compiled into an executable, so it cannot yet be used independently of SWI-Prolog.

REFERENCES

[1] Lecture notes for the course Cơ sở tri thức và ứng dụng (Knowledge Bases and Applications). Prof. Dr.Sc. Hoang Kiem, Vietnam National University, Ho Chi Minh City.

[2] Introduction to Knowledge Discovery and Data Mining. Dr. Ho Tu Bao, Institute of Information Technology, Vietnam.

[3] Association Rule Mining: A Survey. Qiankun Zhao and Sourav S. Bhowmick, Nanyang Technological University, Singapore.

[4] A Survey of Association Rules. Margaret H. Dunham and Yongqiao Xiao, Department of Computer Science and Engineering, Southern Methodist University; Le Gruenwald and Zahid Hossain, Department of Computer Science, University of Oklahoma.

[5] Fast Algorithms for Mining Association Rules. Rakesh Agrawal and Ramakrishnan Srikant, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120.

[6] Survey on Frequent Pattern Mining. Bart Goethals, HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Finland.

[7] Algorithms for Association Rules. Markus Hegland, Australian National University, Canberra ACT 0200, Australia.

[8] Data Mining of Association Rules and the Process of Knowledge Discovery in Databases. Jochen Hipp, Ulrich Güntzer and Gholamreza Nakhaeizadeh.

[9] Hệ chuyên gia (Expert Systems). Do Trung Tuan, Giao Duc (Education) Publishing House, 1999.

[10] Hướng dẫn xử trí lồng ghép các bệnh thường gặp ở trẻ em (Guidelines for the Integrated Management of Common Childhood Illnesses).

[11] Lập trình Turbo Prolog 2.0 (Programming in Turbo Prolog 2.0). Phan Truong Dan, Science and Technology Publishing House, 1998.

[12] Lập trình logic trong Prolog (Logic Programming in Prolog). Vietnam National University, Hanoi Publishing House, 2004.

[13] Những con đường chẩn đoán y học (Diagnostic Pathways in Medicine). B.J. Essex, Medical Training Publishing House, Hanoi, 1992.

[14] Web site http://www.swi-prolog.org

Figure 1: Finding the frequent itemsets with the PARTITION algorithm. Transactions: T1 = bread, butter, eggs; T2 = butter, eggs, milk; T3 = butter; T4 = bread, butter. Scanning D1 and D2 gives the local large itemsets L1 = {{bread}, {butter}, {eggs}, {bread, butter}, {bread, eggs}, {butter, eggs}, {bread, butter, eggs}, {milk}, {butter, milk}, {eggs, milk}, {butter, eggs, milk}} and L2 = {{butter}, {bread}, {bread, butter}}. Their union C is then counted in one scan of D, which gives L = {{bread}, {butter}, {eggs}, {bread, butter}, {butter, eggs}}.

Figure 2: The Data Parallelism paradigm. Every processor counts the same candidates C2 on its own partition (processor 1: D1 = T1; processor 2: D2 = T2; processor 3: D3 = T3, T4), followed by a global reduction. Local counts of {bread, butter}, {bread, eggs}, {butter, eggs}: processor 1: 1, 1, 1; processor 2: 0, 0, 1; processor 3: 1, 0, 0.

Figure 3: The Task Parallelism paradigm. After the database broadcast each processor counts one candidate, followed by an itemset broadcast. Processor 1 (D1 = T1): {bread, butter}, count 2; processor 2 (D2 = T2): {bread, eggs}, count 1; processor 3 (D3 = T3, T4): {butter, eggs}, count 2.

Figure 4: Diagnostic flowchart for diseases related to fever in children. Decision nodes (yes/no): measles rash; difficulty breathing or rales; rales or difficulty breathing, no diarrhea; bulging fontanelle or stiff neck in an older child; abscess; in the neck; hot and painful; pus on the tonsils; ear discharge or pain; jaundice, or painful and swollen joints, or severe abdominal pain, or diarrhea (see the flowcharts for these symptoms); painful urination or loin pain; edema of the face or eyes. Outcomes: measles; measles with pneumonia; pneumonia; meningitis; bacterial infection; suspected tuberculosis; acute tonsillitis; otitis media; acute pyelonephritis; acute nephritis; malaria or viral infection.

Thái Thị Bích Thủy – Nguyễn Thị Kim Ngân – Nguyễn Thị Diễm Thúy
