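Chapter 7: Association Rule Mining in Web Databases
Outline: Introduction; Association Rule Mining; Multilevel Association Rule Mining; Quantitative Association Rule Mining; Correlation Analysis; Summary.
Introduction (1): A single shopping cart tells us about one customer's purchasing behavior, but once a large volume of shopping-cart data has accumulated, we can analyze the purchasing habits of customers as a whole.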
(1)IBM PC ViewSonic
(2)80%
Association Rule Mining
7-1
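(1) An itemset X is a set of items. A transaction T contains X if X ⊆ T.
(2) The support count of X is the number of transactions that contain X. The support of X is the fraction of all transactions that contain X. For example, in the 10-transaction database of 7-1, item 2 appears in 5 transactions, so its support is 5/10 = 0.5. The itemset {2,5} appears in 3 transactions, so its support is 3/10 = 0.3. An association rule has the form X ⇒ Y [support, confidence], where X and Y are itemsets with X ∩ Y = ∅. The support of X ⇒ Y is the support of X ∪ Y.
(3) The confidence of X ⇒ Y is the fraction of the transactions containing X that also contain Y: $\mathrm{confidence}(X \Rightarrow Y) = \mathrm{support}(X \cup Y) / \mathrm{support}(X)$.
(4) The user sets a minimum support (or, equivalently, a minimum support count) and a minimum confidence. Only rules that meet both thresholds are reported. In 7-1, with minimum support 0.2 and minimum confidence 0.5, consider {1,3} ⇒ {5}. {1,3,5} appears in 2 transactions, giving support 0.2, and {1,3} has support 0.3, so the confidence is 0.2/0.3 = 0.67. The rule therefore meets both thresholds.
(5) An itemset whose support is at least the minimum support is a large itemset. A rule X ⇒ Y can meet the minimum support only if Z = X ∪ Y is a large itemset.
(6) In 7-1, with minimum support 0.2 and minimum confidence 0.7, the large itemset {1,3} yields two candidate rules, {1} ⇒ {3} and {3} ⇒ {1}. The confidence of {1} ⇒ {3} is 0.3/0.4 = 0.75, while that of {3} ⇒ {1} is 0.3/0.5 = 0.6. Only {1} ⇒ {3} is therefore an association rule.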
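Below is a minimal Python sketch of definitions (2) to (4). It assumes transactions are represented as frozensets of integer items. The database is hypothetical, since the data of 7-1 is not reproduced here, and the function names are illustrative.

```python
from typing import FrozenSet, List, Set

def support_count(itemset: Set[int], db: List[FrozenSet[int]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def support(itemset: Set[int], db: List[FrozenSet[int]]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    return support_count(itemset, db) / len(db)

def confidence(x: Set[int], y: Set[int], db: List[FrozenSet[int]]) -> float:
    """confidence(X => Y) = support(X ∪ Y) / support(X); the 1/len(db)
    factors cancel, so the ratio of support counts is used directly."""
    return support_count(x | y, db) / support_count(x, db)

def is_rule(x: Set[int], y: Set[int], db: List[FrozenSet[int]],
            min_sup: float, min_conf: float) -> bool:
    """X => Y is reported only if it meets both thresholds."""
    return support(x | y, db) >= min_sup and confidence(x, y, db) >= min_conf

# Hypothetical 5-transaction database (illustration only, not the data of 7-1).
db = [frozenset(t) for t in ({1, 3, 5}, {1, 3}, {2, 5}, {1, 5}, {3, 5})]

print(support({3, 5}, db))                 # 0.4  ({3,5} is in 2 of 5 transactions)
print(round(confidence({1}, {3}, db), 2))  # 0.67 (2 of the 3 transactions with 1 contain 3)
print(is_rule({1}, {3}, db, 0.2, 0.5))     # True
```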
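Apriori algorithm: an itemset containing k items is a k-itemset (k-itemset), and Lk denotes the set of large k-itemsets (large k-itemset). Apriori first finds the large 1-itemsets L1. It then uses L1 to find L2, uses L2 to find L3, and so on.
Apriori property: every subset of a large itemset is also large. For example, if {A,B} is large, then {A} and {B} are large. Conversely, if {A} is not large, then {A,B} cannot be large, since every transaction containing {A,B} also contains {A}.
Apriori uses this property to generate candidate itemsets (candidate itemsets) in two steps: join (join) and prune (prune).
Let Ck be the set of candidate k-itemsets, so that Lk ⊆ Ck. In the join step, Ck is formed by joining pairs of itemsets in Lk-1 that share their first k-2 items. In the prune step, the Apriori property is applied: if some (k-1)-subset of a candidate X is not in Lk-1, then X cannot be a large k-itemset and is deleted from Ck.
7-2: Let X1 = {1,3,5} and X2 = {1,3,6} be two large 3-itemsets. Joining X1 and X2 gives the 4-itemset {1,3,5,6}. By the Apriori property, {1,3,5,6} can be large only if all of its 3-subsets are large: {1,3,5}, {1,3,6}, {1,5,6} and {3,5,6}. {1,3,5} and {1,3,6} are large, so the outcome depends on {1,5,6} and {3,5,6}. If both are large 3-itemsets, {1,3,5,6} remains a candidate 4-itemset. If either is not, {1,3,5,6} cannot be a large 4-itemset and is pruned.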
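The join and prune steps can be sketched in Python as follows. This is an illustrative implementation that assumes itemsets are frozensets of sortable items. `candidate_gen` is a hypothetical name echoing the slides' Candidate_gen, not a library function. The usage lines replay 7-2.

```python
from itertools import combinations
from typing import FrozenSet, Set

Itemset = FrozenSet[int]

def candidate_gen(prev_large: Set[Itemset], k: int) -> Set[Itemset]:
    """Candidate k-itemsets Ck from the large (k-1)-itemsets L(k-1)."""
    ordered = sorted(tuple(sorted(s)) for s in prev_large)
    joined: Set[Itemset] = set()
    # Join step: combine two (k-1)-itemsets that share their first k-2 items.
    for a, b in combinations(ordered, 2):
        if a[:k - 2] == b[:k - 2]:
            joined.add(frozenset(a + b))
    # Prune step (Apriori property): a candidate survives only if every one of
    # its (k-1)-subsets is itself large.
    return {c for c in joined
            if all(frozenset(s) in prev_large for s in combinations(sorted(c), k - 1))}

# 7-2: X1 = {1,3,5} and X2 = {1,3,6} join to {1,3,5,6}, which is pruned here
# because {1,5,6} and {3,5,6} are not among the large 3-itemsets.
L3 = {frozenset({1, 3, 5}), frozenset({1, 3, 6})}
print(candidate_gen(L3, 4))   # set()
```

If {1,5,6} and {3,5,6} are added to L3, the same call returns {1,3,5,6}. Applied to the large 2-itemsets {1,3}, {1,5}, {2,5} and {3,5} of 7-3, it returns only {1,3,5}.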
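Apriori algorithm
L1 = {large 1-itemsets};
for (k = 2; Lk-1 ≠ ∅; k++) do begin
    Ck = Candidate_gen(Lk-1);                 /* candidate k-itemsets */
    for each transaction t in the database do
        increment the count of every c ∈ Ck contained in t;
    Lk = {c ∈ Ck | count(c) ≥ minimum support count};
end
return L = ∪k Lk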
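Here is a runnable sketch of the Apriori loop above. It assumes an in-memory list of transactions and a minimum support count. It repeats the join-and-prune helper so the block stands alone. The four-transaction database is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Itemset = FrozenSet[int]

def candidate_gen(prev_large: Set[Itemset], k: int) -> Set[Itemset]:
    """Join large (k-1)-itemsets sharing their first k-2 items, then prune."""
    ordered = sorted(tuple(sorted(s)) for s in prev_large)
    joined = {frozenset(a + b) for a, b in combinations(ordered, 2)
              if a[:k - 2] == b[:k - 2]}
    return {c for c in joined
            if all(frozenset(s) in prev_large for s in combinations(sorted(c), k - 1))}

def apriori(transactions: Iterable[Iterable[int]], min_count: int) -> Dict[Itemset, int]:
    """Return every large itemset with its support count."""
    db = [frozenset(t) for t in transactions]
    # L1: one scan counts the 1-itemsets.
    counts: Dict[Itemset, int] = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: n for s, n in counts.items() if n >= min_count}
    result = dict(large)
    k = 2
    while large:                                  # loop while L(k-1) is non-empty
        candidates = candidate_gen(set(large), k)
        cand_counts = dict.fromkeys(candidates, 0)
        for t in db:                              # one database scan per level
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        large = {c: n for c, n in cand_counts.items() if n >= min_count}
        result.update(large)
        k += 1
    return result

# Hypothetical database of 4 transactions, minimum support count 2.
found = apriori([{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}], min_count=2)
for itemset, n in sorted(found.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), n)
```

For this data the loop stops after k = 3 with nine large itemsets, the largest being [2, 3, 5] with count 2. Like the pseudocode, the sketch counts candidates by checking each one against every transaction. A hash-based structure is one way to speed up that step.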
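7-3 (1): Apply Apriori to the database of 7-1 with a minimum support count of 3. The first scan counts every 1-itemset, giving the candidate set C1.
7-3 (2): The 1-itemsets with a count of at least 3 form L1.
7-3 (3): C2 is generated from L1.
7-3 (4): A second scan counts every candidate 2-itemset.
7-3 (5): The 2-itemsets with a count of at least 3 form L2.
7-3 (6): C3 is generated from L2.
The only candidate 3-itemset is {1,3,5}, and it appears in just 2 transactions. There are therefore no large 3-itemsets (L3 = ∅), and the algorithm stops.
7-3 (7): The large itemsets are {{1},{2},{3},{4},{5},{6},{1,3},{1,5},{2,5},{3,5}}. The minimum confidence is 0.7. The candidate rules and their confidences are:
- {1} ⇒ {3}: 3/4 = 0.75
- {3} ⇒ {1}: 3/5 = 0.6
- {1} ⇒ {5}: 3/4 = 0.75
- {5} ⇒ {1}: 3/6 = 0.5
- {2} ⇒ {5}: 3/5 = 0.6
- {5} ⇒ {2}: 3/6 = 0.5
- {3} ⇒ {5}: 3/5 = 0.6
- {5} ⇒ {3}: 3/6 = 0.5

Only the 1st and 3rd rules, {1} ⇒ {3} and {1} ⇒ {5}, reach the minimum confidence.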
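The rule-generation step of 7-3 (7) can be sketched as follows. The sketch assumes the support counts of the large itemsets are already known. The counts used are the ones implied by the confidences listed in 7-3 (7), over 10 transactions. `gen_rules` is a hypothetical helper name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[int], FrozenSet[int], float]

def gen_rules(counts: Dict[FrozenSet[int], int], min_conf: float) -> List[Rule]:
    """Split each large itemset Z into X => Y (X ∪ Y = Z, X and Y non-empty)
    and keep the rules whose confidence count(Z) / count(X) reaches min_conf."""
    rules: List[Rule] = []
    for z, z_count in counts.items():
        for r in range(1, len(z)):
            for x in map(frozenset, combinations(sorted(z), r)):
                conf = z_count / counts[x]   # X is large by the Apriori property
                if conf >= min_conf:
                    rules.append((x, z - x, conf))
    return rules

# Support counts implied by 7-3 (7). {4} and {6} are also large, but they
# appear in no large 2-itemset, so their counts are not needed here.
counts = {frozenset(s): n for s, n in [
    ((1,), 4), ((2,), 5), ((3,), 5), ((5,), 6),
    ((1, 3), 3), ((1, 5), 3), ((2, 5), 3), ((3, 5), 3)]}

for x, y, conf in gen_rules(counts, min_conf=0.7):
    print(sorted(x), "=>", sorted(y), round(conf, 2))
# [1] => [3] 0.75
# [1] => [5] 0.75
```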
Multilevel Association Rule Mining
80%PC70%IBM PCViewSonic (lower concept level)
7-5: concept hierarchy (brands: IBM, COMPAQ, ASUS, HP, Acer, Toshiba)
7-14: concept hierarchy (CRT: 17-inch, 19-inch; LCD: 15-inch, 17-inch)
7-15: concept hierarchy (A4, A3+)
(lower) (higher) HP ViewSonic=0.01 =0.95
ViewSonic=0.7=0.9 (multilevel association rules)
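Multilevel association rules are mined top-down (top-down). Large itemsets are found first at level 1 (level-1), then at level 2 (level-2), and so on down the hierarchy, with an Apriori-style algorithm applied at each level.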
(1)ixxi-1x
(2)ix i-1 1-x [=0.2]
()
()
= 0.25
(1) [=0.2]
[=0.12]
[=0.08]
= 0.3
= 0.06
(2)k-ik-X i-1 k-k-X {,LCD}[ = 0.2]
{,15LCD}[= 0.12]
{,17LCD}[= 0.02]
{,15LCD}[= 0.03]
{,17LCD}[= 0.03]
= 0.15 = 0.03
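Each item is encoded by its path in the concept hierarchy, one digit per level. For example, the code 1122 denotes an IBM item. Its digits 1, 1, 2 and 2 give the branch taken at levels 1, 2, 3 and 4 respectively.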
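Here is a small sketch of the encoding. It assumes fixed-length codes with one digit per level (four levels, as in 1122 and 1212). It also assumes '*' marks unspecified lower levels, as in codes such as 1*** and 12**. The helper names are hypothetical.

```python
from typing import Iterable, Set

def ancestor(code: str, level: int) -> str:
    """Generalize an item code to `level`: keep the first `level` digits and mask
    the lower levels with '*' (e.g. '1212' at level 2 becomes '12**')."""
    return code[:level] + "*" * (len(code) - level)

def generalize(transaction: Iterable[str], level: int) -> Set[str]:
    """Replace every item of a transaction by its ancestor at `level`;
    duplicates collapse, so each ancestor is counted once per transaction."""
    return {ancestor(code, level) for code in transaction}

for level in range(1, 5):
    print(level, ancestor("1122", level))
# 1 1***
# 2 11**
# 3 112*
# 4 1122
```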
7-2
7-3
7-4T[1]
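Notation:
- T is the transaction database, and T[1] is its encoded transaction table.
- L[j,k] is the set of large k-itemsets at level j.
- LL[j] is the set of all large itemsets at level j.
- minsup[j] is the minimum support at level j.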
7-4 1 2 3 4 5 T[1](7-4)7-3 1600 7-2IBM 1111
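(1)
1  for (j = 1; L[j,1] ≠ ∅ and j ≤ maximum level; j++) do begin   /* process level j */
2      if j = 1 then {
3          L[j,1] = Large_item_gen(T[1], j)         /* large 1-itemsets of level 1 from T[1] */
4          T[2] = Filtered_table(T[1], L[1,1])      /* filter T[1] with L[1,1] */
5      }
6      else L[j,1] = Large_item_gen(T[2], j)
(2)
7      for (k = 2; L[j,k-1] ≠ ∅; k++) do begin      /* large k-itemsets of level j */
8          Ck = Candidate_gen(L[j,k-1])
9          for each transaction t in T[2] do
10             increment the count of every c ∈ Ck contained in t
11         L[j,k] = {c ∈ Ck | count(c) ≥ minsup[j]}  /* large k-itemsets of level j */
12     end
13     LL[j] = ∪k L[j,k]                            /* all large itemsets of level j */
14 end
(3) Line 3: Large_item_gen(T[1], j) scans T[1] and returns L[j,1], the large 1-itemsets of level j. For j = 1 it returns L[1,1]. Line 6: for a level j > 1, Large_item_gen(T[2], j) returns L[j,1]. It considers only items whose parent at level j-1 is in L[j-1,1]. For example, the level-3 items 111* and 112* are considered only if their level-2 parent 11** is a large 1-itemset.
(4) Line 4: Filtered_table(T[1], L[1,1]) uses L[1,1] to filter T[1]. Every item of a transaction t whose level-1 ancestor is not in L[1,1] is deleted from t. If t becomes empty, t is deleted from T[1]. The table returned by Filtered_table(T[1], L[1,1]) is T[2].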
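Lines 3, 4 and 6 of the multilevel algorithm can be sketched in Python as follows. The sketch assumes fixed-length digit codes with '*' masking lower levels. `large_item_gen` and `filtered_table` are hypothetical stand-ins for Large_item_gen and Filtered_table. The encoded table T1 is made up for illustration and is not the data of 7-4.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

def ancestor(code: str, level: int) -> str:
    """Keep the first `level` digits of an item code and mask the rest with '*'."""
    return code[:level] + "*" * (len(code) - level)

def large_item_gen(table: List[Set[str]], level: int, min_count: int,
                   parent_large: Optional[Set[str]] = None) -> Dict[str, int]:
    """Large 1-itemsets of `level`. When `parent_large` (the large 1-itemsets of
    the level above) is given, only items whose parent is large are counted."""
    counts = Counter()
    for t in table:
        items = {ancestor(code, level) for code in t}   # count once per transaction
        if parent_large is not None:
            items = {i for i in items if ancestor(i, level - 1) in parent_large}
        counts.update(items)
    return {i: n for i, n in counts.items() if n >= min_count}

def filtered_table(table: List[Set[str]], large_level1: Set[str]) -> List[Set[str]]:
    """T[2]: drop items whose level-1 ancestor is not large, then drop empty transactions."""
    result = []
    for t in table:
        kept = {code for code in t if ancestor(code, 1) in large_level1}
        if kept:
            result.append(kept)
    return result

# Hypothetical encoded table T[1] (not the data of 7-4).
T1 = [{"1111", "2112", "4111"}, {"1212", "2112"}, {"1121", "3111"},
      {"1211", "2111"}, {"4112"}]
L11 = large_item_gen(T1, 1, min_count=3)                      # level 1
T2 = filtered_table(T1, set(L11))                             # the last transaction is dropped
L21 = large_item_gen(T2, 2, min_count=2, parent_large=set(L11))
print(sorted(L11.items()), len(T2), sorted(L21.items()))
```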
7-57-4114T[1]11-L[1,1]{4***} 235 8 4 * Filtered_table L[1,1]T[1]T[2]T[1] 2 3214 4 9 10
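7-6 (1): The level-1 large 1-itemsets L[1,1] obtained in 7-5.
7-6 (2): T[1] is filtered into T[2].
7-6 (3): Candidate_gen produces the level-1 candidate 2-itemsets from L[1,1]: C2 = {{1***,2***}, {1***,4***}, {2***,4***}}. {2***,4***} occurs in only 3 transactions, which is below the level-1 minimum support count. Therefore L[1,2] = {{1***,2***},{1***,4***}}. From L[1,2], C3 = {{1***,2***,4***}}. This candidate is not large either, so L[1,3] = ∅.
7-6 (4): The level-1 large 2-itemsets L[1,2].
7-6 (5): At level 2 the minimum support count is 2. Scanning T[2] gives the level-2 large 1-itemsets L[2,1]. For example, {41**} occurs in transactions 2, 3 and 8, a count of 3 ≥ 2, so it is a large 1-itemset. From L[2,1], Candidate_gen then produces C2 = {{11**,12**}, {11**,21**}, {11**,22**}, {11**,41**}, {12**,21**}, {12**,22**}, {12**,41**}, {21**,22**}, {21**,41**}, {22**,41**}}. The candidates with a count of at least 2 form L[2,2] = {{11**,41**},{12**,21**},{12**,22**}}. From L[2,2], C3 = {{12**,21**,22**}}. Its count is 0, so L[2,3] = ∅.
7-6 (6): The level-2 large itemsets L[2,1] and L[2,2].
7-6 (7): At level 3 the minimum support count is 3. Scanning T[2] gives L[3,1]. From it, Candidate_gen produces C2 = {{111*,121*},{111*,211*},{111*,411*},{121*,211*},{121*,411*},{211*,411*}}. Only {121*,211*} reaches a count of 3, so L[3,2] = {{121*,211*}}. No candidate 3-itemset can be formed from L[3,2], so L[3,3] = ∅.
7-6 (8): The level-3 large itemsets L[3,1] and L[3,2].
7-6 (9): At level 4 the minimum support count is 2. Scanning T[2] gives L[4,1], and Candidate_gen produces C2 from it. The candidates with a count of at least 2 form L[4,2] = {{1212,2112}}. No candidate 3-itemset can be formed from L[4,2], so L[4,3] = ∅.
7-6 (10): The level-4 large itemsets L[4,1] and L[4,2].
7-7 (1): Rules are derived from the large itemsets of 7-6. The minimum confidences are 0.8, 0.7, 0.7 and 0.6 at levels 1, 2, 3 and 4. Level 1:
- {1***} ⇒ {2***}: 7/8 = 0.875
- {2***} ⇒ {1***}: 7/7 = 1
- {1***} ⇒ {4***}: 4/8 = 0.5
- {4***} ⇒ {1***}: 4/4 = 1

Rules 1, 2 and 4 meet the level-1 minimum confidence.
7-7 (2): Level 2:
- {11**} ⇒ {41**}: 2/3 = 0.67
- {41**} ⇒ {11**}: 2/3 = 0.67
- {12**} ⇒ {21**}: 3/5 = 0.6
- {21**} ⇒ {12**}: 3/4 = 0.75
- {12**} ⇒ {22**}: 2/5 = 0.4
- {22**} ⇒ {12**}: 2/3 = 0.67

Only rule 4, {21**} ⇒ {12**}, meets the level-2 minimum confidence.
7-7 (3): Level 3:
- {121*} ⇒ {211*}: 3/3 = 1
- {211*} ⇒ {121*}: 3/4 = 0.75

Both rules 1 and 2 qualify. Level 4:
- {1212} ⇒ {2112}: 2/2 = 1
- {2112} ⇒ {1212}: 2/3 = 0.67

Both rules 1 and 2 qualify.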
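Here is a sketch of the per-level confidence test of 7-7. It assumes, as in 7-6, that the largest multi-item large itemsets are 2-itemsets, so each rule has one item on each side. The support counts are the ones implied by the confidences listed in 7-7. `pair_rules` and `fs` are hypothetical helpers.

```python
from typing import Dict, FrozenSet, List, Tuple

def fs(*items: str) -> FrozenSet[str]:
    """Shorthand for building a frozenset of item codes."""
    return frozenset(items)

def pair_rules(counts: Dict[FrozenSet[str], int],
               min_conf: float) -> List[Tuple[str, str, float]]:
    """Rules x => y from large 2-itemsets {x, y} with count({x,y}) / count({x}) >= min_conf."""
    rules = []
    for z, n in counts.items():
        if len(z) != 2:
            continue
        for x in sorted(z):
            (y,) = z - {x}
            conf = n / counts[fs(x)]
            if conf >= min_conf:
                rules.append((x, y, conf))
    return rules

# Support counts implied by 7-7, one table per level, each with its minimum confidence.
levels = {
    1: ({fs("1***"): 8, fs("2***"): 7, fs("4***"): 4,
         fs("1***", "2***"): 7, fs("1***", "4***"): 4}, 0.8),
    2: ({fs("11**"): 3, fs("12**"): 5, fs("21**"): 4, fs("22**"): 3, fs("41**"): 3,
         fs("11**", "41**"): 2, fs("12**", "21**"): 3, fs("12**", "22**"): 2}, 0.7),
    3: ({fs("121*"): 3, fs("211*"): 4, fs("121*", "211*"): 3}, 0.7),
    4: ({fs("1212"): 2, fs("2112"): 3, fs("1212", "2112"): 2}, 0.6),
}
for level, (counts, min_conf) in levels.items():
    for x, y, conf in pair_rules(counts, min_conf):
        print(f"level {level}: {{{x}}} => {{{y}}} conf = {conf:.3f}")
```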
7-7 (4) 1 (7-5) 2 (7-14) 4 (7-15) =0.875 =1 =1 CRT =0.75 17CRT = 1 17CRT = 0.75 IBM 17CRT = 1 17CRT IBM = 0.67
Quantitative Association Rule Mining
(1) 40% (quantitative association rule)
(2) (intervals)
q_ (q_item) q_ i qq_ q_ (q_itemset) q_ x q_x
q_ q_ q_
(1)i q_ , , ... , , ... q_
(2) T s 1 2 3 4
7-8{,,,,}iq_q_5030100204050010%[1][2..3][4..5]123q_ ( )
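(1) A transaction t contains a q_itemset X if t contains every q_item in X. A q_itemset whose support reaches the minimum support is a large q_itemset (large q_itemset). A q_itemset with k q_items is a k-q_itemset (k-q_itemset).
(2) A quantitative association rule has the form X ⇒ Y [support, confidence], where X and Y are disjoint q_itemsets. The rule is derived from the large q_itemset Z = X ∪ Y.
Large q_itemset generation using Tids (LqiTid)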
7-6DB
7-6DB37-17DBDB7-7
7-17 ABCDEFG
7-7DB
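(1) TS({x}) is the set of transaction IDs (Tids) of the transactions in DB that contain q_item x. In DB, for example, TS({…}) = {5,12,14} and TS({…}) = {1,4,5,8}. TS({x1,x2}) is the set of Tids of the transactions that contain both q_items x1 and x2, so $TS(\{x_1,x_2\}) = TS(\{x_1\}) \cap TS(\{x_2\})$. For the two q_items above, TS({…,…}) = {5,12,14} ∩ {1,4,5,8} = {5}.
(2) For q_items x1, x2, …, xk, TS({x1,x2,…,xk}) is the set of Tids of the transactions that contain the q_itemset {x1,x2,…,xk}. Its support SP({x1,x2,…,xk}) is the number of elements in that set: $SP(\{x_1,\dots,x_k\}) = \mathrm{Card}(TS(\{x_1,\dots,x_k\})) = \mathrm{Card}(TS(\{x_1\}) \cap TS(\{x_2\}) \cap \cdots \cap TS(\{x_k\}))$, where Card(S) is the number of elements of set S.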
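Here is a minimal sketch of TS and SP that uses Python frozensets as Tid sets. `ts` and `sp` are hypothetical helper names. x1 and x2 stand in for the two q_items of the example in (1), whose names are not preserved.

```python
from functools import reduce
from typing import Dict, FrozenSet, Iterable

def ts(q_itemset: Iterable[str], tids: Dict[str, FrozenSet[int]]) -> FrozenSet[int]:
    """TS({x1,...,xk}) = TS({x1}) ∩ TS({x2}) ∩ ... ∩ TS({xk}); q_itemset must be non-empty."""
    return reduce(lambda a, b: a & b, (tids[x] for x in q_itemset))

def sp(q_itemset: Iterable[str], tids: Dict[str, FrozenSet[int]]) -> int:
    """SP = Card(TS): the number of transactions that contain the q_itemset."""
    return len(ts(q_itemset, tids))

# Tid sets from the example in (1); x1 and x2 are placeholder q_item names.
tids = {"x1": frozenset({5, 12, 14}), "x2": frozenset({1, 4, 5, 8})}
print(sorted(ts(["x1", "x2"], tids)), sp(["x1", "x2"], tids))   # [5] 1
```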
7-8q_ 7-7DBq_
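(3) LqiTid: if SP({x1,x2,…,xk}) reaches the minimum support, {x1,x2,…,xk} is a large k-q_itemset. Candidate_gen generates the candidate k-q_itemsets from the large (k-1)-q_itemsets.
LqiTid first scans the database once to obtain the TS and SP of every q_item. The q_items whose SP reaches the minimum support become the large 1-q_itemsets. For each k ≥ 2, LqiTid generates the candidate k-q_itemsets Ck. It computes each candidate's TS by intersecting the TS sets of the two (k-1)-q_itemsets that were joined to form it. The candidates whose SP = Card(TS) reaches the minimum support are kept as the large k-q_itemsets.
LqiTid algorithm
for each q_item x: compute TS({x}) and SP({x})                /* one database scan */
L1 = {x | x is a q_item and SP({x}) ≥ minimum support}        /* large 1-q_itemsets */
for (k = 2; |Lk-1| > 1; k++) do begin                           /* large k-q_itemsets */
    generate the candidate k-q_itemsets Ck from Lk-1; Lk = ∅
    for each candidate q_itemset c ∈ Ck do begin                 /* c joins (k-1)-q_itemsets S1 and S2 */
        TS(c) = TS(S1) ∩ TS(S2);  SP(c) = Card(TS(c))
        if SP(c) ≥ minimum support then
            Lk = Lk ∪ {c}
    end
end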
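Below is a Python sketch of LqiTid under stated assumptions. The database is a dict from Tid to a set of q_item labels, and q_itemsets are kept as sorted tuples. Candidate generation uses the Apriori join and prune steps. The labels a to d and the data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

QItemset = Tuple[str, ...]   # q_items kept in sorted order

def build_tids(db: Dict[int, Set[str]]) -> Dict[str, FrozenSet[int]]:
    """One scan of the database: TS({x}) for every q_item x."""
    tids = defaultdict(set)
    for tid, q_items in db.items():
        for x in q_items:
            tids[x].add(tid)
    return {x: frozenset(t) for x, t in tids.items()}

def lqitid(db: Dict[int, Set[str]], min_sp: int) -> Dict[QItemset, int]:
    """Return every large q_itemset with its SP, using Tid-set intersections."""
    prev = {(x,): t for x, t in build_tids(db).items() if len(t) >= min_sp}   # L1
    result = {s: len(t) for s, t in prev.items()}
    k = 2
    while len(prev) > 1:                                   # |L(k-1)| > 1
        current: Dict[QItemset, FrozenSet[int]] = {}
        for s1, s2 in combinations(sorted(prev), 2):
            if s1[:k - 2] != s2[:k - 2]:                   # join only on a shared prefix
                continue
            c = s1 + (s2[-1],)
            if any(sub not in prev for sub in combinations(c, k - 1)):
                continue                                   # prune: a (k-1)-subset is not large
            tc = prev[s1] & prev[s2]                       # TS(c) = TS(S1) ∩ TS(S2)
            if len(tc) >= min_sp:                          # SP(c) = Card(TS(c))
                current[c] = tc
        result.update({c: len(t) for c, t in current.items()})
        prev = current
        k += 1
    return result

# Hypothetical database: Tid -> q_items (labels a..d stand for attribute/interval pairs).
db = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"a", "c"}, 4: {"b", "c", "d"}, 5: {"a", "b", "c"}}
for q_itemset, count in sorted(lqitid(db, min_sp=2).items(), key=lambda p: (len(p[0]), p[0])):
    print(q_itemset, count)
```

Here d occurs only in Tid 4, so it is dropped at k = 1. The run ends with the single large 3-q_itemset (a, b, c). Its Tid set {1, 5} comes from intersecting the Tid sets of (a, b) and (a, c), without rescanning the database.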
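7-10: With a minimum support of 2, the large 1-q_itemsets obtained from 7-8 are listed in 7-9. To find the large 2-q_itemsets, the candidate set C2 is generated from the q_itemsets in 7-9, giving the large 2-q_itemsets in 7-10. Likewise, C3 is generated from 7-10, giving the large 3-q_itemsets in 7-11. Since C4 = ∅, L4 = ∅ and the algorithm stops.
7-9: large 1-q_itemsets
7-10: large 2-q_itemsets
7-11: large 3-q_itemsets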
7-117-100.657-17
{} {} =2/3=0.67 {} {} =3/3=1 {} {C,[1..2]} =2/3=0.67 {} {} =2/2=1 {,} {} =2/3=0.67 {,} {} =2/2=1 {,} {} =2/2=1 {,} {} =2/2=1 {,} {} =2/2=1 {,} {} =2/3=0.67
Correlation Analysis
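(1) Suppose that, of 10,000 transactions, 6,000 contain A, 7,500 contain B, and 4,000 contain both. With a minimum support of 30% and a minimum confidence of 60%, the rule A ⇒ B [support = 40%, confidence = 67%] is reported. However, 75% of all transactions contain B, which is more than 67%. A transaction containing A is actually less likely to contain B, so the rule is misleading.
(2) Let $P(A \cup B)$ denote the probability that a transaction contains both A and B. If $P(A \cup B) = P(A)P(B)$, A and B are independent (independent); otherwise they are dependent and correlated (dependent and correlated). The correlation (correlation) between A and B is $corr_{A,B} = P(A \cup B) / (P(A)P(B))$.
(3) If corr_{A,B} < 1, A and B are negatively correlated (negatively correlated): the occurrence of A makes B less likely. If corr_{A,B} > 1, A and B are positively correlated (positively correlated): the occurrence of A makes B more likely. If corr_{A,B} = 1, A and B are independent. For the example in (1), corr = 0.40 / (0.60 × 0.75) ≈ 0.89 < 1, so A and B are negatively correlated.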
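The computation in (1) and (3) can be checked with a few lines of Python. `correlation` is a hypothetical helper that takes raw transaction counts.

```python
def correlation(n_total: int, n_a: int, n_b: int, n_ab: int) -> float:
    """corr(A, B) = P(A ∪ B) / (P(A) P(B)), where P(A ∪ B) is the fraction of
    transactions that contain both A and B."""
    p_a, p_b, p_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return p_ab / (p_a * p_b)

corr = correlation(n_total=10_000, n_a=6_000, n_b=7_500, n_ab=4_000)
conf = 4_000 / 6_000          # confidence(A => B)
print(round(conf, 2), round(corr, 2))   # 0.67 0.89: corr < 1, so A and B are negatively correlated
```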
Summary
Apriori (hash) (cache)