Mining association rules procedure to support on-line recommendation by customers and products fragmentation
S. Wesley Changchien, Tzu-Chuen Lu. Expert Systems with Applications 20 (2001) 325-335
Data Mining & Knowledge Discovery
Group members: M964020025 郭李哲, M964020027 鄭淵太, M964020044 鐘佶修
Background and motivation
Most e-commerce (EC) businesses strive to survive and to become leaders at the frontier of the new wave.
The key success factors include learning customers' purchasing behavior, developing marketing strategies that create new consumer markets, and discovering latent loyal customers.
Data mining task
Self-Organizing Map (SOM)
Rough set theory (RST)
Data analysis with RST rests on two basic concepts: the lower and the upper approximations of a set.
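For a universe U, a target set X ⊆ U, and the equivalence class I(x) of objects indiscernible from x:

$$ I_*(X) = \{\, x \in U : I(x) \subseteq X \,\} \qquad (1) $$
$$ I^*(X) = \{\, x \in U : I(x) \cap X \neq \emptyset \,\} \qquad (2) $$
$$ CF(x) = \frac{Num(X \cap I(x))}{Num(I(x))}, \qquad CF(x) \in [0, 1] $$
$$ I_*(X) = \{\, x \in U : CF(x) = 1 \,\} $$
$$ I^*(X) = \{\, x \in U : CF(x) > 0 \,\} $$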
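The two approximations can be sketched in a few lines of Python. This is only an illustration of the definitions: the toy universe, the indiscernibility key (`x % 3`), and all function names are assumptions, not taken from the paper.

```python
from collections import defaultdict

def equivalence_classes(universe, key):
    """Map each object x to I(x), the set of objects sharing its key."""
    groups = defaultdict(set)
    for x in universe:
        groups[key(x)].add(x)
    return {x: frozenset(groups[key(x)]) for x in universe}

def cf(x, X, I):
    """CF(x) = Num(X ∩ I(x)) / Num(I(x))."""
    return len(X & I[x]) / len(I[x])

def lower_approximation(universe, X, I):
    """Objects whose whole class lies inside X (CF = 1)."""
    return {x for x in universe if cf(x, X, I) == 1}

def upper_approximation(universe, X, I):
    """Objects whose class overlaps X (CF > 0)."""
    return {x for x in universe if cf(x, X, I) > 0}

# Hypothetical toy data: objects 1..6 are indiscernible when x % 3 matches.
U = set(range(1, 7))
I = equivalence_classes(U, key=lambda x: x % 3)
X = {1, 4, 5}

lower = lower_approximation(U, X, I)
upper = upper_approximation(U, X, I)
print(sorted(lower))  # [1, 4]
print(sorted(upper))  # [1, 2, 4, 5]
```

Here the class {1, 4} lies entirely inside X, so it forms the lower approximation, while {2, 5} only overlaps X and therefore appears in the upper approximation alone.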
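Data mining procedure
Step 1 - selection and sampling
1. Creating a fact table
2. Selecting dimensions: choose the dimensions of interest
3. Selecting attributes: choose attributes according to their importance
4. Filtering data: restrict the range of attribute values
Step 2 - transformation and normalization
1. Attributes holding numeric data
2. Attributes holding non-numeric data: design a descriptive encoding for the data; for example, the values of the Job attribute are treated as characters
Step 3 - data mining of association rules
A neural network is used for clustering and rough set theory for rule extraction. The resulting association rules explain the characteristics of each cluster and the relationships between attributes of different clusters.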
Clustering module
Kohonen proposed SOM in 1980. It reveals the natural relationships among input attributes.
We can group an enterprise's customers, products, and suppliers into clusters.
For instance, input nodes: education and job from the table member; output nodes: nine clusters.
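To make the clustering module concrete, here is a minimal pure-Python sketch of a SOM with two input nodes (education, job) and nine output nodes on a 3 x 3 grid. It is a simplified variant, not the authors' implementation: the numeric encoding L = 0, N = 0.5, H = 1, the linear decay of learning rate and radius, the Manhattan-distance neighbourhood, and the sample members are all assumptions made for illustration.

```python
import random

def winner(weights, x):
    """Index of the output node whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))

def train_som(samples, rows=3, cols=3, epochs=200, seed=0):
    """Train a small rows x cols SOM on equal-length numeric vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # One weight vector per output node, initialised at random in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = 0.5 * frac                      # learning rate decays linearly
        radius = max(rows, cols) / 2 * frac  # neighbourhood shrinks linearly
        for x in rng.sample(samples, len(samples)):  # shuffled pass
            br, bc = divmod(winner(weights, x), cols)
            for node, w in enumerate(weights):
                r, c = divmod(node, cols)
                # Move the winner and its grid neighbours toward the sample.
                if abs(r - br) + abs(c - bc) <= radius:
                    for d in range(dim):
                        w[d] += lr * (x[d] - w[d])
    return weights

# Assumed encoding of the ordinal levels Low / Normal / High.
LEVEL = {"L": 0.0, "N": 0.5, "H": 1.0}
members = [("N", "N"), ("N", "H"), ("N", "H"), ("L", "L"),
           ("H", "H"), ("H", "H"), ("L", "N")]
samples = [[LEVEL[e], LEVEL[j]] for e, j in members]

weights = train_som(samples)
clusters = [winner(weights, s) for s in samples]
print(clusters)  # one output-node index (0..8) per member
```

Members with identical attribute values always land on the same output node, and the node index can then serve as the cluster ID (GID) in the rule extraction steps.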
Rule extraction module
Rough set theory is applied to the data records to find association rules within each homogeneous cluster and the relationships between attributes of different clusters.
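Characterization of each cluster
Rough set theory is used to explain the characteristics of a cluster.
Ex: a class of customers whose education is university level or above, whose monthly salary is 35,000 or more, ...
1) Generate result equivalence classes, Xk
2) Generate cause equivalence classes, Aij
3) Generate lower approximation rules
4) Generate upper approximation rules
5) Generate combinatorial rules
6) Explain the characteristics of the cluster
7) Repeat (return to Step 3)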
Step 1. Generate result equivalence classes, Xk
Generate one result equivalence class for each cluster.

Member ID   Education (E)   Job (J)   GID
Member1     N               N         B
Member2     N               H         A
Member3     N               H         B
Member4     L               L         C
Member5     H               H         A
Member6     H               H         A
Member7     L               N         C

k   GID   Xk
1   A     {Member2, Member5, Member6}
2   B     {Member1, Member3}
3   C     {Member4, Member7}
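Step 1 amounts to grouping members by cluster ID. A minimal sketch follows, using the seven members from the table above; the names `MEMBERS` and `result_classes` are illustrative only.

```python
from collections import defaultdict

# (Education, Job, GID) for each member, copied from the table above.
MEMBERS = {
    "Member1": ("N", "N", "B"),
    "Member2": ("N", "H", "A"),
    "Member3": ("N", "H", "B"),
    "Member4": ("L", "L", "C"),
    "Member5": ("H", "H", "A"),
    "Member6": ("H", "H", "A"),
    "Member7": ("L", "N", "C"),
}

def result_classes(members):
    """Return {GID: set of members}, one result equivalence class per cluster."""
    classes = defaultdict(set)
    for member, (_education, _job, gid) in members.items():
        classes[gid].add(member)
    return dict(classes)

X = result_classes(MEMBERS)
for gid in sorted(X):
    print(gid, sorted(X[gid]))
# A ['Member2', 'Member5', 'Member6']
# B ['Member1', 'Member3']
# C ['Member4', 'Member7']
```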
Step 2. Generate cause equivalence classes, Aij
Generate one cause equivalence class for each attribute value (attribute i, value j).

i = E, j = N: AEN = {Member1, Member2, Member3}
i = E, j = H: AEH = {Member5, Member6}
i = E, j = L: AEL = {Member4, Member7}
i = J, j = N: AJN = {Member1, Member7}
i = J, j = H: AJH = {Member2, Member3, Member5, Member6}
i = J, j = L: AJL = {Member4}
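Step 2 is the same grouping applied to every (attribute, value) pair. The sketch below reuses the member data of the example; the function name `cause_classes` and the tuple keys such as `("E", "N")` are assumptions chosen for readability.

```python
from collections import defaultdict

# (Education, Job, GID) for each member, as in the example table.
MEMBERS = {
    "Member1": ("N", "N", "B"),
    "Member2": ("N", "H", "A"),
    "Member3": ("N", "H", "B"),
    "Member4": ("L", "L", "C"),
    "Member5": ("H", "H", "A"),
    "Member6": ("H", "H", "A"),
    "Member7": ("L", "N", "C"),
}

def cause_classes(members, attributes=("E", "J")):
    """Return {(attribute, value): set of members} for the condition attributes."""
    classes = defaultdict(set)
    for member, values in members.items():
        # zip stops after the listed attributes, so the trailing GID is ignored.
        for attr, value in zip(attributes, values):
            classes[(attr, value)].add(member)
    return dict(classes)

A = cause_classes(MEMBERS)
print(sorted(A[("E", "N")]))  # ['Member1', 'Member2', 'Member3']
print(sorted(A[("J", "H")]))  # ['Member2', 'Member3', 'Member5', 'Member6']
```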
Step 3. Generate lower approximation rules (Confidence = 1)
X1 = {Member2, Member5, Member6}

$$ A_{ij}^{*X_k} = A_{ij} \cap X_k $$

$A_{EN}^{*X_1}$ = {Member2}
$A_{EH}^{*X_1}$ = {Member5, Member6}
$A_{EL}^{*X_1}$ = ∅
$A_{JN}^{*X_1}$ = ∅
$A_{JH}^{*X_1}$ = {Member2, Member5, Member6}
$A_{JL}^{*X_1}$ = ∅

Only AEH is contained entirely in X1, so it is the only class that yields a rule with confidence 1.
Rule1: If Education = H then GID = A, Confidence = 1
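A lower approximation rule exists whenever a cause class is a subset of a result class. The following sketch checks that condition for cluster A only, as in the example; the dictionary layout and the name `lower_rules` are assumptions.

```python
# Cause equivalence classes Aij from Step 2.
A = {
    ("Education", "N"): {"Member1", "Member2", "Member3"},
    ("Education", "H"): {"Member5", "Member6"},
    ("Education", "L"): {"Member4", "Member7"},
    ("Job", "N"): {"Member1", "Member7"},
    ("Job", "H"): {"Member2", "Member3", "Member5", "Member6"},
    ("Job", "L"): {"Member4"},
}
# Result equivalence class X1 (cluster A) from Step 1.
X = {"A": {"Member2", "Member5", "Member6"}}

def lower_rules(cause, result):
    """Return (attribute, value, GID) for every Aij fully contained in some Xk."""
    rules = []
    for (attr, value), members in cause.items():
        for gid, xk in result.items():
            if members <= xk:  # subset test: confidence is exactly 1
                rules.append((attr, value, gid))
    return rules

rules = lower_rules(A, X)
for attr, value, gid in rules:
    print(f"If {attr} = {value} then GID = {gid} (Confidence = 1)")
# If Education = H then GID = A (Confidence = 1)
```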
Step 4. Generate upper approximation rules
X1 = {Member2, Member5, Member6}

$$ A_{ij}^{*X_k} = A_{ij} \cap X_k, \qquad \text{Confidence} = \frac{|A_{ij}^{*X_k}|}{|A_{ij}|} $$
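Step 4. Generate upper approximation rules (continued)
Confidence threshold = 0.75

Rule2: If Education = N then GID = A
$$ \text{Confidence} = \frac{|A_{EN}^{*X_1}|}{|A_{EN}|} = \frac{|\{\text{Member2}\}|}{|\{\text{Member1}, \text{Member2}, \text{Member3}\}|} = \frac{1}{3} $$
Reject Rule2 (0.33 < 0.75)

Rule3: If Job = H then GID = A
$$ \text{Confidence} = \frac{|A_{JH}^{*X_1}|}{|A_{JH}|} = \frac{|\{\text{Member2}, \text{Member5}, \text{Member6}\}|}{|\{\text{Member2}, \text{Member3}, \text{Member5}, \text{Member6}\}|} = \frac{3}{4} $$
Accept Rule3 (0.75 ≥ 0.75)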
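The accept/reject decision can be reproduced with exact fractions. This sketch assumes a rule is accepted when its confidence is at least the threshold; the names `confidence` and `upper_rules` are illustrative.

```python
from fractions import Fraction

# Cause equivalence classes Aij and the result class X1 from the example.
A = {
    ("Education", "N"): {"Member1", "Member2", "Member3"},
    ("Education", "H"): {"Member5", "Member6"},
    ("Education", "L"): {"Member4", "Member7"},
    ("Job", "N"): {"Member1", "Member7"},
    ("Job", "H"): {"Member2", "Member3", "Member5", "Member6"},
    ("Job", "L"): {"Member4"},
}
X1 = {"Member2", "Member5", "Member6"}

def confidence(aij, xk):
    """|Aij ∩ Xk| / |Aij| as an exact fraction."""
    return Fraction(len(aij & xk), len(aij))

def upper_rules(cause, xk, threshold):
    """Keep conditions whose confidence is positive and meets the threshold."""
    kept = {}
    for condition, members in cause.items():
        conf = confidence(members, xk)
        if conf > 0 and conf >= threshold:
            kept[condition] = conf
    return kept

print(confidence(A[("Education", "N")], X1))  # 1/3 (Rule2, rejected)
print(confidence(A[("Job", "H")], X1))        # 3/4 (Rule3, accepted)
accepted = upper_rules(A, X1, threshold=Fraction(3, 4))
```

With this threshold, `accepted` also contains Education = H with confidence 1, which is the lower approximation rule found in Step 3.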
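Step 5. Generate combinatorial rules
Combine rules to produce association rules that take several attributes into account. For conditions $A_{ij}$, $A_{i'j'}$, $A_{i''j''}$ and result class $X_k$, the combined set is

$$ (A_{ij} \cap A_{i'j'} \cap A_{i''j''}) \cap X_k $$

$$ \text{Confidence} = \min\left( \frac{|(A_{ij} \cap A_{i'j'} \cap A_{i''j''}) \cap X_k|}{|A_{ij}|},\; \frac{|(A_{ij} \cap A_{i'j'} \cap A_{i''j''}) \cap X_k|}{|A_{i'j'}|},\; \frac{|(A_{ij} \cap A_{i'j'} \cap A_{i''j''}) \cap X_k|}{|A_{i''j''}|} \right) $$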
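Step 5. Generate combinatorial rules (continued)
X1 = {Member2, Member5, Member6}

Rule4: If Education = N and Job = H then GID = A
$$ \text{Confidence} = \min\left( \frac{|(A_{EN} \cap A_{JH}) \cap X_1|}{|A_{EN}|},\; \frac{|(A_{EN} \cap A_{JH}) \cap X_1|}{|A_{JH}|} \right) = \min\left( \frac{|\{\text{Member2}\}|}{|\{\text{Member1}, \text{Member2}, \text{Member3}\}|},\; \frac{|\{\text{Member2}\}|}{|\{\text{Member2}, \text{Member3}, \text{Member5}, \text{Member6}\}|} \right) = \min\left( \frac{1}{3}, \frac{1}{4} \right) = \frac{1}{4} $$

Rule5: If Education = H and Job = H then GID = A
$$ \text{Confidence} = \min\left( \frac{|(A_{EH} \cap A_{JH}) \cap X_1|}{|A_{EH}|},\; \frac{|(A_{EH} \cap A_{JH}) \cap X_1|}{|A_{JH}|} \right) = \min\left( \frac{|\{\text{Member5}, \text{Member6}\}|}{|\{\text{Member5}, \text{Member6}\}|},\; \frac{|\{\text{Member5}, \text{Member6}\}|}{|\{\text{Member2}, \text{Member3}, \text{Member5}, \text{Member6}\}|} \right) = \min\left( \frac{2}{2}, \frac{2}{4} \right) = \frac{1}{2} $$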
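The min-based confidence of a combinatorial rule can be checked with a short sketch. It follows the formula as used in the two examples (the combined set is divided by the size of each individual condition class, and the smallest ratio is kept); the function name `combinatorial_confidence` is an assumption.

```python
from fractions import Fraction

# Cause equivalence classes used by Rule4 and Rule5, and the result class X1.
A = {
    ("Education", "N"): {"Member1", "Member2", "Member3"},
    ("Education", "H"): {"Member5", "Member6"},
    ("Job", "H"): {"Member2", "Member3", "Member5", "Member6"},
}
X1 = {"Member2", "Member5", "Member6"}

def combinatorial_confidence(conditions, cause, xk):
    """min over conditions of |(A1 ∩ A2 ∩ ...) ∩ Xk| / |Ai|."""
    common = set.intersection(*(cause[c] for c in conditions)) & xk
    return min(Fraction(len(common), len(cause[c])) for c in conditions)

rule4 = combinatorial_confidence([("Education", "N"), ("Job", "H")], A, X1)
rule5 = combinatorial_confidence([("Education", "H"), ("Job", "H")], A, X1)
print(rule4)  # 1/4
print(rule5)  # 1/2
```

Note that this measure differs from the more common |A ∩ B ∩ X| / |A ∩ B|: dividing by each single-attribute class makes the combined rule at most as confident as its weakest component.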
Step 6. Explain the characteristics of the cluster
Summarize the rules and interpret the characteristics they describe. Rules concluding Cluster 1 (Cluster A):
Education = H -> Cluster A (Confidence = 1)
Job = H -> Cluster A (Confidence = 0.75)
Education = N & Job = H -> Cluster A (Confidence = 0.25)
Education = H & Job = H -> Cluster A (Confidence = 0.5)
For example, every member with Education = High belongs to Cluster A, and 75% of the members with Job = High do.
Step 7. Repeat
Return to Step 3 and process the next equivalence class Xk; repeat until every equivalence class has been processed.
Association of different clusters
Rough set theory is used to analyze the relationships between different clusters.
Ex: members of class A prefer products of class b; members of class C prefer products of class d; ...

Order ID   Buyer ID   Receiver ID   Product ID
OID1       Member3    Member5       Product10
OID2       Member3    Member2       Product25
OID3       Member1    Member6       Product13
OID4       Member2    Member4       Product24
OID5       Member2    Member1       Product15
OID6       Member2    Member2       Product26

Member ID   attribute1   attribute2   …   GID (Cluster)
Member1     …                             2
Member2     …                             2
Member3     …                             1
Member4     …                             3
Member5     …                             3
Member6     …                             2

Order ID   Buyer (GID)   Receiver (GID)   Product (GID)
OID1       1             3                7
OID2       1             2                3
OID3       2             2                6
OID4       2             3                2
OID5       2             2                7
OID6       2             2                6
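The third table is obtained by replacing every member and product ID in the order table with its cluster ID. A minimal sketch of that lookup follows. The member clusters come from the member table; the slides do not tabulate product clusters, so `PRODUCT_GID` is inferred from the two order tables and should be read as an assumption.

```python
ORDERS = [
    ("OID1", "Member3", "Member5", "Product10"),
    ("OID2", "Member3", "Member2", "Product25"),
    ("OID3", "Member1", "Member6", "Product13"),
    ("OID4", "Member2", "Member4", "Product24"),
    ("OID5", "Member2", "Member1", "Product15"),
    ("OID6", "Member2", "Member2", "Product26"),
]
MEMBER_GID = {"Member1": 2, "Member2": 2, "Member3": 1,
              "Member4": 3, "Member5": 3, "Member6": 2}
# Inferred by comparing the raw and the cluster-level order tables (assumption).
PRODUCT_GID = {"Product10": 7, "Product25": 3, "Product13": 6,
               "Product24": 2, "Product15": 7, "Product26": 6}

def to_cluster_orders(orders, member_gid, product_gid):
    """Replace buyer, receiver, and product IDs with their cluster IDs."""
    return {
        oid: {"Buyer": member_gid[buyer],
              "Receiver": member_gid[receiver],
              "Product": product_gid[product]}
        for oid, buyer, receiver, product in orders
    }

cluster_orders = to_cluster_orders(ORDERS, MEMBER_GID, PRODUCT_GID)
print(cluster_orders["OID1"])  # {'Buyer': 1, 'Receiver': 3, 'Product': 7}
```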
Association of different clusters (continued)

Aij for Buyer:
Buyer 1: {OID1, OID2}
Buyer 2: {OID3, OID4, OID5, OID6}

Aij for Receiver:
Receiver 2: {OID2, OID3, OID5, OID6}
Receiver 3: {OID1, OID4}

Aij for Product:
Product 2: {OID4}
Product 3: {OID2}
Product 6: {OID3, OID6}
Product 7: {OID1, OID5}

R1: If Product = 3 Then Receiver = 2, Confidence = 1
R2: If Product = 6 Then Receiver = 2, Confidence = 1
R3: If Buyer = 1 Then Receiver = 2, Confidence = 0.5
R4: If Buyer = 2 Then Receiver = 2, Confidence = 0.75
R5: If Product = 7 Then Receiver = 2, Confidence = 0.5
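Rules R1 to R5 follow from the same confidence ratio, with orders as the objects and Receiver = 2 as the result class. The sketch below recomputes them from the cluster-level order table; the function names are assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# Cluster-level order table from the example.
ORDERS = {
    "OID1": {"Buyer": 1, "Receiver": 3, "Product": 7},
    "OID2": {"Buyer": 1, "Receiver": 2, "Product": 3},
    "OID3": {"Buyer": 2, "Receiver": 2, "Product": 6},
    "OID4": {"Buyer": 2, "Receiver": 3, "Product": 2},
    "OID5": {"Buyer": 2, "Receiver": 2, "Product": 7},
    "OID6": {"Buyer": 2, "Receiver": 2, "Product": 6},
}

def classes(orders, attr):
    """Return {value: set of order IDs} for one attribute."""
    groups = defaultdict(set)
    for oid, row in orders.items():
        groups[row[attr]].add(oid)
    return dict(groups)

def cluster_rules(orders, cause_attrs, result_attr, result_value):
    """Confidence of 'If attr = value Then result_attr = result_value'."""
    target = classes(orders, result_attr)[result_value]
    rules = {}
    for attr in cause_attrs:
        for value, oids in classes(orders, attr).items():
            conf = Fraction(len(oids & target), len(oids))
            if conf > 0:  # drop conditions that never lead to the result
                rules[(attr, value)] = conf
    return rules

rules = cluster_rules(ORDERS, ["Buyer", "Product"], "Receiver", 2)
for (attr, value), conf in sorted(rules.items()):
    print(f"If {attr} = {value} Then Receiver = 2, Confidence = {float(conf)}")
```

Product = 2 is absent from the result because its only order (OID4) goes to Receiver cluster 3, which matches the five rules listed above.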
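System implementation
The study object is the transaction records of a store.
Product table: 1120 records
Customer table: 35 records
2000 transaction records are kept as the data to be mined.
After dimension and attribute selection:
Customer clustering: education, job, gender
Product clustering: sales price, import price, sale price for VIP customers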
SOM network interface
Relationship analysis interface
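Use Rules for Recommendation
A customer wants to buy a product as a gift for a friend but does not know what would be suitable.
If the customer's cluster = 7 and the friend's cluster = 1, the system can recommend products from cluster = 9 to the customer.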
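A recommendation lookup over such rules might look like the sketch below. Only the cluster IDs of the first rule (buyer 7, receiver 1, product 9) come from the example; every confidence value and the other two rules are hypothetical, as are the names `RULES` and `recommend`.

```python
# (buyer cluster, receiver cluster, product cluster, confidence)
RULES = [
    (7, 1, 9, 0.80),  # cluster IDs from the example; confidence is hypothetical
    (7, 1, 4, 0.55),  # hypothetical rule, included only to show ranking
    (2, 2, 6, 0.75),  # hypothetical rule for another buyer/receiver pair
]

def recommend(buyer_gid, receiver_gid, rules, threshold=0.75):
    """Product clusters for this buyer/receiver pair, best confidence first."""
    matches = [(conf, product)
               for buyer, receiver, product, conf in rules
               if buyer == buyer_gid and receiver == receiver_gid
               and conf >= threshold]
    return [product for conf, product in sorted(matches, reverse=True)]

print(recommend(7, 1, RULES))                 # [9]
print(recommend(7, 1, RULES, threshold=0.5))  # [9, 4]
```

Lowering the threshold widens the recommendation list at the cost of including weaker rules.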
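Conclusion
The paper uses SOM and rough set theory for clustering and rule extraction.
The rule extraction module describes the characteristics of the relationships between different clusters.
Analysts can go further and select other attributes, such as zodiac sign, psychological test results, or blood type, to analyze the relationships between clusters.
The study uses rough set theory to find association rules in the data. These rules serve two purposes: characterizing each cluster and relating different clusters. The implementation, however, presents only the relationships between clusters; it does not cover cluster characterization or how it should be applied.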