
Mining association rules procedure to support on-line recommendation by customers and products fragmentation

S. Wesley Changchien, Tzu-Chuen Lu
Expert Systems with Applications 20 (2001) 325-335

Data Mining & Knowledge Discovery

Team members: M964020025 郭李哲, M964020027 鄭淵太, M964020044 鐘佶修

Background and motivation

Most e-commerce (EC) businesses strive to survive and to become leaders at the frontier of the new wave.

The key success factors include learning customers' purchasing behavior, developing marketing strategies to create new consumer markets, and discovering latent loyal customers.

Data mining task

Self-Organizing Map (SOM)

Rough set theory (RST)

Data analysis with RST relies on two basic concepts: the lower and the upper approximations of a set.

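Rough set theory (RST)

Lower and upper approximations of a set $X \subseteq U$, where $I(x)$ is the set of objects indiscernible from $x$:

$$I_*(X) = \{x \in U : I(x) \subseteq X\}$$

$$I^*(X) = \{x \in U : I(x) \cap X \neq \emptyset\}$$

Confidence of an object $x$ with respect to $X$:

$$CF(x) = \frac{Num(X \cap I(x))}{Num(I(x))}, \qquad CF(x) \in [0, 1]$$

The approximations expressed through $CF$:

$$I_*(X) = \{x \in U : CF(x) = 1\}$$

$$I^*(X) = \{x \in U : CF(x) > 0\}$$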
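The CF-based definitions can be tried on a small example. The following is a minimal sketch, assuming the indiscernibility classes are already known; the toy classes reuse the member names from the worked example later in these slides, grouped by (Education, Job), and the function names `cf` and `approximations` are illustrative, not from the paper.

```python
from fractions import Fraction

# Assumed toy universe: each set is one indiscernibility class I(x),
# i.e. members that share the same (Education, Job) values.
BLOCKS = [
    {"Member1"},
    {"Member2", "Member3"},
    {"Member4"},
    {"Member5", "Member6"},
    {"Member7"},
]

# Target set X: the members of one cluster.
X = {"Member2", "Member5", "Member6"}


def cf(block, target):
    """CF(x) = Num(X ∩ I(x)) / Num(I(x)); identical for every x in the block."""
    return Fraction(len(block & target), len(block))


def approximations(blocks, target):
    """Return (lower, upper) approximations of target using CF."""
    lower, upper = set(), set()
    for block in blocks:
        score = cf(block, target)
        if score == 1:   # I(x) lies entirely inside X
            lower |= block
        if score > 0:    # I(x) overlaps X
            upper |= block
    return lower, upper


lower, upper = approximations(BLOCKS, X)
print(sorted(lower))  # ['Member5', 'Member6']
print(sorted(upper))  # ['Member2', 'Member3', 'Member5', 'Member6']
```

Exact fractions are used instead of floats so that the test `CF(x) = 1` is not affected by rounding.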

Data mining procedure

Step 1 - selection and sampling

1. Creating a fact table
2. Selecting dimensions: choose the dimensions of interest
3. Selecting attributes: choose attributes according to their importance
4. Filtering data: restrict the range of attribute values

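Step 2 - transformation and normalization

1. Attributes with numeric data
2. Attributes with non-numeric data: give the data a designed description; for example, the values of the Job attribute are treated as characters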

Step 3 - data mining of association rules

A neural network is used for clustering and rough set theory for rule extraction. The resulting association rules explain the characteristics of each cluster and the relationships between the attributes of different clusters.

Clustering module

Kohonen proposed SOM in 1980. It reveals the natural relationships among the input attributes.

We can group an enterprise's customers, products, and suppliers into clusters.

For instance:
Input nodes: education and job from the table member
Output nodes: nine clusters
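A clustering step of this kind might be sketched as follows, in plain Python with no external library. The numeric encoding of education and job (L = 0, N = 0.5, H = 1), the 3 x 3 grid standing in for the nine output nodes, the training parameters, and the names `train_som` and `best_matching_unit` are all assumptions for illustration; the paper's actual network settings are not given in these slides.

```python
import math
import random

# Assumed numeric encoding of the categorical levels (not from the paper).
LEVEL = {"L": 0.0, "N": 0.5, "H": 1.0}

# (education, job) per member, taken from the worked example in these slides.
MEMBERS = {
    "Member1": ("N", "N"), "Member2": ("N", "H"), "Member3": ("N", "H"),
    "Member4": ("L", "L"), "Member5": ("H", "H"), "Member6": ("H", "H"),
    "Member7": ("L", "N"),
}


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(samples, rows=3, cols=3, epochs=200, lr0=0.5, radius0=1.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                    # learning rate shrinks over time
        radius = max(radius0 * decay, 0.3)  # neighborhood shrinks as well
        for x in rng.sample(samples, len(samples)):  # one shuffled pass
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull node toward sample
    return weights


samples = [tuple(LEVEL[v] for v in attrs) for attrs in MEMBERS.values()]
som = train_som(samples)
clusters = {name: best_matching_unit(som, tuple(LEVEL[v] for v in attrs))
            for name, attrs in MEMBERS.items()}
for name, node in clusters.items():
    print(name, "-> output node", node)
```

Each member is assigned to the output node that best matches its input vector, so members with identical education and job always fall into the same cluster.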

Rule extraction module

Rough set theory is applied to the data records to find association rules within each homogeneous cluster and the relationships between the attributes of different clusters.

Characterization of each cluster

Rough set theory is used to explain the characteristics of a cluster.
Ex: a certain class of customers has a university education or above, a monthly salary of 35,000 or more, ...

1) Generate result equivalence classes, X_k
2) Generate cause equivalence classes, A_ij
3) Generate lower approximation rules
4) Generate upper approximation rules
5) Generate combinatorial rules
6) Explain the characteristics of the cluster
7) Repeat (return to Step 3)

step 1. Generate result equivalence classes, X_k

A result equivalence class is generated for each cluster.

Member ID   Education(E)   Job(J)   GID
Member1     N              N        B
Member2     N              H        A
Member3     N              H        B
Member4     L              L        C
Member5     H              H        A
Member6     H              H        A
Member7     L              N        C

k   GID   X_k
1   A     {Member2, Member5, Member6}
2   B     {Member1, Member3}
3   C     {Member4, Member7}

step 2. Generate cause equivalence classes, A_ij

A cause equivalence class is generated for each attribute value.

i = E, j = N   A_EN = {Member1, Member2, Member3}
i = E, j = H   A_EH = {Member5, Member6}
i = E, j = L   A_EL = {Member4, Member7}
i = J, j = N   A_JN = {Member1, Member7}
i = J, j = H   A_JH = {Member2, Member3, Member5, Member6}
i = J, j = L   A_JL = {Member4}
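Steps 1 and 2 amount to grouping the member table by cluster and by attribute value. A minimal sketch, assuming the table is held as a list of tuples; the names `MEMBERS`, `result_classes`, and `cause_classes` are illustrative and not from the paper.

```python
from collections import defaultdict

# Member table from the slides: (member, attribute values, cluster GID).
MEMBERS = [
    ("Member1", {"E": "N", "J": "N"}, "B"),
    ("Member2", {"E": "N", "J": "H"}, "A"),
    ("Member3", {"E": "N", "J": "H"}, "B"),
    ("Member4", {"E": "L", "J": "L"}, "C"),
    ("Member5", {"E": "H", "J": "H"}, "A"),
    ("Member6", {"E": "H", "J": "H"}, "A"),
    ("Member7", {"E": "L", "J": "N"}, "C"),
]


def result_classes(rows):
    """Step 1: X_k, the set of members in each cluster."""
    classes = defaultdict(set)
    for member, _attrs, gid in rows:
        classes[gid].add(member)
    return dict(classes)


def cause_classes(rows):
    """Step 2: A_ij, the set of members with value j for attribute i."""
    classes = defaultdict(set)
    for member, attrs, _gid in rows:
        for attr, value in attrs.items():
            classes[(attr, value)].add(member)
    return dict(classes)


X = result_classes(MEMBERS)
A = cause_classes(MEMBERS)
print(sorted(X["A"]))         # ['Member2', 'Member5', 'Member6']
print(sorted(A[("J", "H")]))  # ['Member2', 'Member3', 'Member5', 'Member6']
```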

step 3. Generate lower approximation rules

Let $A_{ij}^{*X_k}$ denote the members of $A_{ij}$ that belong to $X_k$, that is, $A_{ij} \cap X_k$. A lower approximation rule is generated when

$$A_{ij}^{*X_k} = A_{ij}, \qquad \text{Confidence} = 1$$

X_1 = {Member2, Member5, Member6}

Class   A_ij                                    A_ij^{*X_1}
A_EN    {Member1, Member2, Member3}             {Member2}
A_EH    {Member5, Member6}                      {Member5, Member6}
A_EL    {Member4, Member7}                      ∅
A_JN    {Member1, Member7}                      ∅
A_JH    {Member2, Member3, Member5, Member6}    {Member2, Member5, Member6}
A_JL    {Member4}                               ∅

Only A_EH satisfies the condition:

Rule1: If Education = H then GID = A, Confidence = 1

step 4. Generate upper approximation rules

An upper approximation rule is generated when $A_{ij}^{*X_k}$ (the members of $A_{ij}$ that belong to $X_k$) is a nonempty proper subset of $A_{ij}$:

$$A_{ij}^{*X_k} \subset A_{ij}, \qquad \text{Confidence} = \frac{|A_{ij}^{*X_k}|}{|A_{ij}|}$$

For X_1 = {Member2, Member5, Member6} the candidates are:

A_EN = {Member1, Member2, Member3}, with A_EN^{*X_1} = {Member2}
A_JH = {Member2, Member3, Member5, Member6}, with A_JH^{*X_1} = {Member2, Member5, Member6}

step 4. Generate upper approximation rules (continued)

Confidence threshold = 0.75

Rule2: If Education = N then GID = A

$$\text{Confidence} = \frac{|A_{EN}^{*X_1}|}{|A_{EN}|} = \frac{|\{\text{Member2}\}|}{|\{\text{Member1}, \text{Member2}, \text{Member3}\}|} = \frac{1}{3}$$

Reject Rule2 (0.33 < 0.75)

Rule3: If Job = H then GID = A

$$\text{Confidence} = \frac{|A_{JH}^{*X_1}|}{|A_{JH}|} = \frac{|\{\text{Member2}, \text{Member5}, \text{Member6}\}|}{|\{\text{Member2}, \text{Member3}, \text{Member5}, \text{Member6}\}|} = \frac{3}{4}$$

Accept Rule3 (0.75 ≥ 0.75)
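Steps 3 and 4 can be sketched together: each cause class is intersected with the result class, and the ratio of sizes gives the confidence. This is a minimal sketch under the assumption that a rule is kept when its confidence is at least the threshold (as with Rule3 above); the function name `single_attribute_rules` is illustrative.

```python
from fractions import Fraction

# Cause equivalence classes A_ij and result class X_1 from the slides.
A = {
    ("E", "N"): {"Member1", "Member2", "Member3"},
    ("E", "H"): {"Member5", "Member6"},
    ("E", "L"): {"Member4", "Member7"},
    ("J", "N"): {"Member1", "Member7"},
    ("J", "H"): {"Member2", "Member3", "Member5", "Member6"},
    ("J", "L"): {"Member4"},
}
X1 = {"Member2", "Member5", "Member6"}


def single_attribute_rules(cause, result, threshold):
    """Return (attribute, value, kind, confidence) for every accepted rule."""
    rules = []
    for (attr, value), members in sorted(cause.items()):
        overlap = members & result          # A_ij^{*X_k}
        if not overlap:
            continue                        # class never occurs in the cluster
        confidence = Fraction(len(overlap), len(members))
        kind = "lower" if confidence == 1 else "upper"
        if confidence >= threshold:
            rules.append((attr, value, kind, confidence))
    return rules


rules = single_attribute_rules(A, X1, Fraction(3, 4))
for attr, value, kind, confidence in rules:
    print(f"If {attr} = {value} then GID = A ({kind}, confidence {float(confidence):.2f})")
# If E = H then GID = A (lower, confidence 1.00)
# If J = H then GID = A (upper, confidence 0.75)
```

Education = N is dropped because its confidence of 1/3 falls below the threshold.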

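step 5. Generate combinatorial rules

Rules are combined to produce association rules that consider several attributes. For cause classes whose joint intersection meets $X_k$,

$$(A_{ij} \cap A_{i'j'} \cap A_{i''j''}) \cap X_k \neq \emptyset$$

the confidence is

$$\text{Confidence} = \min\left( \frac{|A_{ij} \cap A_{i'j'} \cap A_{i''j''} \cap X_k|}{|A_{ij}|},\ \frac{|A_{ij} \cap A_{i'j'} \cap A_{i''j''} \cap X_k|}{|A_{i'j'}|},\ \frac{|A_{ij} \cap A_{i'j'} \cap A_{i''j''} \cap X_k|}{|A_{i''j''}|} \right)$$

X_1 = {Member2, Member5, Member6}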

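step 5. Generate combinatorial rules (continued)

Here $A_{EN,JH}^{X_1} = A_{EN} \cap A_{JH} \cap X_1$ and $A_{EH,JH}^{X_1} = A_{EH} \cap A_{JH} \cap X_1$.

Rule4: If Education = N and Job = H then GID = A

$$\begin{aligned}
\text{Confidence} &= \min\left( \frac{|A_{EN,JH}^{X_1}|}{|A_{EN}|},\ \frac{|A_{EN,JH}^{X_1}|}{|A_{JH}|} \right) \\
&= \min\left( \frac{|\{\text{Member2}\}|}{|\{\text{Member1}, \text{Member2}, \text{Member3}\}|},\ \frac{|\{\text{Member2}\}|}{|\{\text{Member2}, \text{Member3}, \text{Member5}, \text{Member6}\}|} \right) \\
&= \min\left( \frac{1}{3},\ \frac{1}{4} \right) = \frac{1}{4}
\end{aligned}$$

Rule5: If Education = H and Job = H then GID = A

$$\begin{aligned}
\text{Confidence} &= \min\left( \frac{|A_{EH,JH}^{X_1}|}{|A_{EH}|},\ \frac{|A_{EH,JH}^{X_1}|}{|A_{JH}|} \right) \\
&= \min\left( \frac{|\{\text{Member5}, \text{Member6}\}|}{|\{\text{Member5}, \text{Member6}\}|},\ \frac{|\{\text{Member5}, \text{Member6}\}|}{|\{\text{Member2}, \text{Member3}, \text{Member5}, \text{Member6}\}|} \right) \\
&= \min\left( \frac{2}{2},\ \frac{2}{4} \right) = \frac{1}{2}
\end{aligned}$$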
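The min-based confidence of Rule4 and Rule5 can be reproduced in a few lines. A minimal sketch, assuming the cause classes are stored in a dictionary keyed by (attribute, value); the function name `combinatorial_confidence` is illustrative.

```python
from fractions import Fraction

# Cause classes used by Rule4 and Rule5, and the result class X_1.
A = {
    ("E", "N"): {"Member1", "Member2", "Member3"},
    ("E", "H"): {"Member5", "Member6"},
    ("J", "H"): {"Member2", "Member3", "Member5", "Member6"},
}
X1 = {"Member2", "Member5", "Member6"}


def combinatorial_confidence(conditions, cause, result):
    """min over the conditions of |joint ∩ X_k| / |A_ij|."""
    joint = set(result)
    for cond in conditions:
        joint &= cause[cond]   # members matching every condition, inside X_k
    return min(Fraction(len(joint), len(cause[cond])) for cond in conditions)


rule4 = combinatorial_confidence([("E", "N"), ("J", "H")], A, X1)
rule5 = combinatorial_confidence([("E", "H"), ("J", "H")], A, X1)
print(rule4)  # 1/4
print(rule5)  # 1/2
```

Taking the minimum means a combined rule is only as strong as its weakest condition.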

step 6. Explain the characteristics of the cluster

The rules are summarized and the cluster's characteristics are interpreted. Rules pointing to Cluster 1 (Cluster A), with confidence measured against the members that satisfy each condition:

Education = H                (Confidence = 1)
Job = H                      (Confidence = 0.75)
Education = N and Job = H    (Confidence = 0.25)
Education = H and Job = H    (Confidence = 0.5)

step 7. Repeat

Return to Step 3 and compute the next equivalence class X_k. Repeat in this way until all equivalence classes have been processed.

Association of different clusters

Rough set theory is used to analyze the relationships between different clusters.
Ex: members of class A prefer products of class b; members of class C prefer products of class d; ...

Order ID   Buyer ID   Receiver ID   Product ID
OID1       Member3    Member5       Product10
OID2       Member3    Member2       Product25
OID3       Member1    Member6       Product13
OID4       Member2    Member4       Product24
OID5       Member2    Member1       Product15
OID6       Member2    Member2       Product26

Member ID   attribute1   attribute2   …   GID(Cluster)
Member1     …                             2
Member2     …                             2
Member3     …                             1
Member4     …                             3
Member5     …                             3
Member6     …                             2

Order ID   Buyer(GID)   Receiver(GID)   Product(GID)
OID1       1            3               7
OID2       1            2               3
OID3       2            2               6
OID4       2            3               2
OID5       2            2               7
OID6       2            2               6

Association of different clusters

Equivalence classes A_ij of the orders:

Buyer 1      {OID1, OID2}
Buyer 2      {OID3, OID4, OID5, OID6}

Receiver 2   {OID2, OID3, OID5, OID6}
Receiver 3   {OID1, OID4}

Product 2    {OID4}
Product 3    {OID2}
Product 6    {OID3, OID6}
Product 7    {OID1, OID5}

R1: If Product = 3 Then Receiver = 2, Confidence = 1
R2: If Product = 6 Then Receiver = 2, Confidence = 1
R3: If Buyer = 1 Then Receiver = 2, Confidence = 0.5
R4: If Buyer = 2 Then Receiver = 2, Confidence = 0.75
R5: If Product = 7 Then Receiver = 2, Confidence = 0.5
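Rules R1 to R5 follow from the same confidence ratio, applied to the order table after member and product IDs have been replaced by cluster IDs. A minimal sketch, assuming the orders are stored as a dictionary; the names `ORDERS`, `classes_by`, and `rules_for` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Order table expressed in cluster IDs (GID), as in the slides.
ORDERS = {
    "OID1": {"Buyer": 1, "Receiver": 3, "Product": 7},
    "OID2": {"Buyer": 1, "Receiver": 2, "Product": 3},
    "OID3": {"Buyer": 2, "Receiver": 2, "Product": 6},
    "OID4": {"Buyer": 2, "Receiver": 3, "Product": 2},
    "OID5": {"Buyer": 2, "Receiver": 2, "Product": 7},
    "OID6": {"Buyer": 2, "Receiver": 2, "Product": 6},
}


def classes_by(orders, attribute):
    """Group order IDs by the value of one attribute."""
    groups = defaultdict(set)
    for oid, row in orders.items():
        groups[row[attribute]].add(oid)
    return dict(groups)


def rules_for(orders, cause_attrs, result_attr, result_value):
    """Confidence of 'If attr = value Then result_attr = result_value'."""
    target = classes_by(orders, result_attr)[result_value]
    rules = {}
    for attr in cause_attrs:
        for value, oids in classes_by(orders, attr).items():
            overlap = oids & target
            if overlap:   # skip classes that never lead to the result
                rules[(attr, value)] = Fraction(len(overlap), len(oids))
    return rules


rules = rules_for(ORDERS, ["Product", "Buyer"], "Receiver", 2)
for (attr, value), confidence in rules.items():
    print(f"If {attr} = {value} Then Receiver = 2, Confidence = {float(confidence)}")
```

Product 2 produces no rule because its only order, OID4, goes to Receiver 3.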

System implementation

The transaction records of a store are used as the data.
Product table: 1120 records
Customer table: 35 records
2000 transaction records are retained as the data for mining.

Attributes chosen after selecting dimensions and attributes:
Customer clustering: education, job, gender
Product clustering: sales price, import price, sale price of VIP customers

SOM network interface

Relationship analysis interface

Use Rules for Recommendation

A customer wants to buy a product as a gift for a friend but does not know what would be suitable.

If the customer's cluster = 7 and the friend's cluster = 1, the system can recommend products of cluster = 9 to the customer.
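A lookup of this kind might be sketched as follows. The rule table, its confidence values, the threshold, and the function name `recommend` are all hypothetical; only the (7, 1) -> 9 pairing comes from the slide.

```python
# Hypothetical mined rules:
# (buyer cluster, receiver cluster) -> [(product cluster, confidence), ...]
# The confidence values are made up for illustration.
RULES = {
    (7, 1): [(4, 0.3), (9, 0.8)],
}


def recommend(buyer_cluster, receiver_cluster, rules, threshold=0.5):
    """Return product clusters whose rule confidence meets the threshold,
    strongest first."""
    candidates = rules.get((buyer_cluster, receiver_cluster), [])
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    return [product for product, confidence in ranked if confidence >= threshold]


print(recommend(7, 1, RULES))  # [9]
print(recommend(2, 2, RULES))  # []
```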

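Conclusion

The paper uses SOM and rough set theory for clustering and rule extraction. The rule extraction module describes the relationships between different clusters. Analysts can further select other attributes to analyze the relationships between clusters, for example zodiac sign, psychological test results, or blood type.

The study uses rough set theory to find association rules in the data. These rules fall into two kinds: characterization of each cluster and relationships between different clusters. The implementation, however, only presents the relationships between different clusters; it does not cover cluster characterization or how it should be applied.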