
An Efficient Algorithm for Incremental Mining of Association Rules


Chin-Chen Chang, Yu-Chiang Li, Jung-San Lee

RIDE-SDMA’05

Speaker: 董原賓

Advisor: 柯佳伶


Introduction

Previous incremental mining algorithms:

  • FUP (Fast Update Algorithm)
  • FUP2
  • Negative border

Note: they all have to rescan the original database.

Problem: publication-like databases

  • Examples: publication databases, web log records, etc.
  • The original database is normally much larger than the incremental database.

Solution: NFUP (New Fast Update Algorithm)


Definition

  • DB: the original database
  • db: the set of newly added transactions
  • DB+: DB + db
  • n, Pn: db is divided into n partitions, db = P1 ∪ P2 ∪ … ∪ Pn-1 ∪ Pn
  • db_{m,n} = Pm ∪ Pm+1 ∪ … ∪ Pn-1 ∪ Pn
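The partition notation above can be sketched in Python. This is only an illustration: the helper name `db_range` and the sample transactions are hypothetical and do not come from the paper.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]

def db_range(partitions: List[List[Transaction]], m: int, n: int) -> List[Transaction]:
    """Return db_{m,n} = P_m U ... U P_n, using 1-based partition numbers."""
    if not 1 <= m <= n <= len(partitions):
        raise ValueError("need 1 <= m <= n <= number of partitions")
    merged: List[Transaction] = []
    for part in partitions[m - 1:n]:
        merged.extend(part)
    return merged

# Hypothetical incremental database db, split into two partitions by time.
P1 = [frozenset("ABDE"), frozenset("BCE"), frozenset("BCE")]
P2 = [frozenset("ABCF"), frozenset("ABCD"), frozenset("CEF")]
partitions = [P1, P2]

print(len(db_range(partitions, 1, 2)))  # size of db_{1,2}, the whole of db
```

With n = 2, `db_range(partitions, 2, 2)` is just the latest partition P2, and `db_range(partitions, 1, 2)` is all of db.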


Definition

  • α set: itemsets that are frequent in DB+
  • β set: itemsets that are frequent in db_{m,n} (m ≤ n) but infrequent in db_{m-1,n}
  • γ set: itemsets that are frequent in db_{m,m} (that is, in Pm alone) but infrequent in db_{m+1,n}
  • X.count: the occurrence count of itemset X
  • X.start: the partition number at which X becomes frequent
  • X.type: one of the three types α, β, and γ
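One way to picture the three per-itemset fields is as a small record. The class name `ItemsetRecord` and the string labels for the types are assumptions made for this sketch; the slides only name the fields X.count, X.start, and X.type.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass
class ItemsetRecord:
    items: FrozenSet[str]  # the itemset X
    count: int             # X.count: occurrence count
    start: int             # X.start: partition number at which X becomes frequent
    type: str              # X.type: "alpha", "beta" or "gamma" (labels are hypothetical)

# {AB} as listed in the example's alpha set after scanning P2: start 2, count 2.
rec = ItemsetRecord(frozenset("AB"), count=2, start=2, type="alpha")
print(rec)
```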


FUP (Fast Update Algorithm)

  • In case 2, the itemset is easily calculated.
  • In case 3, FUP needs to rescan the original database.


NFUP (New Fast Update Algorithm)

  • A backward method that only requires scanning the incremental database.
  • An itemset that is frequent in the incremental database is also important, even if it is infrequent in the updated database.
  • The incremental database (db) is partitioned by time interval.


NFUP

  • The set of frequent itemsets of DB is known in advance.
  • NFUP scans the partitions backward: the last partition is scanned first.
  • Within each partition, processing proceeds as in Apriori.


NFUP


  • Scan from Pn to P1 and find the α, β, and γ itemsets in db.
  • After P1 is scanned, the occurrence counts are accumulated with those of the itemsets of DB.


  • The latest partition is scanned first: initialize the variables and accumulate the occurrence counts.
  • Still frequent in Pm: accumulate the count.
  • Still frequent in db_{m,n}: accumulate the count.
  • Frequent only in db_{m+1,n}: remove the itemset from the α set and add it to the β set.
  • Belongs to no set and is frequent in Pm: check whether Pm is the latest partition. If yes, add the itemset to the α set; if no, add it to the γ set.
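Two of these rules can be sketched in Python: the α rule and the rule for an itemset that belongs to no set. This is not the authors' implementation. The function name `scan_partition` and the tuple layout are hypothetical, the handling of existing β and γ entries is left out, and keeping the old start and count when an itemset moves to the β set is an assumption chosen to match the example slides. The input values are the 1-itemset counts from the example slides.

```python
from typing import Dict, Tuple

# itemset name -> (type, start, count); type is "alpha", "beta" or "gamma"
Records = Dict[str, Tuple[str, int, int]]

def scan_partition(records: Records, counts_in_pm: Dict[str, int],
                   m: int, n: int, size_pm: int, size_db_mn: int,
                   min_sup: float) -> Records:
    """Apply the alpha rule and the 'belongs to no set' rule for partition P_m."""
    updated = dict(records)
    for name, (kind, start, count) in records.items():
        if kind != "alpha":
            continue  # beta/gamma entries are not handled in this sketch
        total = count + counts_in_pm.get(name, 0)
        if total >= min_sup * size_db_mn:
            updated[name] = ("alpha", m, total)     # still frequent in db_{m,n}
        else:
            updated[name] = ("beta", start, count)  # frequent only in db_{m+1,n}
    for name, c in counts_in_pm.items():
        if name not in records and c >= min_sup * size_pm:
            # Frequent in P_m and in no set yet: alpha if P_m is the latest partition.
            updated[name] = ("alpha" if m == n else "gamma", m, c)
    return updated

# State after scanning P2, then the 1-itemset counts found in P1.
after_p2 = {"A": ("alpha", 2, 2), "B": ("alpha", 2, 2),
            "C": ("alpha", 2, 3), "F": ("alpha", 2, 2)}
p1_counts = {"A": 1, "B": 3, "C": 2, "D": 1, "E": 3, "F": 0}
after_p1 = scan_partition(after_p2, p1_counts, m=1, n=2,
                          size_pm=3, size_db_mn=6, min_sup=0.5)
print(after_p1)
```

Under these assumptions, {A}, {B}, and {C} stay in the α set with start 1, {F} moves to the β set, and {E} enters the γ set, which agrees with the 1-itemset rows of the example.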


Example

Scan P2 (min sup = 50%)

Threshold in P2: 3 x 0.5 = 1.5

Scan P2, 1-itemsets: {A: 2} {B: 2} {C: 3} {D: 1} {E: 1} {F: 2}

Run Apriori-gen, scan P2, 2-itemsets: {AB: 2} {AC: 2} {AF: 1} {BC: 2} {BF: 1} {CF: 2}

Scan P2, 3-itemsets: {ABC: 2}

For each itemset:

  • Check whether the itemset belongs to the α set.
  • Otherwise, check that the itemset does not belong to any set, then check whether its count >= 1.5.
  • Check whether P2 is the latest partition: yes, α set; no, γ set.

α set after scanning P2 (the β set and γ set have no entries):

  itemset  start  count
  {A}      2      2
  {B}      2      2
  {C}      2      3
  {F}      2      2
  {AB}     2      2
  {AC}     2      2
  {BC}     2      2
  {CF}     2      2
  {ABC}    2      2
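The level-wise pass over P2 can be sketched with a small Apriori-style routine. The slides do not list the transactions of P2, so the three transactions below are hypothetical, chosen only so that their counts agree with the counts shown for P2; the function name `frequent_itemsets` is likewise an assumption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

def frequent_itemsets(transactions: List[FrozenSet[str]],
                      min_sup: float) -> Dict[FrozenSet[str], int]:
    """Level-wise (Apriori-style) search; returns frequent itemsets with counts."""
    threshold = min_sup * len(transactions)
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while candidates:
        # Count each candidate k-itemset in one pass over the partition.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= threshold}
        frequent.update(level)
        k += 1
        # Join step: unite pairs of frequent itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Hypothetical contents of P2 (three transactions, so the threshold is 1.5).
P2 = [frozenset("ABCF"), frozenset("ABCD"), frozenset("CEF")]
result = frequent_itemsets(P2, 0.5)
for itemset, count in result.items():
    print("".join(sorted(itemset)), count)
```

Because P2 is the latest partition, every itemset this returns would enter the α set with start 2.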


Example

Scan P1 (min sup = 50%)

Threshold in P1: 3 x 0.5 = 1.5

Threshold in db_{1,2}: s x |db_{1,2}| = 0.5 x 6 = 3

Scan P1, 1-itemsets: {A: 1} {B: 3} {C: 2} {D: 1} {E: 3} {F: 0}

Run Apriori-gen, scan P1, 2-itemsets: {AB: 1} {AC: 0} {BC: 2} {BE: 3} {CE: 2}

For each itemset:

  • Check whether the itemset belongs to the α set. If yes, accumulate the count; if the count < s x |db_{1,2}| = 3, move the itemset to the β set.
  • Otherwise, check that the itemset does not belong to any set, then check whether its count >= 1.5.
  • Check whether P1 is the latest partition: yes, α set; no, γ set.

Changes made while scanning P1 (each entry is set, start, count):

  itemset  before P1  after P1
  {A}      α, 2, 2    α, 1, 3
  {B}      α, 2, 2    α, 1, 5
  {C}      α, 2, 3    α, 1, 5
  {F}      α, 2, 2    β, 2, 2
  {AB}     α, 2, 2    α, 1, 3
  {AC}     α, 2, 2    β, 2, 2
  {BC}     α, 2, 2    α, 1, 4
  {CF}     α, 2, 2    β, 2, 2
  {ABC}    α, 2, 2    β, 2, 2
  {E}      (none)     γ, 1, 3
  {BE}     (none)     γ, 1, 3
  {CE}     (none)     γ, 1, 2


Example

Sets after scanning P1:

α set

  itemset  start  count
  {A}      1      3
  {B}      1      5
  {C}      1      5
  {AB}     1      3
  {BC}     1      4

γ set

  itemset  start  count
  {E}      1      3
  {BE}     1      3
  {CE}     1      2

β set

  itemset  start  count
  {F}      2      2
  {AC}     2      2
  {CF}     2      2
  {ABC}    2      2

Further values on the slide (table layout lost): counts 7, 8, 9 with start 0, 0, 0; {AB} 1 3; {BC} 1 4; {ABC} 2 2; {AE} 0 3


Experiment

  • Intel Pentium IV 1.5 GHz CPU, 640 MB main memory
  • Microsoft Windows 2000 Professional
  • Synthetic datasets:
