An Efficient Algorithm for Incremental Mining of Association Rules. Chin-Chen Chang, Yu-Chiang Li, Jung-San Lee. RIDE-SDMA '05. Speaker: 董原賓. Advisor: 柯佳伶.
1
An Efficient Algorithm for Incremental Mining of Association Rules
Chin-Chen Chang, Yu-Chiang Li, Jung-San Lee
RIDE-SDMA’05
Speaker: 董原賓  Advisor: 柯佳伶
2
Introduction
Previous incremental mining algorithms:
FUP (Fast Update Algorithm), FUP2, negative border
※ They all have to rescan the original database
Problem: publication-like databases
Example: publication databases, web log records, etc.
The original database is normally much larger than the incremental database
Solution: NFUP (New Fast Update Algorithm)
3
Definition
DB: the original database
db: the set of newly added transactions
$DB^+$: DB + db
$n$, $P_n$: db is divided into $n$ partitions, $db = P_1 \cup P_2 \cup \dots \cup P_{n-1} \cup P_n$
$db_{m,n} = P_m \cup P_{m+1} \cup \dots \cup P_{n-1} \cup P_n$
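The notation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each partition is a list of transactions, each transaction is a set of items, and the name `db_suffix` and the sample data are hypothetical, not taken from the paper.

```python
from typing import List, Set

Transaction = Set[str]
Partition = List[Transaction]

def db_suffix(partitions: List[Partition], m: int) -> List[Transaction]:
    """Return db_{m,n}: the transactions of P_m, P_{m+1}, ..., P_n (m is 1-indexed)."""
    if not 1 <= m <= len(partitions):
        raise ValueError("m must satisfy 1 <= m <= n")
    merged: List[Transaction] = []
    for part in partitions[m - 1:]:
        merged.extend(part)
    return merged

# Hypothetical incremental database db with n = 3 partitions.
partitions = [
    [{"A", "B"}, {"B", "C"}],   # P1
    [{"A", "C"}],               # P2
    [{"C"}, {"A", "B", "C"}],   # P3
]

print(len(db_suffix(partitions, 1)))  # db_{1,3} is the whole db
print(len(db_suffix(partitions, 3)))  # db_{3,3} is just P3
```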
4
Definition
α set: frequent itemsets in $DB^+$
β set: itemsets frequent in $db_{m,n}$ ($m \le n$) but infrequent in $db_{m-1,n}$
γ set: itemsets frequent in $db_{m,m}$ but infrequent in $db_{m+1,n}$
X.count: occurrence count of itemset X
X.start: number of the partition in which X becomes frequent
X.type: one of the three types α, β, and γ
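One possible way to hold the three attributes of an itemset is a small record type. The class name `ItemsetInfo` and the string labels for the types are assumptions made for this sketch; the slides only name the fields count, start, and type.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass
class ItemsetInfo:
    items: FrozenSet[str]   # the itemset X itself
    count: int              # X.count: occurrence count
    start: int              # X.start: partition number when X becomes frequent
    type: str               # X.type: "alpha", "beta", or "gamma" (labels assumed)

# The entry {AB} with start 2 and count 2, as it appears in the example's α set.
ab = ItemsetInfo(items=frozenset({"A", "B"}), count=2, start=2, type="alpha")
print(ab)
```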
5
FUP (Fast Update Algorithm)
In case 2, the itemset is easily calculated
In case 3, FUP needs to rescan the original database
6
NFUP (New Fast Update Algorithm)
A backward method that only requires scanning the incremental database
A frequent itemset in the incremental database is also important even if it is infrequent in the updated database
Partition the incremental database (db) by time interval
7
NFUP
The set of frequent itemsets of DB is known in advance
NFUP scans the partitions backward: the last partition is scanned first
In each partition, the process is performed like that of Apriori
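A minimal sketch of the backward scan with an Apriori-style pass over each partition follows. It shows plain level-wise mining only and leaves out the α, β, γ bookkeeping that NFUP adds. The function names and the sample partitions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Itemset = FrozenSet[str]

def apriori_gen(frequent_k: Set[Itemset]) -> Set[Itemset]:
    """Join k-itemsets into (k+1)-candidates, then prune by the Apriori property."""
    candidates: Set[Itemset] = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == len(a) + 1:
                # Keep the candidate only if every k-subset is itself frequent.
                if all(frozenset(s) in frequent_k
                       for s in combinations(union, len(a))):
                    candidates.add(union)
    return candidates

def frequent_itemsets(partition: List[Set[str]], min_sup: float) -> Dict[Itemset, int]:
    """Level-wise mining of a single partition."""
    threshold = min_sup * len(partition)
    candidates = {frozenset([item]) for t in partition for item in t}
    result: Dict[Itemset, int] = {}
    while candidates:
        counts = {c: sum(1 for t in partition if c <= t) for c in candidates}
        level = {c for c, n in counts.items() if n >= threshold}
        result.update({c: counts[c] for c in level})
        candidates = apriori_gen(level)
    return result

def scan_backward(partitions: List[List[Set[str]]],
                  min_sup: float) -> Iterator[Tuple[int, Dict[Itemset, int]]]:
    """Yield (m, frequent itemsets of P_m) for m = n, n-1, ..., 1."""
    for m in range(len(partitions), 0, -1):
        yield m, frequent_itemsets(partitions[m - 1], min_sup)

# Hypothetical db with two partitions; P2 is visited before P1.
demo = [
    [{"A", "B"}, {"A", "B"}, {"C"}],   # P1
    [{"B", "C"}, {"B", "C"}, {"A"}],   # P2
]
for m, found in scan_backward(demo, 0.5):
    print(m, sorted("".join(sorted(s)) for s in found))
```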
8
NFUP
9
Scan from $P_n$ to $P_1$ and find the α, β, and γ itemsets in db
After $P_1$ is scanned, the occurrence counts are accumulated with those of the itemsets of DB
10
The latest partition is scanned first: initialize the variables and accumulate the occurrence counts
If the itemset is still frequent in $P_m$, accumulate its count
If the itemset is still frequent in $db_{m,n}$, accumulate its count
If the itemset is frequent only in $db_{m+1,n}$, remove it from the α set and add it to the β set
If the itemset does not belong to any set and is frequent in $P_m$, check whether $P_m$ is the latest partition: yes, add it to the α set; no, add it to the γ set
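The rules above can be sketched as one update step per partition. This is a simplified reading of the slide, not the paper's pseudocode. It assumes the per-partition counts are already available, handles only the α, β, and new-itemset rules, and keeps the earlier start and count when an itemset moves to the β set. The function name, the dictionary layout, and the itemsets X, Y, Z are hypothetical.

```python
from typing import Dict

def process_partition(table: Dict[str, dict], counts_pm: Dict[str, int],
                      m: int, n: int, min_sup: float,
                      size_pm: int, size_db_mn: int) -> None:
    """Update the itemset table after scanning partition P_m (backward from P_n)."""
    # Itemsets already in the alpha set: accumulate, then re-check in db_{m,n}.
    for name, rec in table.items():
        if rec["type"] != "alpha":
            continue
        total = rec["count"] + counts_pm.get(name, 0)
        if total >= min_sup * size_db_mn:
            rec["count"], rec["start"] = total, m   # still frequent in db_{m,n}
        else:
            rec["type"] = "beta"                    # frequent only in db_{m+1,n}
    # Itemsets not yet in any set that are frequent in P_m.
    for name, c in counts_pm.items():
        if name not in table and c >= min_sup * size_pm:
            kind = "alpha" if m == n else "gamma"
            table[name] = {"type": kind, "start": m, "count": c}

# Hypothetical run: n = 2 partitions of 4 transactions each, min sup = 50%.
table: Dict[str, dict] = {}
process_partition(table, {"X": 3, "Y": 2, "Z": 1}, m=2, n=2,
                  min_sup=0.5, size_pm=4, size_db_mn=4)
process_partition(table, {"X": 2, "Y": 0, "Z": 3}, m=1, n=2,
                  min_sup=0.5, size_pm=4, size_db_mn=8)
for name, rec in table.items():
    print(name, rec)
```

In this run X stays in the α set, Y drops to the β set, and Z enters the γ set, because it first becomes frequent in a partition that is not the latest one.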
11
Example
Scan P2: 1-itemsets
Min sup = 50%, so the minimum count in P2 is 3 x 0.5 = 1.5
{A: 2} {B: 2} {C: 3} {D: 1} {E: 1} {F: 2}
Checks for each itemset:
Check if the itemset belongs to the α set
Else check that the itemset does not belong to any set
Check if the itemset's count >= 1.5
Check if P2 is the latest partition: yes, α set; no, γ set
Run Apriori-gen, scan P2: 2-itemsets
{AB: 2} {AC: 2} {AF: 1} {BC: 2} {BF: 1} {CF: 2}
The same checks are applied to each 2-itemset
Scan P2: 3-itemsets
{ABC: 2}
α set  start  count
{A}  2  2
{B}  2  2
{C}  2  3
{F}  2  2
{AB}  2  2
{AC}  2  2
{BC}  2  2
{CF}  2  2
{ABC}  2  2
β set: empty
γ set: empty
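The counts for P2 can be checked with a short brute-force script. The three transactions below are hypothetical: the slide's transaction table did not survive, so they were chosen only to reproduce the counts listed for P2. The script enumerates every subset of each transaction instead of running Apriori-gen, which is fine for a toy partition.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions for P2, chosen only to match the slide's counts.
P2 = [{"A", "B", "C", "F"}, {"A", "B", "C", "D"}, {"C", "E", "F"}]
MIN_SUP = 0.5
threshold = MIN_SUP * len(P2)  # 3 x 0.5 = 1.5

# Count every non-empty subset of every transaction (brute force, not Apriori).
counts = Counter()
for t in P2:
    for k in range(1, len(t) + 1):
        for combo in combinations(sorted(t), k):
            counts["".join(combo)] += 1

# P2 is the latest partition, so every frequent itemset enters the α set with start 2.
alpha = {name: {"start": 2, "count": c}
         for name, c in counts.items() if c >= threshold}
for name in sorted(alpha, key=lambda s: (len(s), s)):
    print(name, alpha[name])
```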
12
Example
Scan P1: 1-itemsets
Min sup = 50%, so the minimum count in P1 is 3 x 0.5 = 1.5
{A: 1} {B: 3} {C: 2} {D: 1} {E: 3} {F: 0}
Checks for each itemset:
Check if the itemset belongs to the α set
Yes: accumulate the count; if count < $s \times |db_{m,n}|$ = 0.5 x 6 = 3, move the itemset to the β set
Else check that the itemset does not belong to any set
Check if the itemset's count >= 1.5
Check if P1 is the latest partition: yes, α set; no, γ set
Run Apriori-gen, scan P1: 2-itemsets
{AB: 1} {AC: 0} {BC: 2} {BE: 3} {CE: 2}
The same checks are applied to each 2-itemset
Sets after scanning P1 (itemset, start, count):
α set: {A} 1 3, {B} 1 5, {C} 1 5, {AB} 1 3, {BC} 1 4
β set: {F} 2 2, {AC} 2 2, {CF} 2 2, {ABC} 2 2
γ set: {E} 1 3, {BE} 1 3, {CE} 1 2
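The P1 step can be replayed from the numbers on the slides. This sketch takes the α set left by P2 and the P1 counts as given, and treats any itemset without a listed P1 count ({CF} and {ABC}) as having count 0. That is an assumption, though it follows from {F: 0} and {AC: 0}. The variable names are hypothetical.

```python
MIN_SUP = 0.5
SIZE_P1, SIZE_DB_12 = 3, 6          # from "3 x 0.5 = 1.5" and "0.5 x 6 = 3"

# α set after scanning P2 (itemset -> count, all with start 2).
alpha_after_p2 = {"A": 2, "B": 2, "C": 3, "F": 2,
                  "AB": 2, "AC": 2, "BC": 2, "CF": 2, "ABC": 2}
# Counts found while scanning P1.
p1_counts = {"A": 1, "B": 3, "C": 2, "D": 1, "E": 3, "F": 0,
             "AB": 1, "AC": 0, "BC": 2, "BE": 3, "CE": 2}

alpha, beta, gamma = {}, {}, {}
for name, old in alpha_after_p2.items():
    total = old + p1_counts.get(name, 0)      # missing counts assumed to be 0
    if total >= MIN_SUP * SIZE_DB_12:
        alpha[name] = (1, total)              # (start, count): frequent in db_{1,2}
    else:
        beta[name] = (2, old)                 # frequent only in db_{2,2}
for name, c in p1_counts.items():
    if name not in alpha_after_p2 and c >= MIN_SUP * SIZE_P1:
        gamma[name] = (1, c)                  # P1 is not the latest partition

print("alpha", alpha)
print("beta", beta)
print("gamma", gamma)
```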
13
Example
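α set  start  count
{A}  1  3
{B}  1  5
{C}  1  5
{AB}  1  3
{BC}  1  4
γ set  start  count
{E}  1  3
{BE}  1  3
{CE}  1  2
β set  start  count
{F}  2  2
{AC}  2  2
{CF}  2  2
{ABC}  2  2
Entries after the counts are accumulated with the itemsets of DB (itemset, start, count):
{A}  0  7
{B}  0  8
{C}  0  9
{AB}  1  3
{BC}  1  4
{ABC}  2  2
{AE}  0  3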
14
Experiment
Intel Pentium IV 1.5GHz CPU, 640 MB main memory
Microsoft Windows 2000 Professional
Synthetic datasets:
15
Experiment
16
Experiment
17
Experiment