- 1. : : 2006.11.25
- 2. (Data Mining); Association Rule Mining; Sequential Pattern
Mining; Classification; Clustering
- 3. Mining: (Corporate Memory), (Corporate Intelligence),
(Data Mining)
- 4. Data Mining: Trend, Pattern, Relationship (Knowledge Discovery
in Databases, KDD)
- 5. Data Mining: Association Rule Mining, Sequential Pattern
Mining, Classification, Clustering. Classification learns
classification rules from training data and applies those rules to
other data; Classification is supervised. Clustering is
unsupervised.
- 6. Association Rule Mining. Definition: data set, data item.
Glossary: Support(A => B) = P(A ∪ B); Confidence(A => B) = P(B | A).
Example: Buys(X, computer) => Buys(X, financial_management_software),
with A = Buys(X, computer) and B = Buys(X, financial_management_software);
Support(A => B), Confidence(A => B)
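
The two glossary measures can be checked on a toy data set. The sketch below is illustrative only: the four transactions and the function names `support` and `confidence` are assumptions made for this example, not taken from the slides. Here P(A ∪ B) is read as the fraction of transactions that contain every item of both A and B.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support(A => B) divided by Support(A), i.e. an estimate of P(B | A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical purchase records (assumed data, one set of items per customer).
transactions = [
    frozenset({"computer", "financial_management_software"}),
    frozenset({"computer"}),
    frozenset({"computer", "financial_management_software", "printer"}),
    frozenset({"printer"}),
]

A = {"computer"}
B = {"financial_management_software"}
rule_support = support(transactions, A | B)
rule_confidence = confidence(transactions, A, B)
print(rule_support, rule_confidence)
```

With this assumed data, two of the four records contain both items, and two of the three computer buyers also bought the software.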
- 7. Association Rule Mining ( ) Algorithm: Association rules are
found with Apriori, given a minimum support and a minimum confidence,
in two steps: a. find the frequent itemsets; b. generate association
rules from the frequent itemsets, keeping the rules that meet minimum
support and minimum confidence. Apriori finds frequent itemsets level
by level: it counts every item to obtain the frequent 1-itemsets, then
derives the frequent k-itemsets from the frequent (k-1)-itemsets:
a. Join the frequent (k-1)-itemsets to build the candidate k-itemsets;
b. count the support of each candidate itemset; the candidates that
meet minimum support are the frequent k-itemsets
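
The join-then-count loop described above can be sketched as follows. This is a minimal illustration, not the deck's own implementation: the function name `apriori`, the extra subset-pruning step, and the nine transactions over five items are assumptions chosen for the example, with minimum support given as an absolute count of 2.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Frequent 1-itemsets: count each item once per transaction.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join: unite pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count support with one pass over the data, then apply the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Assumed example data: 9 transactions over items 1..5.
transactions = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
frequent_itemsets = apriori(transactions, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this assumed data the pairs {1,4}, {3,4}, {3,5} and {4,5} fall below the threshold and are dropped before any 3-itemset is formed.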
- 8. Association Rule Mining ( ) Example: 9 transactions, 5 items,
minimum support 2
- 9. Association Rule Mining ( )
- 10. Association Rule Mining ( ) {1,4}, {3,4}, {3,5}, {4,5}: support
below minimum support
- 11. Association Rule Mining ( ) The sets whose support meets minimum
support are the frequent itemsets. Association rules generated from
them with minimum confidence 70%: 1&5=>2, 2&5=>1, 5=>1&2. These
association rules satisfy both minimum support and minimum
confidence
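
Rule generation from one frequent itemset can be sketched as below. The support counts in `support_count` are assumed values picked for illustration (the slide's own table did not survive), and `generate_rules` is a hypothetical helper name. Each rule's confidence is the count of the whole itemset divided by the count of its left-hand side.

```python
from itertools import combinations


def generate_rules(itemset, support_count, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    itemset = frozenset(itemset)
    rules = []
    # Try every non-empty proper subset as the left-hand side of a rule.
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            conf = support_count[itemset] / support_count[lhs]
            if conf >= min_confidence:
                rules.append((lhs, itemset - lhs, conf))
    return rules


# Assumed support counts for the itemset {1, 2, 5} and its subsets.
support_count = {
    frozenset(k): v
    for k, v in {
        (1,): 6, (2,): 7, (5,): 2,
        (1, 2): 4, (1, 5): 2, (2, 5): 2,
        (1, 2, 5): 2,
    }.items()
}

rules = generate_rules({1, 2, 5}, support_count, min_confidence=0.7)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"{conf:.0%}")
```

With these assumed counts, only the three rules listed on the slide clear the 70% threshold; 1&2=>5, 1=>2&5 and 2=>1&5 do not.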
- 12. Sequential Pattern Mining Q. How to find the sequential
patterns?
- 13. Sequential Pattern Mining ( ) Step 1: Customer_Id,
TransactionTime (figure labels: Item, Itemset, Transaction)
- 14. Sequential Pattern Mining ( ) Step 2: Large Itemset. With a
minimum support of 2 customers, the large itemsets (litemsets) are:
(30), (40), (70), (90), (40 70) (figure labels: Item, Itemset,
Transaction)
- 15. Sequence is supported by customer 1 and 4 is supported by
customer 2 and 4 3-Sequence Step 3: Sequences Sequential Pattern
Mining ( )
- 16. Q. Find the large sequences with minimum support set to
25%: - Large sequence: , , , , , , Step 4: Large Sequences
Sequential Pattern Mining ( )
- 17. Q. Find the maximal sequences with minimum support of 2
customers: - The answer set is: , Sequential Patterns Step 5:
Maximal Sequences Sequential Pattern Mining ( )
- 18. Sequential Pattern Mining ( ) The algorithm has five phases:
Sort phase, Large itemset phase, Transformation phase, Sequence phase,
Maximal phase. Algorithms: AprioriAll, AprioriSome, DynamicSome
- 19. Sort phase: Sort the database with customer-id as the major key
and transaction-time as the minor key.
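
A minimal sketch of the sort phase, assuming each database row is a `(customer_id, transaction_time, items)` tuple. That row layout, the function name `to_customer_sequences` and the sample rows are assumptions for illustration. Sorting on the two keys and grouping by customer turns the transaction table into one time-ordered sequence per customer.

```python
from itertools import groupby
from operator import itemgetter


def to_customer_sequences(rows):
    """Sort by (customer_id, transaction_time) and group into customer sequences."""
    # Major key: customer id (index 0); minor key: transaction time (index 1).
    ordered = sorted(rows, key=itemgetter(0, 1))
    return {
        customer: [frozenset(items) for _, _, items in group]
        for customer, group in groupby(ordered, key=itemgetter(0))
    }


# Hypothetical, unsorted transaction rows.
rows = [
    (2, 3, {40, 60, 70}),
    (1, 1, {30}),
    (2, 1, {10, 20}),
    (1, 2, {90}),
    (2, 2, {30}),
]
sequences = to_customer_sequences(rows)
print(sequences)
```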
- 20. Litemset phase: Find the large itemsets. This resembles finding
large itemsets in association rule mining, except that the support of
an itemset is counted over customers rather than over transactions.
The large itemsets are then mapped
- 21. Transformation phase: delete non-large itemsets; map large
itemsets to integers
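
One way to picture the transformation phase is sketched below, under stated assumptions: each transaction is replaced by the set of integer ids of the litemsets it contains, and transactions (or customers) left with no litemset are dropped. The function name `transform`, the id numbering and the sample data are illustrative choices, not recovered from the slides.

```python
def transform(customer_sequences, litemsets):
    """Replace each transaction by the ids of the large itemsets it contains."""
    # Map large itemsets to integers 1, 2, 3, ... in the order given.
    ids = {litemset: i for i, litemset in enumerate(litemsets, start=1)}
    transformed = {}
    for customer, sequence in customer_sequences.items():
        new_sequence = []
        for transaction in sequence:
            contained = {ids[ls] for ls in litemsets if ls <= transaction}
            if contained:  # transactions without any litemset are deleted
                new_sequence.append(contained)
        if new_sequence:  # customers without any litemset are deleted
            transformed[customer] = new_sequence
    return transformed


# Assumed litemsets: (30)->1, (40)->2, (70)->3, (40 70)->4, (90)->5.
litemsets = [frozenset(s) for s in ({30}, {40}, {70}, {40, 70}, {90})]
customer_sequences = {
    1: [frozenset({30}), frozenset({90})],
    2: [frozenset({10, 20}), frozenset({30}), frozenset({40, 60, 70})],
    3: [frozenset({10})],
}
transformed = transform(customer_sequences, litemsets)
print(transformed)
```

In this assumed input, customer 2's first transaction (10 20) contains no litemset and disappears, and customer 3 is dropped entirely.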
- 22. Sequence phase: Use the set of litemsets to find the desired
sequences. Two families of algorithms: Count-all: AprioriAll;
Count-some: AprioriSome, DynamicSome
- 23. Maximal phase: Find the maximal sequences among the set of
large sequences, that is, the large sequences that are not
sub-sequences of any other large sequence. In some algorithms, this
phase is combined with the sequence phase.
- 24. Maximal phase Algorithm:
  S: the set of all large sequences; n: the length of the longest sequence
  for (k = n; k > 1; k--) do
    for each k-sequence s_k do
      Delete from S all subsequences of s_k
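
The maximal-phase loop can be sketched in Python as follows. Sequences are modeled as tuples of frozensets, and the names `is_subsequence`, `maximal_sequences` and the sample set of large sequences are assumptions for this illustration. A sequence counts as a subsequence when its itemsets fit, in order, inside distinct itemsets of the longer sequence.

```python
def is_subsequence(small, big):
    """True if the itemsets of `small` are contained, in order, in itemsets of `big`."""
    pos = 0
    for itemset in small:
        # Greedily find the earliest remaining itemset of `big` that covers it.
        while pos < len(big) and not itemset <= big[pos]:
            pos += 1
        if pos == len(big):
            return False
        pos += 1
    return True


def maximal_sequences(large_sequences):
    """Delete every sequence that is a subsequence of a longer or equal-length one."""
    remaining = set(large_sequences)
    longest = max((len(s) for s in remaining), default=0)
    for k in range(longest, 1, -1):
        for sk in [s for s in remaining if len(s) == k]:
            remaining -= {s for s in remaining
                          if s != sk and is_subsequence(s, sk)}
    return remaining


def seq(*itemsets):
    """Helper: build a hashable sequence from plain sets."""
    return tuple(frozenset(s) for s in itemsets)


# Assumed set of large sequences for illustration.
large = {
    seq({30}), seq({40}), seq({70}), seq({90}), seq({40, 70}),
    seq({30}, {40}), seq({30}, {70}), seq({30}, {90}), seq({30}, {40, 70}),
}
maximal = maximal_sequences(large)
print(maximal)
```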
- 25. AprioriAll: The basic method to mine sequential patterns, based
on the Apriori algorithm. It counts all the large sequences, including
non-maximal sequences, and uses the apriori-generate function to
generate candidate sequences.
- 26. Apriori Candidate Generation: Generate the candidates for a pass
using only the large sequences found in the previous pass. Then make a
pass over the data to find the support of the candidates.
- 27. Apriori Candidate Generation Algorithm:
  L_k: the set of all large k-sequences; C_k: the set of candidate k-sequences
  insert into C_k
    select p.litemset_1, p.litemset_2, ..., p.litemset_(k-1), q.litemset_(k-1)
    from L_(k-1) p, L_(k-1) q
    where p.litemset_1 = q.litemset_1, ..., p.litemset_(k-2) = q.litemset_(k-2);
  for all sequences c ∈ C_k do
    for all (k-1)-subsequences s of c do
      if (s ∉ L_(k-1)) then delete c from C_k;
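
The join and prune steps of candidate generation can be sketched as below, with each sequence modeled as a tuple of litemset ids. The function name `apriori_generate` and the sample set of large 3-sequences are assumptions for illustration. The join is taken literally from the `select` above, so p and q may be the same sequence.

```python
def apriori_generate(large_prev):
    """Build candidate k-sequences from the large (k-1)-sequences."""
    large_prev = set(large_prev)

    # Join: p and q agree on their first k-2 litemsets; append q's last one to p.
    joined = {
        p + (q[-1],)
        for p in large_prev
        for q in large_prev
        if p[:-1] == q[:-1]
    }

    def drop_one(candidate):
        """All (k-1)-subsequences obtained by removing one element."""
        return (candidate[:i] + candidate[i + 1:] for i in range(len(candidate)))

    # Prune: delete a candidate if any (k-1)-subsequence is not large.
    return {c for c in joined if all(s in large_prev for s in drop_one(c))}


# Assumed large 3-sequences (litemset ids) used only as example input.
large_3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
candidates_4 = apriori_generate(large_3)
print(candidates_4)
```

For this input the join produces sequences such as (1, 2, 4, 3) and (1, 3, 4, 5), but only (1, 2, 3, 4) survives pruning because every one of its 3-subsequences is large.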
- 28. AprioriAll (cont.)
  L_1 = {large 1-sequences}; // Result of the litemset phase
  for (k = 2; L_(k-1) ≠ ∅; k++) do begin
    C_k = new candidates generated from L_(k-1)
    foreach customer-sequence c in the database do
      Increment the count of all candidates in C_k that are contained in c
    L_k = candidates in C_k with minimum support
  end
  Answer = maximal sequences in ∪_k L_k;
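
The AprioriAll loop can be sketched end to end as below. It assumes a transformed database in which every customer sequence is a list of sets of litemset ids, and it measures minimum support as an absolute number of customers. The five customer sequences and all function names are assumptions for illustration, and the final maximal-sequence filtering is left out.

```python
def contains(customer_sequence, candidate):
    """True if the candidate's litemset ids occur in order in distinct transactions."""
    pos = 0
    for litemset_id in candidate:
        while pos < len(customer_sequence) and litemset_id not in customer_sequence[pos]:
            pos += 1
        if pos == len(customer_sequence):
            return False
        pos += 1
    return True


def apriori_generate(large_prev):
    """Join large (k-1)-sequences, then prune by their (k-1)-subsequences."""
    large_prev = set(large_prev)
    joined = {p + (q[-1],) for p in large_prev for q in large_prev
              if p[:-1] == q[:-1]}
    return {c for c in joined
            if all(c[:i] + c[i + 1:] in large_prev for i in range(len(c)))}


def apriori_all(database, min_support):
    """Return {sequence: customer count} for all large sequences."""
    # L_1: count each litemset id once per customer.
    counts = {}
    for sequence in database:
        for litemset_id in set().union(*sequence):
            counts[(litemset_id,)] = counts.get((litemset_id,), 0) + 1
    large = {s: n for s, n in counts.items() if n >= min_support}
    all_large = dict(large)

    while large:  # stop once L_(k-1) is empty
        candidates = apriori_generate(large)
        counts = {c: sum(contains(seq, c) for seq in database) for c in candidates}
        large = {c: n for c, n in counts.items() if n >= min_support}
        all_large.update(large)
    return all_large


# Assumed transformed database: one list of litemset-id sets per customer.
database = [
    [{1, 5}, {2}, {3}, {4}],
    [{1}, {3}, {4}, {3, 5}],
    [{1}, {2}, {3}, {4}],
    [{1}, {3}, {5}],
    [{4}, {5}],
]
all_large = apriori_all(database, min_support=2)
for sequence in sorted(all_large, key=lambda s: (len(s), s)):
    print(sequence, all_large[sequence])
```

Counting every candidate in every pass is what makes this a count-all approach; non-maximal sequences such as (1, 3) are kept in the result until a separate maximal phase removes them.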
- 29. Apriori Candidate Generation Example: (Customer Sequences) With
minimum support set to 25%. Next step: find the large 1-sequences
- 30. Example: Large 1-Sequence table (columns: Sequence, Support);
supports: 4, 2, 4, 4, 2. Next step: find the large 2-sequences
- 31. Example: Large 2-Sequence table (columns: Sequence, Support);
supports: 2, 4, 3, 3, 2, 2, 3, 2, 2. Next step: find the large
3-sequences
- 32. Example: Large 3-Sequence table (columns: Sequence, Support);
supports: 2, 2, 3, 2, 2. Next step: find the large 4-sequences
- 33. Example: Large 4-Sequence table (columns: Sequence, Support);
support: 2. Next step: find the sequential pattern
- 34. Example: the four Sequence/Support tables side by side
(4-sequence: 2; 1-sequences: 4, 2, 4, 4, 2; 2-sequences: 2, 4, 3, 3,
2, 2, 3, 2, 2; 3-sequences: 2, 2, 3, 2, 2). Find the maximal large
sequences
- 35. Classification. Definition: classification rules are derived
from training data and then used to classify data. A classification
rule can be represented as a Decision Tree. Building an optimal
Decision Tree is NP-Hard, so induction-based methods such as Hunt's
are used
- 36. Classification ( ) Training cases T; Decision Tree; classes
(C1, C2, ..., Ck); Cases
- 37. Classification ( ) Example: Training Data Set (attributes:
Outlook, Windy, Humidity)
- 38. Classification ( ) Classification Rule. Rule: Outlook = rain,
Humidity = 95, Windy = false
- 39. Clustering. Definition: Clustering is unsupervised learning.
Clustering Methods: Partitioning methods (k partitions: a. data,
b. data); Hierarchical methods: agglomerative (bottom up) and divisive
(top down); Density-based methods: clusters are formed according to
data density; Grid-based methods: clustering is performed on a
multiresolution grid data structure made of cells; Model-based
methods
- 40. Clustering ( ) Uses of clustering: Market Segmentation; Fraud
Detection (fraud clusters); Defect Analysis (clusters); Lapse Analysis
(clusters)
- 41. Clustering ( ) clustering clusters ? Clusters Clusters
clusters cluster records cluster record cluster ? (similarity)
distance (weighting) clustering ?
- 42. Data Mining Association Rule Item Clustering