Temporal Database Paper Reading, R95922007, first-year master's student in Computer Science and Information Engineering, 馬智釗: Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K. Huang, C. Chang

Introduction

Discover frequent serial episodes to find relationships between events.
- explain the problems that cause a particular event
- predict future results

Episode : a partially ordered collection of events occurring together.
- the user defines “how close is close enough”
- win : the width of the time window

Three classes of episodes

Introduced by Mannila et al.
Serial episodes
- patterns of a total order in the sequence
Parallel episodes
- no constraints on the relative order
Composite episodes
- serial combination of parallel episodes

Examples : episodes

Algorithms (old)

Presented by Mannila et al.
Finding parallel and serial episodes that are frequent enough.
WINEPI
- considers the support of an episode
MINEPI
- considers the number of minimal occurrences of an episode

WINEPI

Consider the sequence S = A_3 A_4 B_5 B_6 (subscripts are timestamps).
support : the number of sliding windows of width win that contain the episode.
Given win = 3, there are six windows :
W_1 = A_3, W_2 = A_3 A_4, W_3 = A_3 A_4 B_5, W_4 = A_4 B_5 B_6, W_5 = B_5 B_6, W_6 = B_6.
<A,B> is supported by two windows.
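The window count above can be checked with a minimal Python sketch. The function names and the (time, event) list representation are assumptions made for illustration; the slides give no code. Window starts run from the first timestamp minus win plus 1 up to the last timestamp, which is what produces the six windows of the example.

```python
def contains_serial(events, episode):
    """True if `episode` occurs in `events` as an ordered subsequence."""
    it = iter(events)
    return all(ev in it for ev in episode)


def winepi_support(seq, episode, win):
    """Count width-`win` sliding windows that contain the serial episode.

    seq: list of (time, event) pairs sorted by time, one event per time point.
    """
    first, last = seq[0][0], seq[-1][0]
    count = 0
    # Starts run from first - win + 1 to last, so every event is covered
    # by exactly `win` windows (six windows for the slide's example).
    for start in range(first - win + 1, last + 1):
        window = [ev for t, ev in seq if start <= t < start + win]
        if contains_serial(window, episode):
            count += 1
    return count


S = [(3, "A"), (4, "A"), (5, "B"), (6, "B")]
print(winepi_support(S, ["A", "B"], win=3))  # 2
```

The two supporting windows are W_3 and W_4, the only ones that hold an A followed by a B.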

MINEPI

Consider the sequence S = A_3 A_4 B_5 B_6.
minimal occurrence : an interval that contains episode α, but no proper sub-interval does.
<A> has mo support 2.
- intervals [3,3] and [4,4]
<A,B> has mo support 1.
- interval [4,5]
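A brute-force sketch of the minimal-occurrence definition, assuming a simple sequence stored as (time, event) pairs. The helper names are hypothetical and the enumeration is deliberately naive; it only illustrates the definition and is not how MINEPI computes occurrences.

```python
def contains(seq, episode, ts, te):
    """True if the events of `seq` inside [ts, te] contain `episode` in order."""
    it = iter(ev for t, ev in seq if ts <= t <= te)
    return all(ev in it for ev in episode)


def minimal_occurrences(seq, episode):
    """Intervals containing `episode` such that no proper sub-interval does."""
    times = sorted({t for t, _ in seq})
    found = [
        (ts, te)
        for ts in times
        for te in times
        if ts <= te and contains(seq, episode, ts, te)
    ]
    # Drop every interval that strictly encloses another occurrence.
    return [
        (ts, te)
        for ts, te in found
        if not any((a, b) != (ts, te) and ts <= a and b <= te for a, b in found)
    ]


S = [(3, "A"), (4, "A"), (5, "B"), (6, "B")]
print(minimal_occurrences(S, ["A"]))       # [(3, 3), (4, 4)]
print(minimal_occurrences(S, ["A", "B"]))  # [(4, 5)]
```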

Complex sequences

Several events occurring at one time
A temporal database is a complex sequence with temporal attributes.
Example (event sets at consecutive time points) :
AD | B | ABE | CE | ABF | ACE | BDF | D

Algorithms (new)

Extend the algorithm to deal with complex sequences.
MINEPI+
- depth-first enumeration to generate the frequent episodes by equalJoin and temporalJoin
EMMA
- Episodes Mining using Memory Anchor
- utilizes memory anchors to accelerate the mining task

More about MINEPI

Breadth-first manner
- enumerate longer episodes from shorter ones
Parameters
- maxwin : maximum window width for an episode
- minsup : minimum frequency for a “frequent episode”
Temporal Join
- connects events from different time intervals

Example : MINEPI

S = A_1 A_2 B_3 A_4 B_5, maxwin = 4, minsup = 2
Find frequent 1-episodes first
- mo(A) = {[1,1],[2,2],[4,4]}, mo(B) = {[3,3],[5,5]}
Temporal Join with maxwin = 4
- possible occurrences of <A,B> : [1,3],[2,3],[2,5],[4,5]
- mo(<A,B>) = {[2,3],[4,5]} (choose minimal ones)
- support(<A,B>) = {[1,4],[2,5],[4,5]}
- support count = 3, counting distinct start points
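The example can be replayed with a short sketch. It assumes that a candidate [ts, te] is kept only when te − ts < maxwin (the strict comparison is borrowed from the temporal join definition in the Operations slide and reproduces the four candidates listed). All function names are illustrative.

```python
def temporal_join_candidates(mo_left, mo_right, maxwin):
    """Pair each left occurrence with every later right occurrence in range."""
    cands = {
        (ts, te2)
        for ts, te in mo_left
        for ts2, te2 in mo_right
        if ts2 > te and te2 - ts < maxwin
    }
    return sorted(cands)


def keep_minimal(intervals):
    """Keep intervals that do not enclose another interval of the list."""
    return [
        (ts, te)
        for ts, te in intervals
        if not any((a, b) != (ts, te) and ts <= a and b <= te for a, b in intervals)
    ]


mo_A = [(1, 1), (2, 2), (4, 4)]
mo_B = [(3, 3), (5, 5)]

cands = temporal_join_candidates(mo_A, mo_B, maxwin=4)
print(cands)                           # [(1, 3), (2, 3), (2, 5), (4, 5)]
print(keep_minimal(cands))             # [(2, 3), (4, 5)]
print(len({ts for ts, _ in cands}))    # 3 distinct start points
```

With minsup = 2, <A,B> is frequent under either count: two minimal occurrences, or three distinct start points.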

MINEPI+

Must deal with complex sequences.
Depth-first manner for memory saving
Equal Join
- connects events at the same interval
Bound List
• For a serial episode P = <p_1,…,p_k>
- {[ts_i,te_i] : S contains P in time [ts_i,te_i]}
• For an event Y
- {[t_i,t_i] : S contains Y at time t_i}

Example : bound list

maxwin = 4.
Bound list of <A,B,C> : {[1,4],[3,6]}.
Bound list of <C> : {[4,4],[6,6]}.

Time   : 1   2   3    4   5    6    7    8
Events : AD  B   ABE  CE  ABF  ACE  BDF  D
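A sketch of how such a bound list could be computed directly from a complex sequence. The sequence below is one reading of the slide's figure (event sets AD, B, ABE, CE, ABF, ACE, BDF, D at times 1 to 8) and should be treated as an assumption, as should the greedy earliest-match strategy and the function name; the slides do not spell out the procedure.

```python
# Assumed reading of the figure: time point -> set of events at that time.
SEQ = {1: "AD", 2: "B", 3: "ABE", 4: "CE", 5: "ABF", 6: "ACE", 7: "BDF", 8: "D"}


def bound_list(seq, episode, maxwin):
    """One [ts, te] per start time ts, with the earliest possible end te.

    Each event of the serial episode must occur strictly later than the
    previous one, and the bound must satisfy te - ts < maxwin.
    """
    times = sorted(seq)
    result = []
    for ts in times:
        if episode[0] not in seq[ts]:
            continue
        cur = ts
        for ev in episode[1:]:
            # Earliest time after `cur` at which the next event occurs.
            nxt = next((t for t in times if t > cur and ev in seq[t]), None)
            if nxt is None:
                break
            cur = nxt
        else:
            if cur - ts < maxwin:
                result.append((ts, cur))
    return result


print(bound_list(SEQ, "ABC", maxwin=4))  # [(1, 4), (3, 6)]
print(bound_list(SEQ, "C", maxwin=4))    # [(4, 4), (6, 6)]
```

Under this reading both results agree with the bound lists stated on the slide.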

Operations

Given P = <p_1,…,p_k> and an event f.
- P.boundlist = {[ts_1,te_1],…,[ts_n,te_n]}
- f.boundlist = {[ts'_1,ts'_1],…,[ts'_m,ts'_m]}
Equal Join : P_1 = P ⊙ f = <p_1,…,p_k ∪ f>.
- P_1.boundlist are [ts_i,te_i] such that te_i = ts'_j for some j (1 ≤ j ≤ m)
Temporal Join : P_2 = P · f = <p_1,…,p_k,f>.
- P_2.boundlist are [ts_i,ts'_j] such that ts'_j − ts_i < maxwin and ts'_j > te_i for some j (1 ≤ j ≤ m)
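The two operations can be sketched on plain lists of (ts, te) pairs. The bound lists used here are illustrative values, not data taken from the slides, and choosing the earliest qualifying ts'_j in the temporal join is an assumption that follows the "take minimal" remark in the later temporal join example.

```python
def equal_join(p_bl, f_bl):
    """P ⊙ f: keep bounds of P whose end time is also a time of f."""
    f_times = {ts for ts, _ in f_bl}
    return [(ts, te) for ts, te in p_bl if te in f_times]


def temporal_join(p_bl, f_bl, maxwin):
    """P · f: extend each bound of P to the earliest later time of f in range."""
    f_times = sorted(ts for ts, _ in f_bl)
    joined = []
    for ts, te in p_bl:
        later = [t for t in f_times if t > te and t - ts < maxwin]
        if later:
            joined.append((ts, later[0]))
    return joined


# Illustrative bound lists (hypothetical values).
P_BL = [(1, 2), (3, 5), (5, 7), (6, 7)]
F_SAME_TIME = [(5, 5), (7, 7)]
F_LATER = [(4, 4), (6, 6)]

print(equal_join(P_BL, F_SAME_TIME))            # [(3, 5), (5, 7), (6, 7)]
print(temporal_join(P_BL, F_LATER, maxwin=4))   # [(1, 4), (3, 6)]
```

The equal join never changes a bound, since the new event joins the last event set of P; the temporal join always moves the end time forward.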

Drawbacks of MINEPI+

Huge number of combinations
- consider |I| frequent 1-episodes
- O(|I|^2) checks for temporal joins and equal joins
Unnecessary joins
- should skip temporal joins for a prefix if the number of extendable matching bounds < minsup × |TDB|
Duplicate joins
- episode <ABC,ABC> needs 4+1 joins :
  <A>→<AB>→<ABC>→<ABC,A>→<ABC,AB>→<ABC,ABC>

EMMA

Divide into three phases
(I) Mine frequent itemsets in the complex sequence.
(II) Encode each frequent itemset with a unique ID, and construct an encoded horizontal database.
(III) Mine episodes in the encoded database.
Depth-First Search
Memory Anchor
- utilize the boundlists to access information
- timelists of frequent itemsets are their boundlists

Example : database

minsup = 5

Combine episodes

Only combine existing episodes with a “local” frequent 1-tuple episode.
- avoids generating a huge number of candidate episodes
Projected boundlist (PBL)
- episode #3 = <C> has boundlist {[1,1],[2,2],[4,4],[8,8],[11,11],[14,14],[15,15]}
- given maxwin = 4, the projected boundlist is {[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- note that |TDB| = 16

Example : PBL

#3.timelist = {1,2,4,8,11,14,15}.
1 → [2,4]
2 → [3,5]
4 → [5,7]
8 → [9,11]
11 → [12,14]
14 → [15,16]
15 → [16,16]
with maxwin = 4 and |TDB| = 16.
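The mapping above fits a simple rule, sketched here as an inference from the listed values and not as a formula quoted from the slides: a bound [ts, te] projects to [te + 1, min(ts + maxwin − 1, |TDB|)], and empty windows are dropped. For a 1-tuple episode ts = te = t, which gives [t + 1, min(t + maxwin − 1, |TDB|)]. The function name is hypothetical.

```python
def projected_boundlist(boundlist, maxwin, tdb_len):
    """Project each bound to the window where the next event may occur."""
    pbl = []
    for ts, te in boundlist:
        lo = te + 1                          # next event must come strictly later
        hi = min(ts + maxwin - 1, tdb_len)   # stay in maxwin and in the database
        if lo <= hi:                         # drop empty windows
            pbl.append((lo, hi))
    return pbl


timelist_3 = [1, 2, 4, 8, 11, 14, 15]
bl_3 = [(t, t) for t in timelist_3]
print(projected_boundlist(bl_3, maxwin=4, tdb_len=16))
# [(2, 4), (3, 5), (5, 7), (9, 11), (12, 14), (15, 16), (16, 16)]
```

The last two windows are clipped at |TDB| = 16, which is why they are shorter than the others.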

Local frequent ID

A local frequent ID has a boundlist that can match into another episode’s PBL.
- #3.PBL = {[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- #4.BL = {[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}
Record the boundlist of the ID when examining.
- get the boundlist immediately at temporal join
- <C,D> = <#3,#4>, then <C,D>.boundlist = {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}

Example : temporal join

#4.BL = {[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}.
Recall the construction of #3.PBL
1 → [2,4] : [3,3] in it
2 → [3,5] : [3,3] in it (take minimal)
4 → [5,7] : [5,5] in it
8 → [9,11] : [9,9] in it
11 → [12,14] : [12,12] in it
14 → [15,16] : [16,16] in it
15 → [16,16] : [16,16] in it
Result : {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}
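The matching step can be sketched with a binary search over the sorted times of the candidate ID. This is a minimal sketch assuming the projected window of a bound [ts, te] is [te + 1, min(ts + maxwin − 1, |TDB|)], as the PBL example suggests; the function name and the use of `bisect` are illustrative choices, not something the slides prescribe.

```python
from bisect import bisect_left


def temporal_join_pbl(prefix_bl, f_times, maxwin, tdb_len):
    """For each prefix bound, take the earliest time of f inside its window."""
    f_times = sorted(f_times)
    joined = []
    for ts, te in prefix_bl:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_len)
        i = bisect_left(f_times, lo)          # first time of f that is >= lo
        if i < len(f_times) and f_times[i] <= hi:
            joined.append((ts, f_times[i]))   # "take minimal"
    return joined


bl_3 = [(t, t) for t in [1, 2, 4, 8, 11, 14, 15]]
times_4 = [3, 5, 6, 9, 12, 13, 16]
print(temporal_join_pbl(bl_3, times_4, maxwin=4, tdb_len=16))
# [(1, 3), (2, 3), (4, 5), (8, 9), (11, 12), (14, 16), (15, 16)]
```

Because only the earliest match per window is recorded, the later times 6 and 13 of #4 never appear as end points.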

Procedure : emmajoin

Recursively extend the episodes
- until no more serial episodes can be extended
Avoid unnecessary checking in MINEPI+
- stop when the number of extendable bounds for a serial episode is less than minsup × |TDB|.
Example : #2 = <B>.
- #2.BL = {[3,3],[6,6],[9,9],[12,12],[16,16]}
- #2.PBL = {[4,6],[7,9],[10,12],[13,15]} (|TDB| = 16)
- do not need to extend #2 if minsup = 5
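The stopping test for #2 can be written as a count of non-empty projected windows. This sketch treats minsup as an absolute count of 5, as the example does, whereas the rule is stated as minsup × |TDB|; `min_count` is a hypothetical parameter standing for whichever threshold applies. `maxwin = 4` is assumed from the neighbouring examples.

```python
def extendable_bounds(boundlist, maxwin, tdb_len):
    """Number of bounds whose projected window [te+1, min(ts+maxwin-1, |TDB|)] is non-empty."""
    return sum(
        1 for ts, te in boundlist if te + 1 <= min(ts + maxwin - 1, tdb_len)
    )


def should_extend(boundlist, maxwin, tdb_len, min_count):
    """Extend an episode only if enough bounds can still be extended."""
    return extendable_bounds(boundlist, maxwin, tdb_len) >= min_count


bl_2 = [(3, 3), (6, 6), (9, 9), (12, 12), (16, 16)]
print(extendable_bounds(bl_2, maxwin=4, tdb_len=16))           # 4
print(should_extend(bl_2, maxwin=4, tdb_len=16, min_count=5))  # False
```

The bound [16,16] sits at the end of the database, so it has no room for a following event and only four bounds remain.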

Example : emmajoin

#3.BL = {[1,1],[4,4],[8,8],[11,11],[14,14],[15,15]}.
#7.BL = {[1,1],[4,4],[8,8],[11,11],[14,14]}.
#9.BL = {[3,3],[6,6],[9,9],[12,12],[16,16]}.
Call emmajoin to extend each 1-tuple episode
#3.PBL = {[2,4],[5,7],[9,11],[12,14],[15,16],[16,16]}.
Find local frequent IDs in #3.PBL.

Example : emmajoin (cont.)

minsup = 5, maxwin = 4.
By temporal join :
- <#3,#3>.BL = {[1,4],[8,11],[11,14],[14,15]}
- <#3,#7>.BL = {[1,4],[8,11],[11,14]}
- <#3,#9>.BL = {[1,3],[4,6],[8,9],[11,12],[14,16]}
- <#3,#9> is generated from prefix #3
- recursively call emmajoin to extend <#3,#9>
- <#3,#9>.PBL = {[4,4],[7,7],[10,11],[13,14]}
- there are no local frequent IDs since minsup = 5
Back to call emmajoin for episode #7.
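The recursion can be sketched in a few lines, restricted to temporal joins over encoded IDs. This is a simplified illustration under stated assumptions and not the authors' procedure: minsup is an absolute count, the projected window of [ts, te] is [te + 1, min(ts + maxwin − 1, |TDB|)], and the names `emmajoin` and `mine` are used here only as labels. Applied literally, the window rule also pairs start 15 with time 16 for <#3,#9>, a bound that the list above does not show; the frequent/infrequent decisions are unaffected.

```python
def window(ts, te, maxwin, tdb_len):
    """Projected window of one bound."""
    return te + 1, min(ts + maxwin - 1, tdb_len)


def temporal_join(bl, f_times, maxwin, tdb_len):
    """Extend each bound to the earliest time of f inside its window."""
    joined = []
    for ts, te in bl:
        lo, hi = window(ts, te, maxwin, tdb_len)
        hits = [t for t in f_times if lo <= t <= hi]
        if hits:
            joined.append((ts, min(hits)))
    return joined


def emmajoin(prefix, bl, ids, maxwin, tdb_len, min_count, out):
    """Record `prefix`, then extend it depth-first with local frequent IDs."""
    out.append(prefix)
    extendable = 0
    for ts, te in bl:
        lo, hi = window(ts, te, maxwin, tdb_len)
        extendable += lo <= hi
    if extendable < min_count:      # pruning: too few bounds left to extend
        return
    for fid, f_times in ids.items():
        joined = temporal_join(bl, f_times, maxwin, tdb_len)
        if len(joined) >= min_count:
            emmajoin(prefix + (fid,), joined, ids, maxwin, tdb_len, min_count, out)


def mine(ids, maxwin, tdb_len, min_count):
    """Start one depth-first search from every frequent 1-tuple episode."""
    out = []
    for fid, times in ids.items():
        if len(times) >= min_count:
            emmajoin((fid,), [(t, t) for t in times], ids, maxwin, tdb_len, min_count, out)
    return out


IDS = {3: [1, 4, 8, 11, 14, 15], 7: [1, 4, 8, 11, 14], 9: [3, 6, 9, 12, 16]}
print(mine(IDS, maxwin=4, tdb_len=16, min_count=5))
# [(3,), (3, 9), (7,), (7, 9), (9,)]
```

Each recursive step pushes the end time forward while the start stays fixed, so the depth of the search is bounded by maxwin.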

Experiments

On a dataset composed of 10 stocks.
Parameters : maxwin / minsup.
- more running time when maxwin increases
- more running time when minsup decreases
- since the number of frequent episodes increases
EMMA runs faster than MINEPI+.
MINEPI+ uses less space than EMMA.
- EMMA needs large memory as minsup decreases

Conclusion

Modify MINEPI to MINEPI+
- for mining episodes in a complex sequence
Propose EMMA
- avoid the drawbacks of MINEPI+
EMMA is more efficient than MINEPI+.
Future work
- only discussed serial episodes
- parallel and composite episodes remain to be solved