Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations

Shuhei Denzumi (1), Ryo Yoshinaka (2, 1), Shin-ichi Minato (1, 2), and Hiroki Arimura (1)

1) Hokkaido University

2) JST ERATO Minato Discrete Structure Manipulation System Project


Background

Research on string processing has become active, driven by massive online data from the Internet and sensor networks.

String matching and string mining problems

Data mining: input data should be represented in compact form

Computation directly on the compressed structure is needed

Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations, by Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura, 2011-08-30 (TUE), Prague Stringology Conference 2011

[Figure: input data is compressed into a data structure, and operations on the data structure produce the result.]

Manipulatable & Compact

Manipulatable compact data structures: represent data in compressed form

Provide operations that manipulate data in compressed form

Have attracted much attention in recent years

Binary Decision Diagrams (BDDs): the LSI area

Deterministic Finite Automata (DFAs): the natural language processing area


[Figure: several inputs are compacted into data structures D1, D2, D3, which are then combined by operations.]

Sequence Binary Decision Diagram (SeqBDD, SDD): Loekito, Bailey, and Pei (2009)

Graph structure

Represents finite sets of strings of finite length

Basic properties of SDDs are unknown:

Minimization

Size complexity

Operation time

Applications:

Data mining

Graph mining

Human genome sequencing

What is Sequence BDD?



Family of BDDs

Compact representations for discrete structures, with rich algebraic operations


SDD [Loekito et al. 2009]: sets of strings, e.g. {a, b, ab, bab, abbab}, {abc, acb, bac, bca}

ZDD [Minato 1993]: sets of combinations, e.g. {{a}, {b}, {a, b}}, {{a}, {b}, {c}, {a, b, c}}

BDD [Bryant 1986]: Boolean functions, e.g. xy ∨ yz ∨ zx, ¬xyz ∨ x¬yz ∨ xy¬z

Result

Relationship to Acyclic Deterministic Finite Automata (ADFAs): translation from an SDD to an ADFA and vice versa

An SDD is never larger than the equivalent ADFA

An SDD can be |Σ| times smaller than an ADFA

Computational complexity of binary set operations: generalization to eight set operations

Tight analysis of the time complexity of the binary set operation algorithm

Experimental results: SDDs can be smaller than ADFAs

Binary operation time


Preliminaries


Definition

Σ: alphabet, totally ordered by ≺

Nodes: internal nodes (labelled with letters of Σ), the 1-terminal node, and the 0-terminal node

Edges: 1-edges and 0-edges

SDD: a directed acyclic graph

Internal node S: τ(S) = 〈S.lab, S.1, S.0〉

S.lab: label

S.1: 1-child

S.0: 0-child

Ordering rule: N.lab ≺ (N.0).lab whenever N.0 is an internal node
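A minimal Python sketch of the node triple and the ordering rule. It assumes terminals are encoded as the integers 1 and 0 and that the total order ≺ is given by a rank dictionary; both are choices of this sketch, not of the slides.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Node:
    """Internal SDD node S with tau(S) = <S.lab, S.1, S.0>.

    Terminals are encoded as the ints 1 and 0 (an assumption of this sketch)."""
    lab: str                  # S.lab
    one: Union["Node", int]   # S.1, the 1-child
    zero: Union["Node", int]  # S.0, the 0-child


def satisfies_ordering(n: Union[Node, int], rank: Dict[str, int]) -> bool:
    """Check N.lab < (N.0).lab at every internal node reachable from n.

    `rank` gives each letter's position in the total order on the alphabet.
    Only 0-edges are constrained; 1-children may carry any label."""
    if isinstance(n, int):
        return True  # terminal nodes impose no constraint
    if isinstance(n.zero, Node) and rank[n.lab] >= rank[n.zero.lab]:
        return False
    return satisfies_ordering(n.one, rank) and satisfies_ordering(n.zero, rank)
```

For example, with `rank = {"a": 0, "b": 1}`, the chain `Node("a", 1, Node("b", 1, 0))` satisfies the rule, while `Node("b", 1, Node("a", 1, 0))` does not.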


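[Figure: an internal node S with label S.lab, 1-child S.1 and 0-child S.0; along a chain of 0-edges the labels increase, a ≺ b ≺ … ≺ z, ending at the 1-terminal or 0-terminal node.]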

Semantics

L(N): the set of strings that node N represents

L(1-terminal) = {ε}

L(0-terminal) = {}

L(N) = N.lab ・ L(N.1) ∪ L(N.0)

Each path from the root to the 1-terminal node represents a string: the labels of the nodes left through 1-edges, in order
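The recursive semantics translates directly into code. This sketch uses the same hypothetical encoding as before (frozen dataclass nodes, terminals as the ints 1 and 0), builds an SDD for {aa, ab, bb} by hand, and evaluates L(N) from the definition. It enumerates strings explicitly, so it is only meant for small examples.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Node:
    lab: str                  # N.lab
    one: Union["Node", int]   # N.1 (1-child)
    zero: Union["Node", int]  # N.0 (0-child)


def lang(n: Union[Node, int]) -> FrozenSet[str]:
    """L(N) computed directly from the recursive definition."""
    if isinstance(n, int):
        # L(1-terminal) = {eps}, L(0-terminal) = {}
        return frozenset({""}) if n == 1 else frozenset()
    # L(N) = N.lab . L(N.1)  U  L(N.0)
    return frozenset(n.lab + s for s in lang(n.one)) | lang(n.zero)


b = Node("b", 1, 0)           # represents {b}
a_or_b = Node("a", 1, b)      # represents {a, b}
bb = Node("b", b, 0)          # represents {bb}
root = Node("a", a_or_b, bb)  # represents {aa, ab, bb}
print(sorted(lang(root)))     # ['aa', 'ab', 'bb']
```

Note that the node for {b} is shared: it is the 0-child of one node and the 1-child of another.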


[Figure: the SDD for {aa, ab, bb}. The 1-terminal represents {ε} and the 0-terminal {}; the internal nodes represent {b}, {a, b}, and {bb}, and the root (label a) represents {aa, ab, bb}.]

Comparison to ADFA

[Figure: a 0-chain of SDD nodes labelled a ≺ b ≺ c corresponds to an ADFA state with outgoing edges a, b, c; the 1-terminal and 0-terminal correspond to the accept state and the reject state. Side by side: the SDD for {aa, ab, bb} (internal nodes for {a, b}, {bb}, {b}) and the equivalent ADFA (states for {aa, ab, bb}, {a, b}, {b}).]

Reduction process

Suppression

N.1 ≠ 0-terminal node: a node whose 1-child is the 0-terminal is removed

In an ADFA: removing edges that point to the dead state

Merging

τ(N) = τ(N’) ⇒ N = N’

In an ADFA: sharing all equivalent states

Theorem

Under these rules, the SDD for a given set is unique and minimal

just as ADFAs have a unique canonical form
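The two reduction rules can be applied bottom-up to an arbitrary unreduced SDD. In this sketch the encoding is an assumption: terminals are the ints 1 and 0, internal nodes are `(lab, one_child, zero_child)` tuples. A dictionary keeps one canonical copy of each triple, which implements merging; suppression simply returns the 0-child.

```python
from typing import Dict, Tuple, Union

# Assumed encoding: terminals are the ints 1 and 0; an internal node is
# a tuple (lab, one_child, zero_child).
SDD = Union[int, Tuple]


def reduce_sdd(n: SDD, table: Dict[Tuple, Tuple]) -> SDD:
    """Apply the suppression and merging rules bottom-up.

    `table` plays the role of a unique table: it keeps one canonical tuple
    per triple, so equal triples end up as the same shared object."""
    if isinstance(n, int):
        return n
    lab, one, zero = n
    one, zero = reduce_sdd(one, table), reduce_sdd(zero, table)
    if one == 0:                        # suppression: lab . {} U L(N.0) = L(N.0)
        return zero
    key = (lab, one, zero)
    return table.setdefault(key, key)   # merging: tau(N) = tau(N') => N = N'
```

For example, reducing `("a", ("a", 1, ("b", 1, ("c", 0, 0))), ("b", ("b", 1, 0), 0))` suppresses the node labelled c and merges the two copies of `("b", 1, 0)`, leaving four distinct internal nodes.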


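[Figure: Merging: nodes N and N’ with the same label x, the same 1-child N.1, and the same 0-child N.0 become one node. Suppression: a node labelled a whose 1-child is the 0-terminal is replaced by its 0-child N.0, since a ・ {} ∪ L(N.0) = L(N.0).]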

Characteristics

Almost isomorphic to acyclic deterministic finite automata

BDD/ZDD techniques are applicable

Binary form: simple recursive algorithms

Easy to implement

Rich collection of operations

Use of hash tables: to share equivalent nodes

and to share intermediate computation results


[Figure: SDDs combine features of BDDs/ZDDs and of ADFAs.]

Relationship to Acyclic Automata


Size

An SDD node corresponds to an ADFA edge

The description size is proportional to:

|N|: the number of internal nodes in SDD N

|A|: the number of edges in ADFA A


[Figure: SDD nodes labelled a, b, c on a 0-chain correspond to ADFA edges labelled a, b, c.]

Theorem: Size comparison

For an SDD N and an ADFA A representing the same set:

From an ADFA A to an SDD N: |N| ≤ |A|, i.e. an SDD is never larger than the equivalent ADFA

From an SDD N to an ADFA A: the ADFA can be larger, because it cannot share 0-children

An SDD can be |Σ| times smaller than an ADFA
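The translation behind this theorem can be sketched as follows. The encoding is hypothetical: a reduced SDD is stored as a table of triples with ids 0 and 1 for the terminals, and `build` is an illustrative helper. Every root or 1-child becomes an ADFA state whose outgoing edges are the labels on its 0-chain, so each SDD node yields at least one ADFA edge, and a node shared by several 0-chains yields one edge per chain.

```python
from typing import Dict, Iterable, Tuple

Triples = Dict[int, Tuple[str, int, int]]  # id -> <lab, 1-child, 0-child>; 0/1 are terminals


def build(words: Iterable[str], unique: Dict[Tuple[str, int, int], int], triples: Triples) -> int:
    """Reduced SDD for a finite set of strings (hypothetical helper).

    Each 0-chain is built from the largest letter down, so the ordering rule holds;
    the 1-child is never the 0-terminal, so no node needs suppressing."""
    words = set(words)
    result = 1 if "" in words else 0  # end of the 0-chain: accept iff eps is in the set
    for x in sorted({w[0] for w in words if w}, reverse=True):
        child = build((w[1:] for w in words if w and w[0] == x), unique, triples)
        key = (x, child, result)
        if key not in unique:  # merging via a unique table
            unique[key] = len(triples) + 2
            triples[unique[key]] = key
        result = unique[key]
    return result


def sdd_size(triples: Triples, root: int) -> int:
    """|N|: number of internal nodes reachable from the root."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if n > 1 and n not in seen:
            seen.add(n)
            stack.extend(triples[n][1:])
    return len(seen)


def adfa_edges(triples: Triples, root: int) -> int:
    """|A|: edges of the ADFA whose states are the root and all 1-children."""
    heads, stack, edges = set(), [root], 0
    while stack:
        h = stack.pop()
        if h in heads:
            continue
        heads.add(h)
        n = h
        while n > 1:          # walk the 0-chain of the state headed by h
            _, n1, n0 = triples[n]
            edges += 1        # one ADFA edge labelled with this node's letter,
            stack.append(n1)  # leading to the state headed by the 1-child
            n = n0
    return edges


unique, triples = {}, {}
root = build(["aa", "ab", "bb"], unique, triples)
print(sdd_size(triples, root), adfa_edges(triples, root))  # 4 5
```

Here the node for {b} is counted once in the SDD but contributes an edge to two different ADFA states, which is the 0-child sharing that makes the SDD smaller.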

0-child sharing


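[Figure: an example of 0-child sharing over the letters a to e: SDD nodes on a common 0-chain are shared, while the equivalent ADFA repeats the corresponding edges.]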

Example


[Figure: SDD S and ADFA A for the set {aⁿbⁱcʲ | n = 0, …, 4; i, j = 0, 1}; |S| = 6, |A| = 14.]
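The SDD side of this example can be checked by building the set with a hash-consing constructor and counting its internal nodes. The `build` function is a hypothetical helper, not from the slides; it creates each 0-chain from the largest letter down, so the ordering rule holds by construction, and every node it registers ends up reachable from the root.

```python
from typing import Dict, Iterable, Tuple


def build(words: Iterable[str], unique: Dict[Tuple[str, int, int], int]) -> int:
    """Return the id of the reduced SDD for `words`; ids 0 and 1 are the terminals."""
    words = set(words)
    node = 1 if "" in words else 0
    for x in sorted({w[0] for w in words if w}, reverse=True):
        child = build((w[1:] for w in words if w and w[0] == x), unique)
        # merging rule: an existing triple is reused, a new one gets the next id
        node = unique.setdefault((x, child, node), len(unique) + 2)
    return node


words = {"a" * n + "b" * i + "c" * j for n in range(5) for i in range(2) for j in range(2)}
unique = {}
root = build(words, unique)
print(len(unique))  # 6
```

The six nodes are four a-nodes (one per remaining number of a's), one b-node, and one c-node; the b- and c-nodes are shared by every a-node's 0-chain.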

Experiment

Input: Canterbury corpus

BibleAll: bible.txt; BibleBi: all bigrams from bible.txt; Ecoli: E.coli.txt

Fac: the set of all factors (substrings) of the input data is stored
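For reference, a "Fac" input set can be generated as below. Whether the empty string counts as a factor is an assumption of this sketch. A text of length n has at most n(n+1)/2 + 1 distinct factors, which is why such sets benefit from a shared representation.

```python
from typing import Set


def factors(text: str) -> Set[str]:
    """All factors (substrings) of `text`, including the empty string (an assumption)."""
    n = len(text)
    return {text[i:j] for i in range(n + 1) for j in range(i, n + 1)}


print(sorted(factors("abab"), key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'ab', 'ba', 'aba', 'bab', 'abab']
```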


[Plot: size ratio SDD size / DFA size (0.6 to 1.0) against input size (bytes, 0 to 2,000,000) for BibleAll, BibleBi, BibleAll (Fac), BibleBi (Fac), and Ecoli (Fac).]

Binary Set Operation Algorithm


Set operation

A binary set operation ♢ ∈ {∪, ∩, \, …}

Input: two SDDs P, Q

Output: an SDD R such that L(R) = L(P) ♢ L(Q)


[Figure: Binary Set Operation: SDDs P and Q are combined into an SDD for P ♢ Q.]

Apply algorithm

Originally developed for BDDs [Bryant 1986], applied here to SDDs

Based on the definition L(N) = N.lab ・ L(N.1) ∪ L(N.0), the operation decomposes as follows when P.lab = Q.lab:

L(P) ♢ L(Q) = P.lab ・ (L(P.1) ♢ L(Q.1)) ∪ (L(P.0) ♢ L(Q.0))


[Figure: P (label a, children P1, P0) and Q (label a, children Q1, Q0) yield P ♢ Q with label a, 1-child P1 ♢ Q1, and 0-child P0 ♢ Q0.]
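The decomposition above covers equal labels. When the labels differ, the ordering rule supplies the remaining cases: if x = P.lab ≺ y = Q.lab, every string in x ・ L(P.1) starts with x and no string of L(Q) does, so those strings belong to a union or a difference but never to an intersection. The following is a derivation from the definition for ∪, ∩ and \, not a formula copied from the slides.

```latex
% x = P.lab, y = Q.lab. If Q is the 1-terminal, use the x \prec y case;
% if P is the 1-terminal, use the y \prec x case.
\begin{align*}
x = y:&\quad L(P) \mathbin{\diamondsuit} L(Q) = x \cdot \bigl(L(P.1) \mathbin{\diamondsuit} L(Q.1)\bigr) \cup \bigl(L(P.0) \mathbin{\diamondsuit} L(Q.0)\bigr)\\[4pt]
x \prec y:&\quad L(P) \cup L(Q) = x \cdot L(P.1) \cup \bigl(L(P.0) \cup L(Q)\bigr)\\
&\quad L(P) \cap L(Q) = L(P.0) \cap L(Q)\\
&\quad L(P) \setminus L(Q) = x \cdot L(P.1) \cup \bigl(L(P.0) \setminus L(Q)\bigr)\\[4pt]
y \prec x:&\quad L(P) \cup L(Q) = y \cdot L(Q.1) \cup \bigl(L(P) \cup L(Q.0)\bigr)\\
&\quad L(P) \cap L(Q) = L(P) \cap L(Q.0)\\
&\quad L(P) \setminus L(Q) = L(P) \setminus L(Q.0)
\end{align*}
```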

Hash table technique

Key-Value hash tables

Uniquetable: key 〈letter x, SDD node N1, SDD node N0〉; value: the SDD node N with τ(N) = 〈x, N1, N0〉

Opcache: key 〈operation id ♢, SDD node P, SDD node Q〉; value: the SDD node R = P ♢ Q


[Figure: the Uniquetable maps a key triple 〈x, N1, N0〉 to the node N; the Opcache maps a key triple 〈♢, P, Q〉 to the result node R = P ♢ Q.]

Node create process

Any SDD node needed during computation is created via this process

Once an internal node is registered in the Uniquetable, no equivalent node is ever created again


Check the Uniquetable for key 〈x, N1, N0〉.

Exists: return it.

Does not exist: create a new node, register it in the Uniquetable, and return it.
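The node create process can be sketched as a hash-consing constructor. The class `UniqueTable` and the method `get_node` are hypothetical names, node ids 0 and 1 stand for the terminals, and applying the suppression rule inside `get_node` is an assumption borrowed from common ZDD practice rather than something the slides state.

```python
from typing import Dict, Tuple


class UniqueTable:
    """Uniquetable sketch: key <x, N1, N0> -> node id; ids 0 and 1 are the terminals."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, int, int], int] = {}
        self.next_id = 2

    def get_node(self, x: str, n1: int, n0: int) -> int:
        """Return the registered node for <x, N1, N0>, creating it only if absent."""
        if n1 == 0:
            # Assumption of this sketch: suppression is applied here too, so a node
            # whose 1-child is the 0-terminal is never created.
            return n0
        key = (x, n1, n0)
        node = self.table.get(key)
        if node is None:   # not exist: create a new node, register it, return it
            node = self.next_id
            self.next_id += 1
            self.table[key] = node
        return node        # exist: return the registered node
```

Calling `get_node("b", 1, 0)` twice returns the same id and leaves a single entry in the table.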

Time complexity

When P ♢ Q is executed, every recursive call uses the Opcache

At most |P| × |Q| distinct instances of recursive calls are invoked

(assuming that hash table access takes constant time)

Naïve method: prepare a table of size |P| × |Q|

This method: no useless or redundant nodes are created

Theorem: O(|P| |Q|) time in the worst case

There exist examples that need Ω(|P| |Q|) time

Matching lower and upper bounds are obtained


Check the Opcache for key 〈♢, P, Q〉.

Exists: P ♢ Q has already been computed; return it.

Does not exist: continue the computation on the 0-side and the 1-side.
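Putting the Uniquetable, the Opcache, and the recursive decomposition together gives roughly the following sketch. It is an illustration under assumptions, not the authors' implementation: the class `SDDManager`, the integer node ids (0 and 1 for the terminals), the operation names `"union"`, `"inter"`, `"diff"`, and the helper `from_strings` are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple


class SDDManager:
    """Sketch of an SDD package: ids 0/1 are the terminals, internal nodes are ids >= 2."""

    def __init__(self) -> None:
        self.unique: Dict[Tuple[str, int, int], int] = {}   # Uniquetable
        self.triple: Dict[int, Tuple[str, int, int]] = {}   # id -> <lab, 1-child, 0-child>
        self.opcache: Dict[Tuple[str, int, int], int] = {}  # Opcache

    def node(self, x: str, n1: int, n0: int) -> int:
        if n1 == 0:                                 # suppression rule
            return n0
        key = (x, n1, n0)
        if key not in self.unique:                  # merging via the Uniquetable
            self.unique[key] = len(self.triple) + 2
            self.triple[self.unique[key]] = key
        return self.unique[key]

    def from_strings(self, words: Iterable[str]) -> int:
        """Hypothetical helper: build the SDD of a finite set of strings."""
        words = set(words)
        result = 1 if "" in words else 0
        for x in sorted({w[0] for w in words if w}, reverse=True):
            rest = self.from_strings(w[1:] for w in words if w and w[0] == x)
            result = self.node(x, rest, result)
        return result

    def lang(self, n: int) -> FrozenSet[str]:
        if n <= 1:
            return frozenset({""}) if n == 1 else frozenset()
        x, n1, n0 = self.triple[n]
        return frozenset(x + s for s in self.lang(n1)) | self.lang(n0)

    def apply(self, op: str, p: int, q: int) -> int:
        """R with L(R) = L(P) op L(Q), for op in {"union", "inter", "diff"}."""
        if p == 0:                                  # terminal cases (1 means {eps})
            return q if op == "union" else 0
        if q == 0:
            return 0 if op == "inter" else p
        if p == q:
            return 0 if op == "diff" else p
        key = (op, p, q)
        if key in self.opcache:                     # P op Q is already done: return it
            return self.opcache[key]
        px = self.triple[p][0] if p > 1 else None   # the 1-terminal has no label
        qx = self.triple[q][0] if q > 1 else None
        if px is not None and px == qx:             # same label: recurse on both sides
            _, p1, p0 = self.triple[p]
            _, q1, q0 = self.triple[q]
            r = self.node(px, self.apply(op, p1, q1), self.apply(op, p0, q0))
        elif px is not None and (qx is None or px < qx):
            _, p1, p0 = self.triple[p]              # only P has strings starting with px
            rest = self.apply(op, p0, q)
            r = rest if op == "inter" else self.node(px, p1, rest)
        else:
            _, q1, q0 = self.triple[q]              # only Q has strings starting with qx
            rest = self.apply(op, p, q0)
            r = self.node(qx, q1, rest) if op == "union" else rest
        self.opcache[key] = r
        return r
```

Because every distinct key 〈op, P, Q〉 is computed at most once, a single operation performs on the order of |P| × |Q| non-cached recursive calls at most, given constant-time hash access. Since all nodes go through the same Uniquetable, the result of `apply` has the same id as the SDD built directly from the resulting set.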

Experiment

Operation time: prepare two SDDs for all factors of random texts of length n

Measure the time to compute each operation


[Plot: execution time (ms, 0 to 1600) of union, intersection, and difference against length of text (letters, 0 to 100,000).]

Conclusion

Relationship to acyclic automata: an SDD can be |Σ| times smaller than an ADFA

For real data, SDDs are 10 to 20% more compact than ADFAs

Computational complexity of binary set operations: the worst-case time complexity is quadratic

A tight time bound is established

In our experiments, operation time was almost linear

Future work: efficient implementation of various operations

A substring index based on SDDs

A construction algorithm for factor SDDs


Thank you!