MELJUN CORTES -- AUTOMATA THEORY LECTURE - 9

  • 7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 9

    CSC 3130: Automata theory and formal languages

    Andrej Bogdanov

    http://www.cse.cuhk.edu.hk/~andrejb/csc3130

    Normal forms and parsing

    Fall 2008

    Testing membership and parsing

    Given a grammar

      S → 0S1 | 1S0S1 | T
      T → S | ε

    how can we know if a string x is in its language?

    If so, can we reconstruct a parse tree for x?

    First attempt

    Maybe we can try all possible derivations:

      S → 0S1 | 1S0S1 | T
      T → S | ε

    x = 00111

      S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
      S ⇒ 1S0S1 ⇒ 10S10S1, ...
      S ⇒ T ⇒ S, ε

    When do we stop?
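To make the question concrete, here is a minimal Python sketch (our own, not from the lecture) that tries all leftmost derivations breadth-first. It assumes single-character symbols, uppercase letters for variables and "" for ε; the names `leftmost_expansions`, `try_all_derivations` and the cap `max_forms` are hypothetical. Since the set of derivations is infinite, the search can only stop on a string outside the language by giving up.

```python
from collections import deque

# Grammar from the slides: S -> 0S1 | 1S0S1 | T, T -> S | ε.
# Assumed encoding: one-character symbols, uppercase = variable, "" = ε.
GRAMMAR = {"S": ["0S1", "1S0S1", "T"], "T": ["S", ""]}


def leftmost_expansions(form, grammar):
    """Yield every form obtained by rewriting the leftmost variable of `form`."""
    for i, sym in enumerate(form):
        if sym in grammar:
            for rhs in grammar[sym]:
                yield form[:i] + rhs + form[i + 1:]
            return  # only the leftmost variable is rewritten


def try_all_derivations(x, grammar, start="S", max_forms=10_000):
    """Breadth-first search over all leftmost derivations.

    Returns True if x is derived. The search space is infinite here, so
    otherwise we can only give up after `max_forms` forms and return None
    ("don't know"); False is returned only if the search space runs out.
    """
    queue = deque([start])
    for _ in range(max_forms):
        if not queue:
            return False
        form = queue.popleft()
        if form == x:
            return True
        queue.extend(leftmost_expansions(form, grammar))
    return None
```

For x = 01011 the search finds a derivation well before the cap; for x = 00111 it runs until the cap and returns None, which is exactly the "when do we stop?" problem.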

    Problems

    How do we know when to stop?

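    Idea: Stop the derivation when its length exceeds |x|

    Not right because of ε-productions: a sentential form can grow longer than |x| and shrink back later

      S → 0S1 | 1S0S1 | T
      T → S | ε

    x = 01011

      S ⇒ 0S1 ⇒ 01S0S11 ⇒* 01S011 ⇒* 01011
      (sentential form lengths 3, 7, 6, 5)

    We might want to eliminate ε-productions too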
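A small Python sketch of the length cutoff shows why it fails. It uses the same assumed encoding as before and a hypothetical function `search_with_length_cutoff` that discards every sentential form longer than |x|. It always halts, because only finitely many forms are that short, but it rejects 01011. For this particular grammar it in fact rejects every string: the last step of any derivation is T ⇒ ε, so the form just before the end is one symbol longer than x.

```python
from collections import deque

# Same grammar and assumed encoding: S -> 0S1 | 1S0S1 | T, T -> S | ε.
GRAMMAR = {"S": ["0S1", "1S0S1", "T"], "T": ["S", ""]}


def leftmost_expansions(form, grammar):
    """Yield every form obtained by rewriting the leftmost variable of `form`."""
    for i, sym in enumerate(form):
        if sym in grammar:
            for rhs in grammar[sym]:
                yield form[:i] + rhs + form[i + 1:]
            return


def search_with_length_cutoff(x, grammar, start="S"):
    """Search leftmost derivations, discarding forms longer than |x|.

    Terminates thanks to the `seen` set, but is wrong when ε-productions
    let a form grow past |x| and then shrink back.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == x:
            return True
        for nxt in leftmost_expansions(form, grammar):
            if len(nxt) <= len(x) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Calling `search_with_length_cutoff("01011", GRAMMAR)` returns False, even though S ⇒ 0S1 ⇒ 01S0S11 ⇒* 01011.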

    Unit productions

    A unit production is a production of the form A1 → A2, where A1 and A2 are both variables

    Example

    grammar:

      S → 0S1 | 1S0S1 | T
      T → S | R | ε
      R → 0SR

    unit productions: S → T, T → S, T → R

    Removal of unit productions

    If there is a cycle of unit productions

      A1 → A2 → ... → Ak → A1

    delete it and replace A2, ..., Ak everywhere with A1

    Example

      S → 0S1 | 1S0S1 | T
      T → S | R | ε
      R → 0SR

    becomes

      S → 0S1 | 1S0S1 | R | ε
      R → 0SR

    T is replaced by S in the {S, T} cycle, so T → R and T → ε become S → R and S → ε

    Removal of unit productions

    For other unit productions, replace every chain

      A1 → A2 → ... → Ak → α   (α not a single variable)

    by the productions A1 → α, ..., Ak → α

    Example

    S → R → 0SR is replaced by S → 0SR, R → 0SR:

      S → 0S1 | 1S0S1 | R | ε
      R → 0SR

    becomes

      S → 0S1 | 1S0S1 | 0SR | ε
      R → 0SR

    Removal of ε-productions

    A variable N is nullable if there is a derivation N ⇒* ε

    How to remove ε-productions (except from S):

      Find all nullable variables N1, ..., Nk
      For i = 1 to k
        For every production of the form A → αNiβ, add another production A → αβ
        If Ni → ε is a production, remove it
      If S is nullable, add the special production S → ε
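The loop above can be sketched in Python as follows, under our own encoding (dict from variable to a list of right-hand-side strings, "" for ε) and with the nullable variables passed in as a list. The function name `eliminate_epsilon` is hypothetical. A right-hand side with several occurrences of Ni is handled by feeding each new production back into the same round. The sketch removes all ε-productions at the end rather than inside the loop, which also catches ones such as D → ε that are created after their own variable's round.

```python
# Example grammar: S -> ACD, A -> a, B -> ε, C -> ED | ε, D -> BC | b, E -> b.
GRAMMAR = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}


def eliminate_epsilon(grammar, nullable, start="S"):
    """Remove ε-productions; `nullable` is the list N1, ..., Nk, computed beforehand."""
    g = {a: list(rhss) for a, rhss in grammar.items()}
    for n in nullable:  # for i = 1 to k
        for a in g:
            work = list(g[a])
            while work:
                rhs = work.pop()
                for i, sym in enumerate(rhs):
                    if sym == n:
                        shorter = rhs[:i] + rhs[i + 1:]  # A -> alpha beta
                        if shorter not in g[a]:
                            g[a].append(shorter)
                            work.append(shorter)  # it may still contain n
    # Drop every ε-production, including ones added after their variable's round.
    g = {a: [rhs for rhs in rhss if rhs != ""] for a, rhss in g.items()}
    if start in nullable:
        g[start].append("")  # the special production S -> ε
    return g
```

With nullable variables B, C, D this yields S → ACD | AD | AC | A, C → ED | E and D → BC | b | C | B, and leaves B without productions.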

    Example

    Find the nullable variables

    grammar:

      S → ACD
      A → a
      B → ε
      C → ED | ε
      D → BC | b
      E → b

    nullable variables: B, C, D

    This is the first step of the algorithm: find all nullable variables N1, ..., Nk

    Finding nullable variables

    To find nullable variables, we work backwards:

      First, mark all variables A such that A → ε as nullable
      Then, as long as there are productions of the form

        A → A1 ... Ak

      where all of A1, ..., Ak are marked as nullable, mark A as nullable
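The backward marking is a fixed-point computation: keep marking until a full pass marks nothing new. A minimal Python sketch, assuming our own encoding (dict from variable to a list of right-hand-side strings, "" for ε) and a hypothetical function name `nullable_variables`:

```python
# Example grammar: S -> ACD, A -> a, B -> ε, C -> ED | ε, D -> BC | b, E -> b.
GRAMMAR = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}


def nullable_variables(grammar):
    """Return the set of variables N with N =>* ε."""
    # First, mark every A with a production A -> ε.
    nullable = {a for a, rhss in grammar.items() if "" in rhss}
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            if a in nullable:
                continue
            # Mark A if some A -> A1 ... Ak has all of A1, ..., Ak marked.
            # Terminals are never in `nullable`, so they block marking.
            if any(rhs and all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(a)
                changed = True
    return nullable
```

On the example grammar, B and C are marked first and D follows from D → BC, giving {B, C, D}.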

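    Eliminating ε-productions

      S → ACD
      A → a
      B → ε
      C → ED | ε
      D → BC | b
      E → b

    nullable variables: B, C, D

    For i = 1 to k
      For every production of the form A → αNiβ, add another production A → αβ
      If Ni → ε is a production, remove it

    Productions added, in order:

      N1 = B:  D → C
      N2 = C:  S → AD, D → B, D → ε
      N3 = D:  S → AC, S → A, C → E

    B → ε is removed in round 1, C → ε in round 2, and D → ε (added in round 2) in round 3. The result is

      S → ACD | AD | AC | A
      A → a
      C → ED | E
      D → BC | b | C | B
      E → b

    and B is left without productions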

    Recap

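    After eliminating ε-productions and unit productions, we know that every derivation

      S ⇒* a1...ak, where a1, ..., ak are terminals,

    doesn't shrink in length and doesn't go into cycles

    Exception: S → ε. We will not use this rule at all, except to check if ε ∈ L

    Note

    ε-productions must be eliminated before unit productions, because eliminating ε-productions can create new unit productions (such as D → C in the previous example)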

    Example: testing membership

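      S → 0S1 | 1S0S1 | T
      T → S | ε

    x = 00111

    eliminate unit and ε-productions:

      S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

    Expanding S, and abandoning every branch that can only produce strings of length ≥ 6:

      S ⇒ 01, 101
      S ⇒ 0S1 ⇒ 0011, 01011; 00S11 and the other branches give only strings of length ≥ 6
      S ⇒ 10S1 ⇒ 10011; the other branches give only strings of length ≥ 6
      S ⇒ 1S01 ⇒ 10101; the other branches give only strings of length ≥ 6
      S ⇒ 1S0S1 gives only strings of length ≥ 6

    None of the strings of length 5 is 00111, so x is not in the language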

    Algorithm 1 for testing membership

    We can now use the following algorithm to check if a string x is in the language of G:

      Eliminate all ε-productions and unit productions
      If x = ε and S → ε, accept; else delete S → ε
      Let X := S
      While some new production P can be applied to X
        Apply P to X
        If X = x, accept
        If |X| > |x|, backtrack
      If no more productions can be applied to X, reject
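A minimal Python sketch of Algorithm 1, written as a recursive search over leftmost derivations where returning False plays the role of backtracking. It assumes our own encoding (dict from variable to a list of right-hand-side strings, "" for ε) and a grammar that already has no unit productions and no ε-productions other than S → ε; `in_language` is a hypothetical name. Rewriting only the leftmost variable loses nothing, since every derivable string has a leftmost derivation.

```python
# Grammar of the membership example after eliminating ε- and unit productions:
# S -> ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1   (assumed encoding: "" = ε)
GRAMMAR = {"S": ["", "01", "101", "0S1", "10S1", "1S01", "1S0S1"]}


def in_language(x, grammar, start="S"):
    """Algorithm 1: search over derivations of x, backtracking on long forms."""
    if x == "":
        return "" in grammar[start]
    # Otherwise delete S -> ε: it is only used to decide whether ε is in L.
    g = {a: [rhs for rhs in rhss if rhs != ""] for a, rhss in grammar.items()}

    def search(form):
        if form == x:
            return True  # accept
        if len(form) > len(x):
            return False  # backtrack: forms never shrink from here on
        for i, sym in enumerate(form):
            if sym in g:  # try every production for the leftmost variable
                return any(search(form[:i] + rhs + form[i + 1:]) for rhs in g[sym])
        return False  # only terminals left, and form != x

    return search(start)
```

The search terminates because every step either lengthens the form or turns a variable into a terminal, and forms longer than |x| are abandoned. It accepts 01011 and rejects 00111.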

    Practical limitations of Algorithm 1

    The previous algorithm can be very slow if x is long

      G = CFG of the Java programming language
      x = code for a 200-line Java program
      The algorithm might take about 10^200 steps!

    There is a faster algorithm, but it requires that we do some more transformations on the grammar

    Chomsky Normal Form

    A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type

      A → BC   or   A → a

    Conversion to Chomsky Normal Form is easy (once ε-productions and unit productions have been eliminated):

      A → BcDE

    replace terminals with new variables:

      A → BCDE
      C → c

    break up sequences with new variables:

      A → BX1
      X1 → CX2
      X2 → DE
      C → c
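The two conversion steps can be sketched in Python as below. The sketch assumes the grammar is a list of (head, body) pairs with bodies as tuples of symbols, that variables start with an uppercase letter, and that ε- and unit productions are already gone. The new variable names `T_c` (for a terminal c; the slide writes C) and `X1, X2, ...` are hypothetical and assumed not to clash with existing ones.

```python
from itertools import count


def is_variable(sym):
    # Assumption: variables start with an uppercase letter, terminals do not.
    return sym[:1].isupper()


def to_cnf(productions):
    """Convert (head, body) productions to Chomsky Normal Form."""
    fresh = count(1)
    terminal_vars = {}
    out = []
    for head, body in productions:
        if len(body) == 1:  # already A -> a (no unit productions remain)
            out.append((head, body))
            continue
        # Step 1: replace each terminal c by a new variable T_c with T_c -> c.
        syms = []
        for sym in body:
            if not is_variable(sym):
                if sym not in terminal_vars:
                    terminal_vars[sym] = "T_" + sym
                    out.append((terminal_vars[sym], (sym,)))
                sym = terminal_vars[sym]
            syms.append(sym)
        # Step 2: break A -> B1 B2 ... Bm into A -> B1 X1, X1 -> B2 X2, ..., ending in a pair.
        while len(syms) > 2:
            x = f"X{next(fresh)}"
            out.append((head, (syms[0], x)))
            head, syms = x, syms[1:]
        out.append((head, tuple(syms)))
    return out
```

On the slide's example, `to_cnf([("A", ("B", "c", "D", "E"))])` gives A → B X1, X1 → T_c X2, X2 → D E and T_c → c.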

    Exercise

    Convert this CFG into Chomsky Normal Form:

      S → ε | ADDA
      A → a
      C → c
      D → bCb

    Algorithm 2 for testing membership

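      S → AB | BC
      A → BA | a
      B → CC | b
      C → AB | a

    x = baaba

    Idea: We generate each substring of x bottom up

      length 5:  {S, A, C}
      length 4:  ∅          {S, A, C}
      length 3:  ∅          {B}        {B}
      length 2:  {S, A}     {B}        {S, C}     {S, A}
      length 1:  {B}        {A, C}     {A, C}     {B}        {A, C}
                 b          a          a          b          a

    Each cell lists the variables that derive the substring below it; S is in the top cell, so x is in the language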

    Parse tree reconstruction

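      S → AB | BC
      A → BA | a
      B → CC | b
      C → AB | a

    x = baaba

    Tracing back the derivations in the table for x, we obtain the parse tree. For example, S is in the top cell because B derives b, C derives aaba and S → BC; following such choices down to single letters gives

      S[ B[b] C[ A[a] B[ C[ A[a] B[b] ] C[a] ] ] ]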
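To trace back derivations, the table needs to remember why each variable entered each cell. The Python sketch below stores one such reason per (cell, variable) and rebuilds a parse tree as nested tuples. The (head, body) grammar encoding, the 1-based cell indices and the names `cyk_with_backpointers` and `build_tree` are our own assumptions. Only the first reason found is kept, so for an ambiguous grammar this yields one parse tree among possibly several.

```python
# CNF grammar from the example as (head, body) pairs; bodies are tuples of symbols.
GRAMMAR = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]


def cyk_with_backpointers(x, productions):
    """Fill the CYK table, remembering one way each variable entered each cell.

    back[(s, t, A)] is ("leaf", x_s) if A -> x_s put A in cell ss, or
    (B, C, j) if A -> BC put A in cell st with B in cell sj, C in cell (j+1)t.
    """
    k = len(x)
    back = {}
    for i in range(1, k + 1):
        for head, body in productions:
            if body == (x[i - 1],):
                back[(i, i, head)] = ("leaf", x[i - 1])
    for b in range(2, k + 1):
        for s in range(1, k - b + 2):
            t = s + b - 1
            for j in range(s, t):
                for head, body in productions:
                    if (len(body) == 2 and (s, j, body[0]) in back
                            and (j + 1, t, body[1]) in back):
                        back.setdefault((s, t, head), (body[0], body[1], j))
    return back


def build_tree(back, s, t, var):
    """Follow the remembered choices down to single letters."""
    entry = back[(s, t, var)]
    if entry[0] == "leaf":
        return (var, entry[1])
    left, right, j = entry
    return (var, build_tree(back, s, j, left), build_tree(back, j + 1, t, right))
```

For x = baaba, `build_tree(cyk_with_backpointers("baaba", GRAMMAR), 1, 5, "S")` returns a tree whose leaves, read left to right, spell baaba.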

    Cocke-Younger-Kasami algorithm

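    Input: grammar G in CNF, string x = x1 x2 ... xk

    Cell ij remembers all possible derivations of the substring xi ... xj

      For i = 1 to k
        If there is a production A → xi, put A in table cell ii
      For b = 2 to k
        For s = 1 to k - b + 1
          Set t = s + b - 1
          For j = s to t - 1
            If there is a production A → BC
            where B is in cell sj and C is in cell (j+1)t,
              put A in cell st

    (Figure: the triangular table. The bottom row holds cells 11, 22, ..., kk above x1, x2, ..., xk, the next row holds cells 12, 23, ..., and cell 1k is at the top; cell st covers a substring of length b and is split at position j.)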
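The pseudocode above translates almost line for line into Python. In this sketch the CNF grammar is a list of (head, body) pairs with bodies as tuples of symbols, and table cells are 1-based pairs (s, t) to match the slide; these conventions and the name `cyk` are our own. For a nonempty x, the string is in the language exactly when S is in cell (1, k).

```python
# CNF grammar from the example as (head, body) pairs; bodies are tuples of symbols.
GRAMMAR = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]


def cyk(x, productions):
    """Return table[(s, t)] = set of variables deriving x_s ... x_t (1-based)."""
    k = len(x)
    table = {(s, t): set() for s in range(1, k + 1) for t in range(s, k + 1)}
    for i in range(1, k + 1):
        for head, body in productions:
            if body == (x[i - 1],):  # production A -> x_i
                table[(i, i)].add(head)
    for b in range(2, k + 1):  # b = length of the substring
        for s in range(1, k - b + 2):  # s = 1, ..., k - b + 1
            t = s + b - 1
            for j in range(s, t):  # split into x_s..x_j and x_(j+1)..x_t
                for head, body in productions:
                    if (len(body) == 2 and body[0] in table[(s, j)]
                            and body[1] in table[(j + 1, t)]):
                        table[(s, t)].add(head)
    return table
```

For example, `"S" in cyk("baaba", GRAMMAR)[(1, 5)]` is True. The three nested loops over b, s and j give roughly k^3 split checks, each scanning the productions once.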