MELJUN CORTES -- AUTOMATA THEORY LECTURE - 14

  • 7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 14

    CSC 3130: Automata theory and formal languages

    Andrej Bogdanov

    http://www.cse.cuhk.edu.hk/~andrejb/csc3130

    LR(k) grammars

    Fall 2008

    LR(0) example from last time

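    A → aAb | ab

    LR(0) DFA (figure):
      state 1: A → •aAb, A → •ab                       on a: go to 2
      state 2: A → a•Ab, A → a•b, A → •aAb, A → •ab    on a: go to 2; on b: go to 3; on A: go to 4
      state 3: A → ab•
      state 4: A → aA•b                                on b: go to 5
      state 5: A → aAb•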

    LR(0) parsing example revisited

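    A → aAb | ab        input: aabb

    Stack        Input    Action
    1            aabb     S
    1a2          abb      S
    1a2a2        bb       S
    1a2a2b3      b        R (A → ab)
    1a2A4        b        S
    1a2A4b5               R (A → aAb)
    1A

    Each reduce pops the right-hand side together with its states and then follows the transition on A from the state left on top. The parse ends with 1A, when all of aabb has been reduced to A.

    (Figure: the LR(0) DFA for A → aAb | ab next to the parse tree of aabb.)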
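The trace above can be reproduced by a short driver loop. The following is a minimal Python sketch, not code from the lecture: the DFA of the previous slide is hard-coded in two hypothetical tables, GOTO and REDUCE, the stack alternates states and symbols as in the slide, and the function name lr0_parse is an illustrative choice.

```python
# Hard-coded LR(0) DFA for A -> aAb | ab, states numbered 1..5 as on the slides.
GOTO = {
    (1, "a"): 2,
    (2, "a"): 2, (2, "b"): 3, (2, "A"): 4,
    (4, "b"): 5,
}
# Reduce states: the single complete item, as (left side, length of right side).
REDUCE = {3: ("A", 2),   # A -> ab•
          5: ("A", 3)}   # A -> aAb•


def lr0_parse(word, start_state=1, goal="A"):
    """Return the rows (stack, input, action) of an LR(0) parse of `word`."""
    stack = [start_state]          # states and symbols alternate: 1 a 2 a 2 ...
    rest = word
    rows = []
    while True:
        state = stack[-1]
        shown = "".join(str(s) for s in stack)
        if state in REDUCE:        # one valid item, and it is complete: reduce
            lhs, n = REDUCE[state]
            rows.append((shown, rest, "R"))
            del stack[len(stack) - 2 * n:]     # pop n symbols with their states
            top = stack[-1]
            if top == start_state and lhs == goal and not rest:
                stack.append(lhs)              # the whole input reduced to A
                rows.append(("".join(str(s) for s in stack), rest, ""))
                return rows
            if (top, lhs) not in GOTO:
                raise ValueError(f"no transition on {lhs} from state {top}")
            stack += [lhs, GOTO[(top, lhs)]]
        else:                      # no complete item is valid: shift
            if not rest or (state, rest[0]) not in GOTO:
                raise ValueError(f"cannot shift in state {state} at {rest!r}")
            rows.append((shown, rest, "S"))
            stack += [rest[0], GOTO[(state, rest[0])]]
            rest = rest[1:]


if __name__ == "__main__":
    for row in lr0_parse("aabb"):
        print("%-10s %-6s %s" % row)
```

The rows it prints match the table: three shifts, a reduce by A → ab, a shift, a reduce by A → aAb, and the final stack 1A.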

    Meaning of LR(0) items

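    Item A → α•Xβ: inside the subtree rooted at A, α has been read and the focus is on the subtree rooted at X.

    NFA transitions to:

      X → •γ      shift focus to subtree rooted at X (if X is nonterminal)

      A → αX•β    move past subtree rooted at X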

    Outline of LR(0) parsing algorithm

    Algorithm can perform two actions: shift (S) and reduce (R)

    What if:

      no complete item is valid                      shift (S)
      there is one valid item, and it is complete    reduce (R)
      some valid items are complete, some not        S / R conflict
      more than one valid complete item              R / R conflict
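The four cases above amount to a small decision function. A minimal Python sketch, assuming items are encoded as (lhs, rhs, dot) triples over single-character symbols; the name lr0_decision and this encoding are illustrative, not from the lecture.

```python
# An LR(0) item is a triple (lhs, rhs, dot): ("A", "aAb", 1) stands for A → a•Ab.
# Symbols are single characters (an assumption of this sketch).

def lr0_decision(valid_items):
    """Classify a set of valid LR(0) items following the four cases on the slide."""
    complete = [item for item in valid_items if item[2] == len(item[1])]
    if not complete:
        return "shift"                       # no complete item is valid
    problems = []
    if len(complete) < len(valid_items):     # some complete, some not
        problems.append("S/R")
    if len(complete) > 1:                    # more than one complete item
        problems.append("R/R")
    if problems:
        return " and ".join(problems) + " conflict"
    return "reduce"                          # one valid item, and it is complete


# Two states of the LR(0) DFA for A → aAb | ab:
print(lr0_decision({("A", "aAb", 0), ("A", "ab", 0)}))   # state 1, prints "shift"
print(lr0_decision({("A", "ab", 2)}))                    # state 3, prints "reduce"
```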

    Definition of LR(0) grammar

    A grammar is LR(0) if S/R and R/R conflicts never occur

    LR means parsing happens left to right and produces a rightmost derivation (in reverse)

    LR(0) grammars are unambiguous and have a fast parsing algorithm

    Unfortunately, they are not expressive enough to describe programming languages

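    Hierarchy of context-free grammars

    (Figure: nested classes of grammars, from largest to smallest)
      context-free grammars    parse using CYK algorithm (slow)
      LR(∞) grammars
      LR(1) grammars
      LR(0) grammars           parse using LR(0) algorithm
    Languages marked in the figure: java, perl, python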

    A grammar that is not LR(0)

    S → A (1) | Bc (2)
    A → aA (3) | a (4)
    B → a (5) | ab (6)

    input: a

    A grammar that is not LR(0)

    S → A (1) | Bc (2)
    A → aA (3) | a (4)
    B → a (5) | ab (6)

    input: a

    (Figure: the parse trees S ⇒ A and S ⇒ Bc whose leftmost leaf is a.)

    possibilities:
      under S → A:   shift (3), reduce (4)
      under S → Bc:  reduce (5), shift (6)

    valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

    S/R, R/R conflicts!
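The conflict can also be found mechanically: build every state of the LR(0) DFA and check each one against the definition of an LR(0) grammar. A minimal Python sketch, assuming items encoded as (lhs, rhs, dot), uppercase letters for nonterminals, and (as on the slides) a start state formed from the items S → •α without an extra start production; closure, goto, lr0_states, conflicts and is_lr0 are hypothetical names.

```python
# LR(0) items are (lhs, rhs, dot); ("A", "aA", 1) stands for A → a•A.
# Nonterminals are uppercase letters, terminals lowercase (assumption).
NOT_LR0 = [("S", "A"), ("S", "Bc"), ("A", "aA"), ("A", "a"), ("B", "a"), ("B", "ab")]


def closure(items, grammar):
    """Add C → •δ for every item whose dot stands before a nonterminal C."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot].isupper():
            for lhs2, rhs2 in grammar:
                new = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(state, symbol, grammar):
    """Move the dot past `symbol` wherever possible, then take the closure."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, grammar)


def lr0_states(grammar, start):
    """All DFA states reachable from the start state (subset construction)."""
    q0 = closure({(lhs, rhs, 0) for lhs, rhs in grammar if lhs == start}, grammar)
    seen, todo = {q0}, [q0]
    while todo:
        state = todo.pop()
        for symbol in {rhs[dot] for lhs, rhs, dot in state if dot < len(rhs)}:
            nxt = goto(state, symbol, grammar)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def conflicts(state):
    complete = [item for item in state if item[2] == len(item[1])]
    found = set()
    if complete and len(complete) < len(state):
        found.add("S/R")
    if len(complete) > 1:
        found.add("R/R")
    return found


def is_lr0(grammar, start):
    return not any(conflicts(state) for state in lr0_states(grammar, start))


print(is_lr0([("A", "aAb"), ("A", "ab")], "A"))   # True
print(is_lr0(NOT_LR0, "S"))                        # False
```

The offending state is the one reached on a, which holds exactly the six valid items listed above.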

    Lookahead

    S → A (1) | Bc (2)
    A → aA (3) | a (4)
    B → a (5) | ab (6)

    input: a    (next symbol hidden: peek inside!)

    valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

    (Figure: the same parse trees S ⇒ A and S ⇒ Bc as before.)

    Lookahead

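    S → A (1) | Bc (2)
    A → aA (3) | a (4)
    B → a (5) | ab (6)

    input: a, next symbol a (peek inside!)

    valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

    parse tree must look like this: S → A, A → aA (figure)

    The lookahead a rules out B → a• (which needs c next), B → a•b (which needs b next) and A → a• (which needs the input to end).

    action: shift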

    Lookahead

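    S → A (1) | Bc (2)
    A → aA (3) | a (4)
    B → a (5) | ab (6)

    input: a a, next symbol a (peek inside!)

    valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a

    parse tree must look like this: S → A → aA, with the inner A again expanded as aA (figure)

    The lookahead a again rules out A → a•, which would need the input to end.

    action: shift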

    Lookahead

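    S → A (1) | Bc (2)
    A → aA (3) | a (4)
    B → a (5) | ab (6)

    input: a a a (end of input)

    valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a

    parse tree must look like this: the innermost A is A → a (figure)

    The input has ended, so only the complete item A → a• fits.

    action: reduce (4)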

    LR(0) items vs. LR(1) items

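    A → aAb | ab

    LR(0) item:   A → a•Ab
    LR(1) item:   [A → a•Ab, b]

    (Figure: the same partial parse tree for both items; the LR(1) item also records the symbol b that follows the subtree rooted at this A.)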

    LR(1) items

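    LR(1) items are of the form

      [A → α•β, x]   or   [A → α•β, ε]

    to represent this state in the parsing: inside a subtree rooted at A, α has been read and β is still expected, and x is the symbol that follows the subtree (ε means the subtree is followed by the end of the input).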

    Outline of LR(1) parsing algorithm

    Step 1: Build NFA that describes valid item updates

    Step 2: Convert NFA to DFA
      As in LR(0), DFA will have shift and reduce states

    Step 3: Run DFA on input, using stack to remember sequence of states
      Use lookahead to eliminate wrong reduce items

    Recall NFA transitions for LR(0)

    States of NFA will be items (plus a start state q0)

    For every item S → •α we have a transition
      q0 --ε--> S → •α

    For every item A → α•Xβ we have a transition
      A → α•Xβ --X--> A → αX•β

    For every item A → α•Cβ and production C → δ we have a transition
      A → α•Cβ --ε--> C → •δ
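The three rules translate directly into code that lists the NFA edges. A minimal Python sketch, assuming items encoded as (lhs, rhs, dot) over single-character symbols with uppercase nonterminals, the string "q0" for the start state and "ε" as the label of ε-transitions; lr0_nfa_edges is a hypothetical name.

```python
# Items are (lhs, rhs, dot); "q0" is the start state and "ε" labels ε-moves.
# Nonterminals are uppercase letters (assumption of this sketch).

def lr0_nfa_edges(grammar, start):
    """List the LR(0) NFA transitions as (source, label, target) triples."""
    edges = []
    for lhs, rhs in grammar:
        if lhs == start:                                   # q0 --ε--> S → •α
            edges.append(("q0", "ε", (lhs, rhs, 0)))
        for dot, x in enumerate(rhs):
            item = (lhs, rhs, dot)
            edges.append((item, x, (lhs, rhs, dot + 1)))   # A → α•Xβ --X--> A → αX•β
            if x.isupper():                                # A → α•Cβ --ε--> C → •δ
                for lhs2, rhs2 in grammar:
                    if lhs2 == x:
                        edges.append((item, "ε", (lhs2, rhs2, 0)))
    return edges


for source, label, target in lr0_nfa_edges([("A", "aAb"), ("A", "ab")], "A"):
    print(source, f"--{label}-->", target)
```

For A → aAb | ab this gives 9 edges: two from q0, five that move past a symbol, and two ε-edges from A → a•Ab.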

    NFA transitions for LR(1)

    For every item [S → •α, ε] we have a transition
      q0 --ε--> [S → •α, ε]

    For every item [A → α•Xβ, x] we have a transition
      [A → α•Xβ, x] --X--> [A → αX•β, x]

    For every item [A → α•Cβ, x] and production C → δ, and for every y in FIRST(βx), we have a transition
      [A → α•Cβ, x] --ε--> [C → •δ, y]

    FIRST sets

    FIRST(α) is the set of terminals that can appear as the leftmost symbol of some string derived from α

    Example

      S → A (1) | cB (2)
      A → aA (3) | a (4)
      B → a (5) | ab (6)

      FIRST(a) = {a}
      FIRST(A) = {a}
      FIRST(S) = {a, c}
      FIRST(bAc) = {b}
      FIRST(BA) = {a}
      FIRST(ε) = ∅
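FIRST sets can be computed by a fixpoint iteration: start from empty sets and keep enforcing FIRST(X) ⊇ FIRST(rhs) for every production X → rhs until nothing changes. A minimal Python sketch, assuming a grammar given as (lhs, rhs) pairs with uppercase nonterminals and lowercase terminals; it marks ε-derivability internally with the empty string so that ε-productions are handled too, and first_terminals drops that marker so that FIRST(ε) = ∅ as on the slide. All names are illustrative.

```python
# Grammar: list of (lhs, rhs); nonterminals are uppercase letters and terminals
# lowercase letters (assumption). An empty rhs "" would be an ε-production.
EXAMPLE = [("S", "A"), ("S", "cB"), ("A", "aA"), ("A", "a"), ("B", "a"), ("B", "ab")]

EMPTY = ""   # marker inside FIRST sets: "this can derive the empty string"


def first_of(symbols, first):
    """FIRST of a string of symbols, given the FIRST sets of the nonterminals."""
    result = set()
    for x in symbols:
        if not x.isupper():              # a terminal is its own first symbol
            result.add(x)
            return result
        result |= first[x] - {EMPTY}
        if EMPTY not in first[x]:        # x cannot vanish, so stop here
            return result
    result.add(EMPTY)                    # every symbol (possibly none) can vanish
    return result


def first_sets(grammar):
    """Smallest sets with FIRST(X) ⊇ FIRST(rhs) for every production X → rhs."""
    first = {lhs: set() for lhs, _ in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first_of(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def first_terminals(symbols, grammar):
    """FIRST as defined on the slide: terminals only, so FIRST(ε) is empty."""
    return first_of(symbols, first_sets(grammar)) - {EMPTY}


for s in ["a", "A", "S", "bAc", "BA", ""]:
    print(repr(s), sorted(first_terminals(s, EXAMPLE)))
```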

    Explaining the transitions

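    [A → α•Xβ, x] --X--> [A → αX•β, x]
      (Figure: in the subtree rooted at A, which is followed by x, the focus moves past the subtree rooted at X; the lookahead x does not change.)

    [A → α•Cβ, x] --ε--> [C → •δ, y]   for every y ∈ FIRST(βx)
      (Figure: the focus moves into the subtree C → δ; this subtree is followed by β and then x, so the symbol y right after it must be in FIRST(βx).)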

    Example

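    S → A (1) | Bc (2)
    A → aA (3) | a (4)
    B → a (5) | ab (6)

    Part of the NFA:
      q0 --ε--> [S → •A, ε]
      q0 --ε--> [S → •Bc, ε]
      [S → •A, ε] --A--> [S → A•, ε]
      [S → •A, ε] --ε--> [A → •aA, ε]
      [S → •A, ε] --ε--> [A → •a, ε]
      [S → •Bc, ε] --B--> [S → B•c, ε]
      [S → •Bc, ε] --ε--> [B → •a, c]
      [S → •Bc, ε] --ε--> [B → •ab, c]
      . . .

    The B-items get lookahead c because FIRST(cε) = {c}; the A-items keep ε because nothing follows A in S → A.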
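Following the ε-transitions from q0 gives the first set of valid LR(1) items. A minimal Python sketch of that closure, assuming items encoded as (lhs, rhs, dot, lookahead), the string "$" standing in for the lookahead ε, and no ε-productions (true of this grammar), so that FIRST(βx) is just FIRST of the first symbol of βx. The names first_sets, lr1_closure and start_items are illustrative.

```python
END = "$"   # this sketch writes the slides' lookahead ε (end of input) as "$"

# Items are (lhs, rhs, dot, lookahead): ("B", "ab", 0, "c") stands for [B → •ab, c].
EXAMPLE = [("S", "A"), ("S", "Bc"), ("A", "aA"), ("A", "a"), ("B", "a"), ("B", "ab")]


def first_sets(grammar):
    """FIRST of each nonterminal; assumes no ε-productions, as in the example."""
    first = {lhs: set() for lhs, _ in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first[rhs[0]] if rhs[0].isupper() else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def lr1_closure(items, grammar, first):
    """Follow the ε-transitions [A → α•Cβ, x] to [C → •δ, y], y in FIRST(βx)."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot].isupper():
            head = (rhs[dot + 1:] + x)[0]     # first symbol of βx
            for y in (first[head] if head.isupper() else {head}):
                for lhs2, rhs2 in grammar:
                    new = (lhs2, rhs2, 0, y)
                    if lhs2 == rhs[dot] and new not in result:
                        result.add(new)
                        todo.append(new)
    return frozenset(result)


def start_items(grammar, start, first):
    """q0 --ε--> [S → •α, ε] for each production of S, then the ε-closure."""
    seeds = {(lhs, rhs, 0, END) for lhs, rhs in grammar if lhs == start}
    return lr1_closure(seeds, grammar, first)


for item in sorted(start_items(EXAMPLE, "S", first_sets(EXAMPLE))):
    print(item)
```

The six items it prints are the first row of valid items in the parsing example that follows.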

    Convert NFA to DFA

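    Each DFA state is a subset of LR(1) items, e.g.

      [A → a•A, ε]  [A → a•, ε]  [B → a•, c]  [B → a•b, c]  [A → •aA, ε]  [A → •a, ε]

    States can contain S/R, R/R conflicts

    But in an LR(1) grammar, lookahead can always resolve such conflicts: in this state, ε (end of input) selects reduce (4), c selects reduce (5), and a or b selects shift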
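How lookahead resolves the conflicts in this state can be checked with a few lines of Python. A minimal sketch, assuming items encoded as (lhs, rhs, dot, lookahead) with "$" standing in for ε; actions is a hypothetical helper that lists every action allowed for a given next symbol, so a result with more than one entry would signal a conflict.

```python
END = "$"   # stands for the slides' lookahead ε (end of input); an assumption

# Items are (lhs, rhs, dot, lookahead): ("B", "ab", 1, "c") stands for [B → a•b, c].
STATE = {("A", "aA", 1, END), ("A", "a", 1, END), ("B", "a", 1, "c"),
         ("B", "ab", 1, "c"), ("A", "aA", 0, END), ("A", "a", 0, END)}


def actions(state, lookahead):
    """Every action the items of `state` allow when the next symbol is `lookahead`."""
    allowed = set()
    for lhs, rhs, dot, x in state:
        if dot == len(rhs):                 # complete item: reduce only on its lookahead
            if x == lookahead:
                allowed.add(f"reduce {lhs} -> {rhs}")
        elif rhs[dot] == lookahead:         # terminal after the dot matches: shift
            allowed.add("shift")
    return allowed                          # more than one entry would be a conflict


for look in ["a", "b", "c", END]:
    print(look, actions(STATE, look))
```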

    Example

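    S → A (1) | Bc (2)
    A → aA (3) | a (4)
    B → a (5) | ab (6)

    stack   input   valid items                                                                        action
            abc     [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]         S
    a       bc      [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]        S (look ahead!)
    ab      c       [B → ab•, c]                                                                       R
    B       c       [S → B•c, ε]                                                                       S
    Bc              [S → Bc•, ε]                                                                       R
    S

    In the second row the lookahead b rules out the reduce items [A → a•, ε] and [B → a•, c], so the parser shifts.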
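The whole LR(1) algorithm fits in one loop: keep a stack of DFA states computed on demand by closure and goto, and let the lookahead choose between shift and reduce. The Python sketch below is an illustration, not the lecture's code, under these assumptions: items are (lhs, rhs, dot, lookahead) tuples, "$" stands in for ε, symbols are single characters with uppercase nonterminals, there are no ε-productions, and the parse is accepted when the stack holds only the start variable and the input is empty. It builds DFA states lazily instead of precomputing a table.

```python
END = "$"   # the slides' lookahead ε (end of input), written "$" in this sketch
EXAMPLE = [("S", "A"), ("S", "Bc"), ("A", "aA"), ("A", "a"), ("B", "a"), ("B", "ab")]


def first_sets(grammar):
    """FIRST of each nonterminal; assumes no ε-productions, as in the example."""
    first = {lhs: set() for lhs, _ in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first[rhs[0]] if rhs[0].isupper() else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def closure(items, grammar, first):
    """Add [C → •δ, y] for each [A → α•Cβ, x] and every y in FIRST(βx)."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot].isupper():
            head = (rhs[dot + 1:] + x)[0]            # first symbol of βx
            for y in (first[head] if head.isupper() else {head}):
                for lhs2, rhs2 in grammar:
                    new = (lhs2, rhs2, 0, y)
                    if lhs2 == rhs[dot] and new not in result:
                        result.add(new)
                        todo.append(new)
    return frozenset(result)


def goto(state, symbol, grammar, first):
    moved = {(lhs, rhs, dot + 1, la) for lhs, rhs, dot, la in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, grammar, first)


def lr1_parse(word, grammar, start):
    """Return rows (stack, input, action); raise ValueError if stuck or in conflict."""
    first = first_sets(grammar)
    q0 = closure({(lhs, rhs, 0, END) for lhs, rhs in grammar if lhs == start}, grammar, first)
    states, symbols, rest, rows = [q0], [], word, []
    while True:
        if symbols == [start] and not rest:
            rows.append((start, "", "done"))
            return rows
        look = rest[0] if rest else END
        state, shown = states[-1], "".join(symbols)
        reduces = [(lhs, rhs) for lhs, rhs, dot, la in state if dot == len(rhs) and la == look]
        shift = any(dot < len(rhs) and rhs[dot] == look for lhs, rhs, dot, la in state)
        if len(reduces) + shift != 1:
            raise ValueError(f"no unique action with stack {shown!r}, next {look!r}")
        if shift:
            rows.append((shown, rest, "S"))
            symbols.append(look)
            states.append(goto(state, look, grammar, first))
            rest = rest[1:]
        else:                                        # reduce by lhs → rhs
            lhs, rhs = reduces[0]
            rows.append((shown, rest, "R"))
            del symbols[len(symbols) - len(rhs):]
            del states[len(states) - len(rhs):]
            symbols.append(lhs)
            states.append(goto(states[-1], lhs, grammar, first))


for row in lr1_parse("abc", EXAMPLE, "S"):
    print(row)
```

On abc it performs S, S, R, S, R as in the table; on aaa it shifts three times and then reduces, matching the lookahead walkthrough earlier.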

    LR(k) grammars

    A context-free grammar is LR(1) if all S/R and R/R conflicts can be resolved with one symbol of lookahead

    More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols

    Items have the form [A → α•β, x1...xk]

    LR(1) grammars describe the syntax of most programming languages
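Following the definition above for k = 1, a grammar can be tested by building all states of the LR(1) DFA and checking that no lookahead symbol allows two actions in any of them. A minimal Python sketch, assuming items encoded as (lhs, rhs, dot, lookahead), "$" for ε, single-character symbols with uppercase nonterminals, and no ε-productions; like the slides, it starts from the items [S → •α, ε] instead of adding a new start production, and all names are hypothetical.

```python
END = "$"   # the slides' lookahead ε (end of input), written "$" in this sketch


def first_sets(grammar):
    """FIRST of each nonterminal; assumes no ε-productions."""
    first = {lhs: set() for lhs, _ in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first[rhs[0]] if rhs[0].isupper() else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def closure(items, grammar, first):
    """Add [C → •δ, y] for each [A → α•Cβ, x] and every y in FIRST(βx)."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot].isupper():
            head = (rhs[dot + 1:] + x)[0]            # first symbol of βx
            for y in (first[head] if head.isupper() else {head}):
                for lhs2, rhs2 in grammar:
                    new = (lhs2, rhs2, 0, y)
                    if lhs2 == rhs[dot] and new not in result:
                        result.add(new)
                        todo.append(new)
    return frozenset(result)


def goto(state, symbol, grammar, first):
    moved = {(lhs, rhs, dot + 1, la) for lhs, rhs, dot, la in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, grammar, first)


def has_conflict(state):
    """True if some lookahead allows two actions (S/R or R/R) in this state."""
    for look in {la for lhs, rhs, dot, la in state if dot == len(rhs)}:
        reduces = sum(1 for lhs, rhs, dot, la in state if dot == len(rhs) and la == look)
        shifts = any(dot < len(rhs) and rhs[dot] == look for lhs, rhs, dot, la in state)
        if reduces + shifts > 1:
            return True
    return False


def is_lr1(grammar, start):
    """Build the LR(1) DFA state by state and look for an unresolved conflict."""
    first = first_sets(grammar)
    q0 = closure({(lhs, rhs, 0, END) for lhs, rhs in grammar if lhs == start}, grammar, first)
    seen, todo = {q0}, [q0]
    while todo:
        state = todo.pop()
        if has_conflict(state):
            return False
        for symbol in {rhs[dot] for lhs, rhs, dot, la in state if dot < len(rhs)}:
            nxt = goto(state, symbol, grammar, first)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True


print(is_lr1([("S", "A"), ("S", "Bc"), ("A", "aA"), ("A", "a"), ("B", "a"), ("B", "ab")], "S"))  # True
print(is_lr1([("S", "Ab"), ("S", "Bb"), ("A", "a"), ("B", "a")], "S"))                          # False
```

The second grammar is ambiguous (both S → Ab and S → Bb derive ab), so after reading a the items [A → a•, b] and [B → a•, b] form an R/R conflict that no lookahead can resolve.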