1“Approximating Context-Free Grammar Ambiguity”November 2, 2004
Approximating Context-Free Grammar Ambiguity
Claus [email protected]
BRICS, Department of Computer Science
University of Aarhus, Denmark
“Approximating Context-Free Grammar Ambiguity”
// Abstract
Context-free grammar ambiguity is undecidable.
However, just because it’s undecidable doesn’t mean there aren’t (good) approximations! Indeed, the whole area of static analysis works on “side-stepping undecidability”.
We exhibit a characterization of context-free ambiguity which induces a whole framework for approximating the problem.
In particular, we give an approximation, $A_{MN}$, based on the [Mohri-Nederhof, 2000] regular approximation of context-free grammars and show how to boost the precision even further.
// Outline
Introduction
Vertical / Horizontal Ambiguity
Characterization of Ambiguity
(Over-)Approximation Framework
Approximation ($A_{MN}$)
Assessment
Related Work
Conclusion
// Context-Free Grammar
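$N$: finite set of nonterminals
$\Sigma$: finite set of terminals
$s \in N$: start nonterminal
$\pi : N \to \mathcal{P}(E^*)$: production function, $E = N \cup \Sigma$
$G = \langle N, \Sigma, s, \pi \rangle$
Assume:
All $n \in N$ reachable (from $s$)
All $n \in N$ derive some (finite) string
$L : G \to \mathcal{P}(\Sigma^*)$: language of $G$, $L(G)$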
// Relevant CFG Decision Problems
Decidable:
Membership: $\omega \in L(G_{CFG})$
Emptiness: $L(G_{CFG}) = \emptyset$
Intersection (w/ REG): $L(G_{CFG}) \cap L(R_{REG}) = L(C_{CFG})$
… constructively
Undecidable:
Intersection (w/ CFG): $L(G_{CFG}) \cap L(G'_{CFG})$ ?
…
Ambiguity: $\exists \omega \in \Sigma^*$: 2 derivation trees ?
// Ambiguity: Undecidable!
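Ambiguity: $\exists \omega \in \Sigma^*$: 2 derivation trees ?
Algorithms: Undecidable!
However…
[Figure: two derivation trees T and T′ rooted at s; the set of grammars split into unambiguous | ambiguous]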
// “Side-Stepping Undecidability”
However, just because it’s undecidable doesn’t mean there aren’t (good) approximations! Indeed, the whole area of static analysis works on “side-stepping undecidability”.
Safe approximation:
[Figure: unambiguous | ambiguous, safe (over-)approximation]
[Figure: unambiguous | ambiguous, safe (under-)approximation]
Unsafe approximation:
[Figure: unambiguous | ambiguous, unsafe approximation]
// Motivation
Use safe (over-)approximation:
“Yes!” $\Rightarrow$ “G guaranteed unambiguous”!!!
Safely use any GLR parser on G
Because: never two parses at runtime!
Hence: dynamic parse ambiguity $\to$ static parse ambiguity
[Figure: unambiguous | ambiguous; the “Yes!” answers lie inside the unambiguous region]
// Motivation (cont’d)
Undecidability means: “there’ll always be a slack”:
[Figure: unambiguous | ambiguous; the “No?” answers cover the ambiguous region plus some slack]
However, still useful!
Possible interpretations of “No?”:
Treat as error (reject grammar): “Please redesign your grammar” (as in [LA]LR(k))
Treat as warning: “Here are some potential problems”
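// Vertical Ambiguity
“Vertical ambiguity”: G is vertically unambiguous iff
$\forall n \in N : \forall \alpha, \alpha' \in \pi(n) : \alpha \neq \alpha' \Rightarrow L(\alpha) \cap L(\alpha') = \emptyset$
Example:
Z : x A y
  : x B y
A : a
B : a
Ambiguous string: xay
~ “reduce/reduce conflict” in [Yacc]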
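The two derivation trees for xay can be confirmed mechanically. The following is a minimal Python sketch, not taken from the slides: `make_counter` and the dictionary encoding of the grammar are hypothetical names chosen for illustration, and the counting scheme assumes a grammar without empty productions and without cyclic unit productions.

```python
from functools import lru_cache

def make_counter(grammar):
    """grammar: dict mapping each nonterminal to a list of right-hand sides
    (tuples of symbols). Symbols that are not keys are terminals.
    Assumption: no empty productions and no cyclic unit productions."""

    @lru_cache(maxsize=None)
    def count_seq(symbols, s):
        # Number of ways the symbol sequence derives exactly the string s.
        if not symbols:
            return 1 if s == "" else 0
        head, rest = symbols[0], symbols[1:]
        total = 0
        # Every remaining symbol needs at least one character.
        for i in range(1, len(s) - len(rest) + 1):
            left = count_sym(head, s[:i])
            if left:
                total += left * count_seq(rest, s[i:])
        return total

    @lru_cache(maxsize=None)
    def count_sym(sym, s):
        if sym not in grammar:                      # terminal
            return 1 if s == sym else 0
        return sum(count_seq(rhs, s) for rhs in grammar[sym])

    return count_sym

vertical = {
    "Z": [("x", "A", "y"), ("x", "B", "y")],
    "A": [("a",)],
    "B": [("a",)],
}
count = make_counter(vertical)
print(count("Z", "xay"))  # 2: one tree via A, one via B
```

A count above 1 for any string witnesses ambiguity; a count of at most 1 on all tested strings proves nothing, which is why the slides turn to a static approximation instead.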
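// Horizontal Ambiguity
“Horizontal ambiguity”: G is horizontally unambiguous iff
$\forall n \in N : \forall \alpha \in \pi(n) : \forall i \in [1..|\alpha|-1] : L(\alpha_0 .. \alpha_{i-1}) \Cap L(\alpha_i .. \alpha_{|\alpha|-1}) = \emptyset$
where the overlap operator $\Cap$ is:
$\Cap : \mathcal{P}(\Sigma^*) \times \mathcal{P}(\Sigma^*) \to \mathcal{P}(\Sigma^*)$
$X \Cap Y = \{\, xay \mid x,y \in \Sigma^* \wedge a \in \Sigma^+ \wedge x,xa \in L(X) \wedge y,ay \in L(Y) \,\}$
Example:
Z : A B
A : x a
  : x
B : a y
  : y
Ambiguous string: xay
~ “shift/reduce conflict” in [Yacc]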
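For finite languages the overlap operator can be computed directly from its definition. The sketch below is illustrative only: the function name `overlap` and the use of Python sets of strings are assumptions, and it applies to the finite languages of A and B in the example, not to arbitrary context-free languages.

```python
def overlap(X, Y):
    """All strings x+a+y with a non-empty, x and x+a in X, y and a+y in Y.
    X and Y are finite sets of strings."""
    result = set()
    for xa in X:
        for cut in range(len(xa)):          # a = xa[cut:] is never empty
            x, a = xa[:cut], xa[cut:]
            if x not in X:
                continue
            for ay in Y:
                if ay.startswith(a) and ay[len(a):] in Y:
                    result.add(x + ay)      # the string x a y
    return result

L_A = {"xa", "x"}   # language of A in the example
L_B = {"ay", "y"}   # language of B in the example
print(overlap(L_A, L_B))  # {'xay'}
```

The single witness xay is exactly the ambiguous string of the example: the middle a can be claimed by A or by B.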
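// Characterization of Ambiguity
Theorem 1: G vertically unambiguous $\wedge$ G horizontally unambiguous $\Leftrightarrow$ G unambiguous
Lemma 1a (“$\Rightarrow$”): G vertically unambiguous $\wedge$ G horizontally unambiguous $\Rightarrow$ G unambiguous
Lemma 1b (“$\Leftarrow$”): G vertically unambiguous $\wedge$ G horizontally unambiguous $\Leftarrow$ G unambiguous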
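// Proof (Lemma 1a): “$\Rightarrow$”
G vertically unambiguous $\wedge$ G horizontally unambiguous $\Rightarrow$ G unambiguous
…or contrapositively:
G ambiguous $\Rightarrow$ G vertically ambiguous $\vee$ G horizontally ambiguous
Proof:
Assume G ambiguous (i.e. 2 der. trees for $\omega$)
Show: G vertically ambiguous $\vee$ G horizontally ambiguous, by induction on the max height of the 2 derivation trees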
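// Proof (Lemma 1a): “$\Rightarrow$” (Base)
Base case (height 1):
The ambiguity means that (for $p \neq p'$):
[Figure: two height-1 derivation trees for $\omega$ from N, one using production p (right-hand side $\alpha$), the other p′ (right-hand side $\alpha'$)]
Which means: $L(\alpha) \cap L(\alpha') \supseteq \{\omega\}$
i.e., we have a vertical ambiguity: G vertically ambiguous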
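// Proof (Lemma 1a): “$\Rightarrow$” (I.H.)
Induction step (height n):
Assume induction hypothesis (for height n-1)
The ambiguity means:
[Figure: two derivation trees for the same string $\omega$ from N, with top productions p and p′; right-hand sides $\alpha_0 .. \alpha_{|\alpha|-1}$ and $\alpha'_0 .. \alpha'_{|\alpha'|-1}$; subtree i derives $\omega_i$ resp. $\omega'_i$, with $\omega_0 .. \omega_{|\alpha|-1} = \omega'_0 .. \omega'_{|\alpha'|-1}$; subtrees of height n-1]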
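// Proof (Lemma 1a): “$\Rightarrow$” ($p \neq p'$)
Case $p \neq p'$ (different production):
…but then $L(\alpha) \cap L(\alpha') \supseteq \{\omega\}$
i.e., we have a vertical ambiguity: G vertically ambiguous
[Figure: the two derivation trees of the induction step, with different top productions p and p′]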
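// Proof (Lemma 1a): “$\Rightarrow$” ($p = p'$, 1)
Case $p = p'$ (same prod. $\alpha$): i.e. “the tops of the trees are the same”
Case $\forall i : \omega_i = \omega'_i$:
ambiguity in subtree i ($\alpha_i$ deriving the same $\omega_i$)
Induction hypothesis (this subtree) $\Rightarrow$ G vertically ambiguous $\vee$ G horizontally ambiguous
[Figure: the two derivation trees with the same top production p = p′, differing inside subtree i]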
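// Proof (Lemma 1a): “$\Rightarrow$” ($p = p'$, 2)
Case $p = p'$ (same prod. $\alpha$):
Case $\exists i : \omega_i \neq \omega'_i$ (least such $i$):
…but then: $\exists j \neq i : \omega_j \neq \omega'_j$ (2nd least such $j$; assume WLOG $|\omega_i| < |\omega'_i|$)
Now pick any $k$ with $i \leq k < j$...
...then: $L(\alpha_0 .. \alpha_k) \Cap L(\alpha_{k+1} .. \alpha_{|\alpha|-1}) \neq \emptyset$ (overlap), i.e. G horizontally ambiguous
[Figure: the two derivation trees with the same top production; the boundary after position k falls at different places in the two trees]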
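// Proof (Lemma 1b): “$\Leftarrow$”
G vertically unambiguous $\wedge$ G horizontally unambiguous $\Leftarrow$ G unambiguous
Contrapositively:
G ambiguous $\Leftarrow$ G vertically ambiguous $\vee$ G horizontally ambiguous
Assume “G vertically ambiguous” (vertical conflict):
Then for some $N \in N$:
$N \Rightarrow \alpha \Rightarrow^* a$, $N \Rightarrow \alpha' \Rightarrow^* a$, $L(\alpha) \cap L(\alpha') \supseteq \{a\}$
But then derive (using reachability + derivability of N):
$s \Rightarrow^* x N \gamma \Rightarrow x \alpha \gamma \Rightarrow^* x a \gamma \Rightarrow^* x a y$
$s \Rightarrow^* x N \gamma \Rightarrow x \alpha' \gamma \Rightarrow^* x a \gamma \Rightarrow^* x a y$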
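// Proof (Lemma 1b): “$\Leftarrow$” (cont’d)
Assume “G horizontally ambiguous” (horizontal conflict):
Then for some $N \in N$:
$N \Rightarrow \alpha \beta$, $L(\alpha) \Cap L(\beta) \neq \emptyset$ (overlap)
i.e. $\exists x,y \in \Sigma^* : \exists a \in \Sigma^+ : x,xa \in L(\alpha) \wedge y,ay \in L(\beta)$
But then derive (using reachability + derivability of N):
$s \Rightarrow^* v N \gamma \Rightarrow v \alpha \beta \gamma \Rightarrow^* v x \beta \gamma \Rightarrow^* v x a y \gamma \Rightarrow^* v x a y w$
$s \Rightarrow^* v N \gamma \Rightarrow v \alpha \beta \gamma \Rightarrow^* v x a \beta \gamma \Rightarrow^* v x a y \gamma \Rightarrow^* v x a y w$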
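// (Over-)Approximation (A)
(Over-)Approximation $A : E^* \to \mathcal{P}(\Sigma^*)$
$\forall \alpha \in E^* : L(\alpha) \subseteq A(\alpha)$
A decidable iff “$\cap$” and “$\Cap$” (overlap) decidable on co-dom(A)
Approximated vertical ambiguity: G vertically unambiguous w.r.t. A iff
$\forall n \in N : \forall \alpha, \alpha' \in \pi(n) : \alpha \neq \alpha' \Rightarrow A(\alpha) \cap A(\alpha') = \emptyset$
Approximated horizontal ambiguity: G horizontally unambiguous w.r.t. A iff
$\forall n \in N : \forall \alpha \in \pi(n) : \forall i \in [1..|\alpha|-1] : A(\alpha_0 .. \alpha_{i-1}) \Cap A(\alpha_i .. \alpha_{|\alpha|-1}) = \emptyset$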
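// Ambiguity Approximation
Theorem 2: G vertically unambiguous w.r.t. A $\wedge$ G horizontally unambiguous w.r.t. A $\Rightarrow$ G unambiguous
Proof:
“Conflicts w/ smaller sets $\Rightarrow$ conflicts w/ larger sets”:
$A(\alpha) \cap A(\beta) = \emptyset \Rightarrow L(\alpha) \cap L(\beta) = \emptyset$
$A(\alpha) \Cap A(\beta) = \emptyset \Rightarrow L(\alpha) \Cap L(\beta) = \emptyset$ ($\Cap$: overlap)
Hence: G vertically and horizontally unambiguous w.r.t. A $\Rightarrow$ G vertically and horizontally unambiguous $\Rightarrow$ G unambiguous (Theorem 1)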
// Compositionality (of A’s)
Corollary 3: A, A′ decidable (over-)approximations $\Rightarrow$ $A \cap A'$ decidable (over-)approximation
Proof: Follows from definition [omitted…]
i.e. “Approximations are compositional”!
[Figure: three diagrams over unambiguous | ambiguous, showing A, A′, and $A \cap A'$]
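The over-approximation part of the omitted argument can be sketched as follows. This is a reconstruction, not the author's proof, and it assumes that $A \cap A'$ is defined pointwise, i.e. $(A \cap A')(\alpha) = A(\alpha) \cap A'(\alpha)$, which the slide does not spell out.

```latex
\begin{align*}
L(\alpha) \subseteq A(\alpha) \;\wedge\; L(\alpha) \subseteq A'(\alpha)
  &\;\Rightarrow\; L(\alpha) \subseteq A(\alpha) \cap A'(\alpha) = (A \cap A')(\alpha) \\
(A \cap A')(\alpha) \subseteq A(\alpha)
  &\;\Rightarrow\; \big( A(\alpha) \cap A(\beta) = \emptyset
      \Rightarrow (A \cap A')(\alpha) \cap (A \cap A')(\beta) = \emptyset \big)
\end{align*}
```

The first line says the combination is still a safe over-approximation; the second says it reports no conflict wherever A alone reports none, so it is at least as precise as either component. The same monotonicity argument applies to the overlap check.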
// Choice(s) of A?
$A_{\Sigma^*}(\alpha) = \Sigma^*$ (constant)
Worst approximation
…but safe approximation!
Useless: “Cannot determine that any grammars are unambiguous”
[Figure: unambiguous | ambiguous, worst approximation]
// Choice(s) of A? (cont’d)
$A_{MN}(\alpha)$ = [Mohri-Nederhof]($\alpha$)
CFG $\to$ DFA (NFA) Approximation
“Regular Approximation of Context-Free Grammars through Transformation” [Mohri-Nederhof, 2000]
Properties of this “Black-box”:
Good (over-)approximation!
Works on language, L(G); not on grammatical structure, G
Approximation parameterizable: E.g. unfold nonterminals “n” times
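// Decidability (of $A_{MN}$)
“$\cap$” decidable (using DFAs): $X \cap Y = \emptyset$ in $O(|X_{NFA}| \cdot |Y_{NFA}|)$
“$\Cap$” (overlap) decidable (using DFAs): $X \Cap Y = \emptyset$ in $O(|X_{NFA}| \cdot |Y_{NFA}|)$
$\Rightarrow$ $A_{MN}$ decidable
With potential counterexamples (using DFAs)
G vertically unambiguous w.r.t. $A_{MN}$ $\wedge$ G horizontally unambiguous w.r.t. $A_{MN}$ $\Rightarrow$ G unambiguous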
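// Decision Algorithm for ($X \Cap Y$)
For X,Y regular languages:
All overlappings, “xay”, as DFAs; variant of “$\cap$” construction!
[Figure: $X_{NFA}$ and $Y_{NFA}$ combined into $[X;Y]_{NFA}$; an overlap is a path x a y in which both x and xa are accepted by X, and both y and ay are accepted by Y]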
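One way to decide emptiness of the overlap of two regular languages is plain reachability over pairs of automaton states. The sketch below is an assumption-laden illustration, not the construction from the slide: it does not build $[X;Y]_{NFA}$ or return counterexamples, the triple encoding `(start, accepting, delta)` and the names `closure` and `overlap_nonempty` are hypothetical, and the automata are taken to be (possibly partial) DFAs.

```python
from collections import deque

def closure(starts, step):
    """All states reachable from `starts` under `step(state) -> iterable`."""
    seen, todo = set(starts), deque(starts)
    while todo:
        state = todo.popleft()
        for nxt in step(state):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def overlap_nonempty(X, Y):
    """X, Y: partial DFAs (start, accepting_set, delta) with
    delta[(state, symbol)] -> state. True iff there are x, a (non-empty), y
    with x and xa accepted by X, and y and ay accepted by Y."""
    (x0, x_acc, xd), (y0, y_acc, yd) = X, Y
    symbols = {c for (_, c) in xd} | {c for (_, c) in yd}

    # 1) Accepting X-states reachable from the start: where some x ends.
    x_reach = closure([x0], lambda q: [t for (p, _), t in xd.items() if p == q])
    x_ends = x_reach & set(x_acc)

    # 2) Read a non-empty a in lock-step: X continues after x, Y starts fresh.
    def step_xy(pair):
        p, r = pair
        return [(xd[p, c], yd[r, c]) for c in symbols
                if (p, c) in xd and (r, c) in yd]
    after_a = closure({q for p in x_ends for q in step_xy((p, y0))}, step_xy)
    y_mids = {r for (p, r) in after_a if p in x_acc}   # xa is accepted by X

    # 3) Some y must be accepted by Y from its start AND from such a state.
    def step_yy(pair):
        r, s = pair
        return [(yd[r, c], yd[s, c]) for c in symbols
                if (r, c) in yd and (s, c) in yd]
    both = closure({(r, y0) for r in y_mids}, step_yy)
    return any(r in y_acc and s in y_acc for (r, s) in both)

# X accepts {x, xa}; Y accepts {y, ay}: the horizontal example.
X1 = (0, {1, 2}, {(0, "x"): 1, (1, "a"): 2})
Y1 = (0, {2}, {(0, "a"): 1, (1, "y"): 2, (0, "y"): 2})
print(overlap_nonempty(X1, Y1))  # True (witness: xay)
```

Each phase explores at most a product of two state sets, so the sketch stays polynomial in the automaton sizes.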
// Three Approximation Answers
“Y!”: “G definitely not ambiguous”!
“?/D?”:
“?”: “Don’t know” …could not find any potential counterexamples.
“D?”: “Don’t know” – look at over-approx, D …and here are all potential counterexamples
Note: some strings do not even parse!
Improve: Parse $S \subseteq_{FIN} D$: subset of real counterexamples
[Figure: the answers Y!, ?, D? compared with the true answer]
// Regaining Lost Precision!
Now parse all counterexamples!
i.e. parse DFA, $D_{DFA}$:
1) i.e. construct: $L(C_{CFG}) = L(D_{DFA}) \cap L(G_{CFG})$
Decidable in $O(|D| \cdot |G|)$
2) Decide emptiness on C: $L(C_{CFG}) = \emptyset$
Decidable in $O(|C|) = O(|D| \cdot |G|)$
Only potential counterexamples that parse!
// Three Approximation Answers
“Y!”: “G definitely not ambiguous”!
“?/C?”:
“?”: “Don’t know” …could not find any counterexamples.
“C?”: “Don’t know” – look at over-approx, C …and here are all potential counterexamples
Note: all strings actually parse (maybe not ambiguously)!
Improve: extract finite under-approximation...?
[Figure: the answers Y!, ?, C? compared with the true answer]
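// Asymptotic (Time) Complexity
[Mohri-Nederhof]: $O(n^2 v h)$
Vertical Amb: $O(n^3 v^4 h^4)$
Horizontal Amb: $O(n^3 v^3 h^5)$
Total: $O(n^3 v^3 h^4 (v+h)) \subseteq O(g^5)$
where:
$n = |N|$
$v = \max\{\, |\pi(N)|,\ N \in N \,\}$
$h = \max\{\, |\alpha|,\ \alpha \in \pi(N),\ N \in N \,\}$
$g = nvh = |G|$
[Figure: grammar laid out as n nonterminals, each with up to v productions of up to h symbols]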
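The total follows from the three component bounds. A short worked derivation, assuming $n, v, h \geq 1$ (so that $v + h \leq 2vh$ and the Mohri-Nederhof term is dominated):

```latex
\begin{align*}
O(n^2 v h) + O(n^3 v^4 h^4) + O(n^3 v^3 h^5)
  &= O\big(n^3 v^3 h^4 (v + h)\big) \\
  &\subseteq O\big(n^3 v^4 h^5\big)
   \subseteq O\big((nvh)^5\big) = O(g^5)
\end{align*}
```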
// Related Work (Dynamic)
Dynamic disambiguation:
“Disambiguation-by-convention”: Longest match, most specific match, …
Customizable:
[Bison v. 1.5+]: %dprec, %merge
[ASF+SDF]: “disambiguation filters”
Dynamic ambiguity interception:
GLR ([Tomita], [Earley], [Bison], [ASF+SDF], …)
// Related Work (Static)
Static disambiguation:
“Disambiguation-by-convention”: First match, most specific match, …
Customizable:
[Yacc]: %left, %right, %nonassoc, %prec
Static ambiguity interception:
LL(k), [LA-]LR(k), …
Our work goes here (but for GLR)!
// Implementation
disamb (Java)
In progress…!
// Assessment
Quality of approximation ~ ~ Quantity of false-positives
Precision:
Our \ LR(k) ?
LR(k) \ Our ?
False-positives ?
Characterize “?” / “N?”
In terms of grammatical structure ?
Efficiency (in practice…)
In progress…!
// Example: Expression chains
…!?
E -> E + T
  -> T
T -> T * F
  -> F
F -> ( E )
  -> x
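As a sanity check on this grammar, the vertical and horizontal conditions can be tested on length-bounded languages. This is a hedged sketch, not the analysis from the talk: it uses the exact languages truncated to strings of length at most `k` rather than the $A_{MN}$ approximation, so a reported conflict is genuine, while an empty report only covers strings up to the bound. The names `bounded_languages`, `overlap`, `conflicts` and the bound `k` are hypothetical.

```python
def bounded_languages(grammar, k):
    """Least fixpoint of the strings of length <= k derivable from each
    nonterminal. Returns a function mapping a symbol sequence to its
    bounded language."""
    lang = {n: set() for n in grammar}

    def seq_lang(rhs):
        # Concatenate the bounded languages of the symbols in rhs.
        out = {""}
        for sym in rhs:
            part = lang[sym] if sym in grammar else {sym}
            out = {u + v for u in out for v in part if len(u + v) <= k}
        return out

    changed = True
    while changed:
        changed = False
        for n, alternatives in grammar.items():
            for rhs in alternatives:
                new = seq_lang(rhs) - lang[n]
                if new:
                    lang[n] |= new
                    changed = True
    return seq_lang

def overlap(X, Y):
    """{ xay | a non-empty, x and xa in X, y and ay in Y } for finite X, Y."""
    return {xa[:cut] + ay
            for xa in X for cut in range(len(xa)) if xa[:cut] in X
            for ay in Y
            if ay.startswith(xa[cut:]) and ay[len(xa) - cut:] in Y}

def conflicts(grammar, k):
    seq_lang = bounded_languages(grammar, k)
    found = []
    for n, alternatives in grammar.items():
        for i, alpha in enumerate(alternatives):
            for beta in alternatives[i + 1:]:          # vertical check
                if seq_lang(alpha) & seq_lang(beta):
                    found.append(("vertical", n, alpha, beta))
            for cut in range(1, len(alpha)):           # horizontal check
                if overlap(seq_lang(alpha[:cut]), seq_lang(alpha[cut:])):
                    found.append(("horizontal", n, alpha, cut))
    return found

expr = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("x",)],
}
print(conflicts(expr, 5))  # []
```

No conflicts are found for the expression grammar within the bound, which is consistent with it being unambiguous; whether a regular approximation preserves this is the question the slide raises.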
// Example: Balancing Structures
Nasty:
S -> A A
A -> x A x
  -> y
Example string: xxyxxxyx
Requires:
Unbounded memory (# x’es), i.e. CFG structure
Unbounded lookahead, i.e. any finite k is insufficient
False-positives!
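The false positive can be made concrete with a brute-force search for overlap witnesses. This sketch rests on two assumptions that are not from the slides: the regular over-approximation of A is taken to be `x*yx*` (a plausible superset of $\{x^n y x^n\}$, not necessarily what the Mohri-Nederhof transformation produces), and the search is bounded by `max_len`. All function names are hypothetical.

```python
import re
from itertools import product

def words(alphabet, max_len):
    """All strings over `alphabet` of length <= max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def overlap_witness(in_X, in_Y, alphabet, max_len):
    """First (x, a, y) with a non-empty, x and xa in X, y and ay in Y,
    each of x, a, y at most max_len long; None if there is none."""
    ws = list(words(alphabet, max_len))
    for x in ws:
        if not in_X(x):
            continue
        for a in ws:
            if a == "" or not in_X(x + a):
                continue
            for y in ws:
                if in_Y(y) and in_Y(a + y):
                    return (x, a, y)
    return None

def exact(s):
    """Membership in L(A) = { x^n y x^n }."""
    n = (len(s) - 1) // 2
    return len(s) % 2 == 1 and s == "x" * n + "y" + "x" * n

def approx(s):
    """Membership in the assumed regular over-approximation x* y x*."""
    return re.fullmatch(r"x*yx*", s) is not None

print(overlap_witness(exact, exact, "xy", 4))    # None
print(overlap_witness(approx, approx, "xy", 3))  # ('y', 'x', 'y')
```

With the exact language there is no overlap for `S -> A A`, but the regular superset has forgotten that the x’es must balance and reports the potential counterexample yxy, which is not even in L(S).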
// Future Work
Permit E -> E E
With disambiguating conventions for:
Associativity
Precedence
Parsing optimization: Exploit compile-time analysis information at runtime
…
// Conclusion
But wait, there’s more…
“Approximating Context-Free Grammar Ambiguity”
Context-free grammar ambiguity is undecidable.
However, just because it’s undecidable doesn’t mean there aren’t (good) approximations! Indeed, the whole area of static analysis works on “side-stepping undecidability”.
We exhibit a characterization of context-free ambiguity which induces a whole framework for (over-)approximation.
In particular, we give an approximation based on the [Mohri-Nederhof, 2000] regular approximation of context-free grammars and show how to boost the precision even further.
// Lessons Learned
Framework:
Plug in your favorite (over-)approximation of $L(\alpha)$
Even take intersection of them: $A = \bigcap_i A_i$
Approximation closed under intersection
Methodology:
Just because it’s undecidable doesn’t mean there aren’t (good) approximations
Quantity of false-positives (practically motivated)
What to do with false-positives (practically motivated)
Don’t be scared of undecidability
[bonus slides]
// Membership: Decidable!
Membership (aka. “parsing”):
Given $\omega \in \Sigma^*$: “Is the string, $\omega$, in the language of G”: $\omega \in L(G)$
Algorithms:
LL(k): $O(|\omega|)$
[LA-]LR(k): $O(|\omega|)$
GLR: $O(|\omega|^3)$
…
// Parsing Greedily Left-to-Right
The ambiguity problem for [X;Y]...
[Figure: the same string split as x y and as x′ y′]
... may occur in 2 cases:
- (“too little”): Not possible (due to greediness)
- (“too much”): Only this is a problem!
In fact, already a problem if x′ “goes too far”.
Thus, we only have a problem if (“X eats into Y”): $X \cap X;(\mathrm{prefix}(Y) \setminus \{\varepsilon\}) \neq \emptyset$ (cf. the overlap $X \Cap Y$)
Essentially disambiguation by picking longest match