
Syntax of Programming Languages (Grammar)


Author: 陳鍾誠, Department of Information Management, Kinmen Institute of Technology
Email: [email protected]
URL: http://ccc.kmit.edu.tw

Date: 112/04/20


Grammar


Language


Recursive Definition


Mathematical Expression


Structure of Expressions


Formal Language


Backus Naur Form (BNF)

1960 by J. Backus and P. Naur

EBNF (Extended BNF)


BNF EBNF


Formalism (Formal notation)

N. Chomsky

The father of modern structural linguistics

Differing structural trees for the same expression


Problem of Different structural trees


No Ambiguous Sentence


Context-Free Language

Syntactic equations of the form defined in EBNF generate context-free languages. The term "context free" is due to Chomsky and stems from the fact that substitution of the symbol on the left of = by a sequence derived from the expression on the right of = is always permitted, regardless of the context in which the symbol is embedded within the sentence.

It has turned out that this restriction to context freedom (in the sense of Chomsky) is quite acceptable for programming languages, and that it is even desirable.

Context dependence in another sense, however, is indispensable. We will return to this topic in Chapter 8.


Regular Expression

A language is regular, if its syntax can be expressed by a single EBNF expression.

The requirement that a single equation suffices also implies that only terminal symbols occur in the expression.

Such an expression is called a regular expression.
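For illustration, consider the EBNF rule `ident = letter {letter | digit}.`: once letter and digit are spelled out as character sets, only terminal symbols remain, so the rule is a single regular expression. A minimal Python sketch using the standard `re` module follows; the identifier and integer rules and the helper names `is_ident` / `is_integer` are assumptions for illustration, restricted to ASCII characters.

```python
import re

# ident = letter {letter | digit}.  With letter and digit expanded to
# ASCII character sets, the rule becomes one regular expression.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# integer = digit {digit}.
INTEGER = re.compile(r"[0-9]+")


def is_ident(s: str) -> bool:
    # fullmatch: the whole string must belong to the language.
    return IDENT.fullmatch(s) is not None


def is_integer(s: str) -> bool:
    return INTEGER.fullmatch(s) is not None


if __name__ == "__main__":
    for word in ["x1", "1x", "count", "42", ""]:
        print(repr(word), is_ident(word), is_integer(word))
```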


Syntax Analysis vs. Regular Expression

The reason for our interest in regular languages lies in the fact that programs for the recognition of regular sentences are particularly simple and efficient. By "recognition" we mean the determination of the structure of the sentence, and thereby naturally the determination of whether the sentence is well formed, that is, it belongs to the language. Sentence recognition is called syntax analysis.


Regular Expression vs. State Machine

For the recognition of regular sentences a finite automaton, also called a state machine, is necessary and sufficient. In each step the state machine reads the next symbol and changes state. The resulting state is solely determined by the previous state and the symbol read. If the resulting state is unique, the state machine is deterministic, otherwise nondeterministic. If the state machine is formulated as a program, the state is represented by the current point of program execution.
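The sketch below (hypothetical names, ASCII letters and digits assumed) writes one deterministic automaton for identifiers in two ways: first with an explicit transition table, where the next state depends only on the previous state and the symbol read, and then as a program in which the state is simply the current point of execution.

```python
def char_class(ch: str) -> str:
    """Map a character to the automaton's input alphabet."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# Deterministic transitions: (state, symbol class) -> next state.
# A missing entry means there is no transition, i.e. reject.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}
ACCEPTING = {"ident"}


def accepts_table(s: str) -> bool:
    state = "start"
    for ch in s:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING


def accepts_program(s: str) -> bool:
    # Same automaton; the state is the position in this code.
    i = 0
    if i < len(s) and char_class(s[i]) == "letter":      # in state "start"
        i += 1
        while i < len(s) and char_class(s[i]) in ("letter", "digit"):
            i += 1                                        # in state "ident"
        return i == len(s)
    return False


if __name__ == "__main__":
    for w in ["a", "a1b2", "1a", "", "a-b"]:
        print(repr(w), accepts_table(w), accepts_program(w))
```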


From EBNF to Program

The analyzing program can be derived directly from the defining syntax in EBNF. For each EBNF construct K there exists a translation rule which yields a program fragment Pr(K). The translation rules from EBNF to program text are shown below. Therein sym denotes a global variable always representing the symbol last read from the source text by a call to procedure next. Procedure error terminates program execution, signaling that the symbol sequence read so far does not belong to the language.
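As an assumption-based illustration of such translation rules, here is a Python sketch for a small, hypothetical grammar. It follows the usual scheme: a terminal "x" becomes `if sym == "x": next() else: error()`; [exp] becomes an `if` guarded by the First set of exp; {exp} becomes a `while` loop guarded by that First set; and alternatives are selected by their disjoint First sets. For convenience, sym is an attribute of a class rather than a true global variable, and every non-blank character counts as one symbol.

```python
class ParseError(Exception):
    pass


class Recognizer:
    """Derived rule by rule from the hypothetical EBNF:
        expression = term {"+" term}.
        term       = ["-"] factor.
        factor     = "x" | "(" expression ")".
    """

    def __init__(self, text: str):
        self.symbols = [c for c in text if not c.isspace()]
        self.pos = 0
        self.sym = None
        self.next()                  # sym always holds the symbol last read

    def next(self):
        if self.pos < len(self.symbols):
            self.sym = self.symbols[self.pos]
            self.pos += 1
        else:
            self.sym = "EOF"

    def error(self):
        raise ParseError(f"unexpected symbol {self.sym!r}")

    def expect(self, x):             # Pr("x")
        if self.sym == x:
            self.next()
        else:
            self.error()

    def expression(self):
        self.term()
        while self.sym == "+":       # {"+" term}: loop while sym in First = {"+"}
            self.next()
            self.term()

    def term(self):
        if self.sym == "-":          # ["-"]: optional, guarded by First = {"-"}
            self.next()
        self.factor()

    def factor(self):
        if self.sym == "x":          # alternative 1, First = {"x"}
            self.next()
        elif self.sym == "(":        # alternative 2, First = {"("}
            self.next()
            self.expression()
            self.expect(")")
        else:
            self.error()


def recognize(text: str) -> bool:
    r = Recognizer(text)
    try:
        r.expression()
        if r.sym != "EOF":           # input must be fully consumed
            r.error()
        return True
    except ParseError:
        return False
```

For example, `recognize("(x + x) + -x")` is accepted, while `recognize("x +")` is rejected because no term follows the "+".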


Analyzing program


EBNF with only 1 rule


First()


Precondition


Lexical Analysis for Identifier


Lexical Analysis for Integer


Scanner

The process of syntax analysis is based on a procedure to obtain the next symbol. This procedure in turn is based on the definition of symbols in terms of sequences of one or more characters. This latter procedure is called a scanner, and syntax analysis on this second, lower level, lexical analysis.
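A minimal scanner sketch in Python, assuming identifiers of the form letter {letter | digit}, integers of the form digit {digit}, and every other non-blank character as a one-character symbol. `Token` and `scan` are hypothetical names; this is not the GetSym procedure from these slides, only an illustration of reading characters to produce the next symbol. It relies on Python's `str.isalpha` / `str.isalnum`, so non-ASCII letters are accepted as well.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # "ident", "number", or the symbol character itself
    text: str


def scan(src: str) -> Iterator[Token]:
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch.isspace():                 # blanks only separate symbols
            i += 1
        elif ch.isalpha():               # ident = letter {letter | digit}
            j = i + 1
            while j < n and src[j].isalnum():
                j += 1
            yield Token("ident", src[i:j])
            i = j
        elif ch.isdigit():               # integer = digit {digit}
            j = i + 1
            while j < n and src[j].isdigit():
                j += 1
            yield Token("number", src[i:j])
            i = j
        else:                            # any other character: one-character symbol
            yield Token(ch, ch)
            i += 1


if __name__ == "__main__":
    for tok in scan("x1 = 42+y"):
        print(tok)
```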


Lexical Analysis vs. Syntax Analysis


A Scanner Example

As an example we show a scanner for a parser of EBNF. Its terminal symbols and their definition in terms of characters are


Procedure GetSym() –(1)


Procedure GetSym() –(2)


Procedure GetSym() –(3)


Syntax Analysis Overview
• Goal: determine whether the input token stream satisfies the syntax of the program
• What do we need to do this?
  - An expressive way to describe the syntax
  - A mechanism that determines whether the input token stream satisfies the syntax description
• For lexical analysis:
  - Regular expressions describe tokens
  - Finite automata = mechanisms to generate tokens from the input stream


Just Use Regular Expressions?
• REs can expressively describe tokens
  - Easy to implement via DFAs
• So just use them to describe the syntax of a programming language?
  - NO! They don't have enough power to express any non-trivial syntax
  - Example: nested constructs (blocks, expressions, statements)
  - Detect balanced braces: {{} {} {{} { }}}   { { { { { }}}} }   . . .
  - We need unbounded counting!
  - FSAs cannot count except in a strictly modulo fashion
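To make the counting argument concrete, the Python sketch below (hypothetical function names) checks balanced braces with an integer counter that can grow without bound, and contrasts it with a finite-state version whose states can only represent depths 0 to k-1. The finite version necessarily gives the wrong answer once the nesting reaches depth k.

```python
def balanced(s: str) -> bool:
    """Unbounded counter: exactly what a finite automaton lacks."""
    depth = 0
    for ch in s:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:            # closing brace with no matching opener
                return False
        # any other character (e.g. a blank) is ignored
    return depth == 0


def balanced_with_k_states(s: str, k: int) -> bool:
    """Finite automaton: states 0..k-1 record the depth, plus a dead state.
    Correct only while the nesting depth stays below k."""
    state = 0
    for ch in s:
        if ch == "{":
            state += 1
            if state >= k:           # depth k has no state: fall into the dead state
                return False
        elif ch == "}":
            state -= 1
            if state < 0:
                return False
    return state == 0
```

With k = 3, `balanced_with_k_states("{ { { { { }}}} }", 3)` returns False although the braces are balanced; any fixed k fails on some deeper input.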


Context-Free Grammars
• Consist of 4 components:
  - Terminal symbols = tokens or ε
  - Non-terminal symbols = syntactic variables
  - Start symbol S = special non-terminal
  - Productions of the form LHS → RHS
      LHS = single non-terminal
      RHS = string of terminals and non-terminals
      Specify how non-terminals may be expanded
• The language generated by a grammar is the set of strings of terminals derived from the start symbol by repeatedly applying the productions
  - L(G) = language generated by grammar G

Example:
S → a S a
S → T
T → b T b
T → ε
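As a sketch, the example grammar can be stored as data and its language L(G) enumerated up to a length bound by breadth-first expansion of leftmost derivations. Two assumptions: the production lost after T → b T b is read as T → ε, and the length-based pruning used here only suffices for grammars like this one; a grammar such as S → S S | ε would need extra pruning to terminate. `GRAMMAR` and `language` are hypothetical names.

```python
from collections import deque

# Non-terminal -> list of right-hand sides; () stands for ε.
GRAMMAR = {
    "S": [("a", "S", "a"), ("T",)],
    "T": [("b", "T", "b"), ()],
}


def language(grammar, start, max_len):
    """Terminal strings of length <= max_len derivable from start."""
    result = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Terminals are never removed, so too many of them rules out a short sentence.
        if sum(1 for x in form if x not in grammar) > max_len:
            continue
        idx = next((i for i, x in enumerate(form) if x in grammar), None)
        if idx is None:                       # no non-terminal left: a sentence
            result.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:        # expand the leftmost non-terminal
            new = form[:idx] + rhs + form[idx + 1:]
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return result


if __name__ == "__main__":
    print(sorted(language(GRAMMAR, "S", 4), key=lambda w: (len(w), w)))
```

For this grammar the sentences have the shape aⁿ b²ᵐ aⁿ, so up to length 4 the result is {"", "aa", "bb", "aaaa", "abba", "bbbb"}.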


CFG - Example
Grammar for the balanced-parentheses language:
S → ( S ) S
S → ε

• 1 non-terminal: S
• 2 terminals: “(”, “)”
• Start symbol: S
• 2 productions

If a grammar accepts a string, there is a derivation of that string using the productions:
“(())”: S ⇒ (S) S ⇒ (S) ⇒ ((S) S) ⇒ ((S)) ⇒ (())

Question: Why is the final S required?
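A recursive-descent recognizer for S → ( S ) S | ε can be sketched in Python as follows (reading the lost second production as S → ε; `parses` is a hypothetical name). It chooses S → ( S ) S when the next symbol is "(" and S → ε otherwise; this works because "(" is the only symbol that can start the first alternative.

```python
def parses(s: str) -> bool:
    """Recognize the language of S -> ( S ) S | ε by recursive descent."""
    pos = 0

    def S() -> bool:
        nonlocal pos
        if pos < len(s) and s[pos] == "(":    # S -> ( S ) S
            pos += 1
            if not S():
                return False
            if pos < len(s) and s[pos] == ")":
                pos += 1
            else:
                return False
            return S()
        return True                           # S -> ε

    return S() and pos == len(s)              # the whole input must be consumed
```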


More on CFGs
• Shorthand notation: vertical bar for multiple productions
    S → a S a | T
    T → b T b | ε
• CFGs are powerful enough to express the syntax of most programming languages
• Derivation = successive application of productions starting from S
• Acceptance? = Determine whether there is a derivation for an input token stream


A Parser

[Diagram: a context-free grammar G and a token stream s (from the lexer) are the inputs to the Parser; its output is "Yes, if s ∈ L(G); No, otherwise", plus error messages.]

• Syntax analyzers (parsers) = CFG acceptors which also output the corresponding derivation when the token stream is accepted
• Various kinds: LL(k), LR(k), SLR, LALR


RE is a Subset of CFG
Can inductively build a grammar for each RE:

  RE        Grammar
  ε         S → ε
  a         S → a
  R1 R2     S → S1 S2
  R1 | R2   S → S1 | S2
  R1*       S → S1 S | ε

where G1 = grammar for R1, with start symbol S1, and G2 = grammar for R2, with start symbol S2
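The inductive construction can be written as a short recursive function. In this Python sketch, REs are encoded as tuples (an assumed representation), each sub-expression gets a fresh non-terminal named S1, S2, and so on, and each case emits exactly the production from the table for that RE form.

```python
from itertools import count


def re_to_grammar(regex):
    """regex is a tuple tree: ("eps",), ("sym", "a"), ("cat", r1, r2),
    ("alt", r1, r2) or ("star", r1).
    Returns (start_symbol, productions); productions maps each
    non-terminal to a list of right-hand sides (tuples, () = ε)."""
    productions = {}
    fresh = count(1)

    def build(r):
        s = f"S{next(fresh)}"
        kind = r[0]
        if kind == "eps":
            productions[s] = [()]                   # S -> ε
        elif kind == "sym":
            productions[s] = [(r[1],)]              # S -> a
        elif kind == "cat":
            s1, s2 = build(r[1]), build(r[2])
            productions[s] = [(s1, s2)]             # S -> S1 S2
        elif kind == "alt":
            s1, s2 = build(r[1]), build(r[2])
            productions[s] = [(s1,), (s2,)]         # S -> S1 | S2
        elif kind == "star":
            s1 = build(r[1])
            productions[s] = [(s1, s), ()]          # S -> S1 S | ε
        else:
            raise ValueError(f"unknown RE node {kind!r}")
        return s

    start = build(regex)
    return start, productions


if __name__ == "__main__":
    # (a | b)*
    start, prods = re_to_grammar(("star", ("alt", ("sym", "a"), ("sym", "b"))))
    for lhs, rhss in prods.items():
        print(lhs, "->", " | ".join(" ".join(r) or "ε" for r in rhss))
```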


Grammar for Sum Expression

Grammar:
S → E + S | E
E → number | (S)

Expanded:
S → E + S
S → E
E → number
E → (S)

4 productions, 2 non-terminals (S, E), 4 terminals: “(”, “)”, “+”, number; start symbol: S
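A minimal recursive-descent parser for this grammar, sketched in Python (names hypothetical). Each non-terminal becomes a function; S → E + S | E is decided by looking at the token after E, and because the recursion is on the right, the resulting tree groups + to the right. The parser returns a compact tree of (op, left, right) tuples rather than the full parse tree, dropping parentheses.

```python
def tokenize(text: str) -> list:
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():                    # number = digit {digit}
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(int(text[i:j]))
            i = j
        elif ch in "()+":
            tokens.append(ch)
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


def parse_sum(text: str):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def S():                                  # S -> E + S | E
        nonlocal pos
        left = E()
        if peek() == "+":
            pos += 1
            return ("+", left, S())           # right recursion: + groups to the right
        return left

    def E():                                  # E -> number | ( S )
        nonlocal pos
        tok = peek()
        if isinstance(tok, int):
            pos += 1
            return tok
        if tok == "(":
            pos += 1
            inner = S()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = S()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree
```

For "(1 + 2 + (3 + 4)) + 5" this returns ("+", ("+", 1, ("+", 2, ("+", 3, 4))), 5).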


Constructing a Derivation
• Start from S (the start symbol)
• Use productions to derive a sequence of tokens
• For arbitrary strings α, β, γ and for a production A → β, a single step of the derivation is α A γ ⇒ α β γ (substitute β for A)
• Example: S ⇒ E + S ⇒ (S) + S ⇒ (E + S) + S ⇒ (E + S) + E ⇒ (E + E + S) + E
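A single derivation step α A γ ⇒ α β γ is just a splice on a sequence of symbols. The Python sketch below (hypothetical names) checks that the replaced symbol is a non-terminal and that A → β is a production of the sum-expression grammar, then replays a short sample derivation from S to ( E + E + S ) + E.

```python
# Sum-expression grammar: S -> E + S | E,  E -> number | ( S )
GRAMMAR = {
    "S": [("E", "+", "S"), ("E",)],
    "E": [("number",), ("(", "S", ")")],
}


def step(form, index, rhs, grammar=GRAMMAR):
    """alpha A gamma => alpha rhs gamma, where A = form[index] and A -> rhs."""
    A = form[index]
    if A not in grammar:
        raise ValueError(f"{A!r} is a terminal, not a non-terminal")
    if tuple(rhs) not in grammar[A]:
        raise ValueError(f"{A} -> {' '.join(rhs)} is not a production")
    alpha, gamma = form[:index], form[index + 1:]
    return alpha + tuple(rhs) + gamma


def replay(start, steps):
    """Apply (index, rhs) steps in order; return every sentential form."""
    forms = [tuple(start)]
    for index, rhs in steps:
        forms.append(step(forms[-1], index, rhs))
    return forms


EXAMPLE = [
    (0, ("E", "+", "S")),      # S => E + S
    (0, ("(", "S", ")")),      #   => ( S ) + S
    (1, ("E", "+", "S")),      #   => ( E + S ) + S
    (6, ("E",)),               #   => ( E + S ) + E
    (3, ("E", "+", "S")),      #   => ( E + E + S ) + E
]

if __name__ == "__main__":
    for form in replay(["S"], EXAMPLE):
        print(" ".join(form))
```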


Class Problem
S → E + S | E
E → number | (S)

Derive: (1 + 2 + (3 + 4)) + 5


Parse Tree

Parse tree for (1 + 2 + (3 + 4)) + 5 (each child indented under its parent):

S
  E
    (
    S
      E
        1
      +
      S
        E
          2
        +
        S
          E
            (
            S
              E
                3
              +
              S
                E
                  4
            )
    )
  +
  S
    E
      5

• Parse tree = tree representation of the derivation
• Leaves of the tree are terminals
• Internal nodes are non-terminals
• No information about the order of the derivation steps


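Parse Tree vs Abstract Syntax Tree

Parse tree for (1 + 2 + (3 + 4)) + 5, in bracket notation:
S[ E[ ( S[ E[1] + S[ E[2] + S[ E[ ( S[ E[3] + S[ E[4] ] ] ) ] ] ] ] ) ] + S[ E[5] ] ]

Abstract syntax tree (AST) for the same expression:
+
  +
    1
    +
      2
      +
        3
        4
  5

• Parse tree is also called “concrete syntax”
• AST discards (abstracts) unneeded information: a more compact format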


Derivation Order
• Can choose to apply productions in any order: select a non-terminal and substitute the RHS of a production
• Two standard orders: leftmost and rightmost
• Leftmost derivation
  - In the string, find the leftmost non-terminal and apply a production to it
  - E + S ⇒ 1 + S
• Rightmost derivation
  - Same, but find the rightmost non-terminal
  - E + S ⇒ E + E + S
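The Python sketch below (hypothetical names and tree encoding) reads both derivation orders off one parse tree: at each step it expands either the leftmost or the rightmost non-terminal of the current sentential form. The tree, and therefore the set of productions used, is the same; only the order differs.

```python
def label(node):
    """Terminals are plain strings; non-terminal nodes are (name, children)."""
    return node if isinstance(node, str) else node[0]


def derivation(tree, leftmost=True):
    """Sentential forms of the leftmost (or rightmost) derivation of a parse tree."""
    frontier = [tree]
    forms = [" ".join(label(n) for n in frontier)]
    while True:
        open_nodes = [i for i, n in enumerate(frontier) if not isinstance(n, str)]
        if not open_nodes:                 # only terminals left: derivation done
            return forms
        i = open_nodes[0] if leftmost else open_nodes[-1]
        frontier[i:i + 1] = frontier[i][1]  # replace the non-terminal by its children
        forms.append(" ".join(label(n) for n in frontier))


# Parse tree of 1 + 2 under  S -> E + S | E,  E -> number | ( S )
TREE = ("S", [("E", ["1"]), "+", ("S", [("E", ["2"])])])

if __name__ == "__main__":
    print(" => ".join(derivation(TREE)))
    print(" => ".join(derivation(TREE, leftmost=False)))
```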


Leftmost/Rightmost Derivation Examples
» S → E + S | E
» E → number | (S)
» Leftmost derive: (1 + 2 + (3 + 4)) + 5

S ⇒ E+S ⇒ (S)+S ⇒ (E+S)+S ⇒ (1+S)+S ⇒ (1+E+S)+S ⇒ (1+2+S)+S ⇒ (1+2+E)+S ⇒ (1+2+(S))+S ⇒ (1+2+(E+S))+S ⇒ (1+2+(3+S))+S ⇒ (1+2+(3+E))+S ⇒ (1+2+(3+4))+S ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5

» Now, rightmost derive the same input string

S ⇒ E+S ⇒ E+E ⇒ E+5 ⇒ (S)+5 ⇒ (E+S)+5 ⇒ (E+E+S)+5 ⇒ (E+E+E)+5 ⇒ (E+E+(S))+5 ⇒ (E+E+(E+S))+5 ⇒ (E+E+(E+E))+5 ⇒ (E+E+(E+4))+5 ⇒ (E+E+(3+4))+5 ⇒ (E+2+(3+4))+5 ⇒ (1+2+(3+4))+5

Result: the same parse tree; the same productions are chosen, but in a different order


Class Problem
S → E + S | E
E → number | (S) | -S

Do the rightmost derivation of: 1 + (2 + -(3 + 4)) + 5


Ambiguous Grammars
• In the sum expression grammar, leftmost and rightmost derivations produced identical parse trees
• The + operator associates to the right in the parse tree regardless of derivation order

AST of (1+2+(3+4))+5:
+
  +
    1
    +
      2
      +
        3
        4
  5


An Ambiguous Grammar
• + associates to the right because of the right-recursive production S → E + S
• Consider another grammar: S → S + S | S * S | number
• Ambiguous grammar = different derivations produce different parse trees
  - More specifically, G is ambiguous if there are 2 distinct leftmost (rightmost) derivations for some sentence


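Ambiguous Grammar - Example

S → S + S | S * S | number

Consider the expression: 1 + 2 * 3

Derivation 1: S ⇒ S+S ⇒ 1+S ⇒ 1+S*S ⇒ 1+2*S ⇒ 1+2*3
Derivation 2: S ⇒ S*S ⇒ S+S*S ⇒ 1+S*S ⇒ 1+2*S ⇒ 1+2*3

Tree for derivation 1: +(1, *(2, 3)), i.e. 1 + (2 * 3)
Tree for derivation 2: *(+(1, 2), 3), i.e. (1 + 2) * 3

Obviously not equal!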


Impact of Ambiguity

Different parse trees correspond to different evaluations!

Thus, program meaning is not defined!!

• +(1, *(2, 3)) = 1 + (2 * 3) = 7
• *(+(1, 2), 3) = (1 + 2) * 3 = 9
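The ambiguity can be checked mechanically. This Python sketch (hypothetical names) enumerates every parse tree of S → S + S | S * S | number for a token list by trying each operator position as the top-level split, and then evaluates each tree.

```python
import operator
from functools import lru_cache

OPS = {"+": operator.add, "*": operator.mul}


def all_trees(tokens):
    """All parse trees for S -> S + S | S * S | number.
    Leaves are ints; internal nodes are (op, left, right)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):                     # trees for tokens[lo:hi]
        if hi - lo == 1:                   # S -> number
            tok = tokens[lo]
            return [tok] if isinstance(tok, int) else []
        result = []
        for k in range(lo + 1, hi - 1):    # S -> S op S, operator at position k
            op = tokens[k]
            if op in OPS:
                for left in trees(lo, k):
                    for right in trees(k + 1, hi):
                        result.append((op, left, right))
        return result

    return trees(0, len(tokens))


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


if __name__ == "__main__":
    for t in all_trees([1, "+", 2, "*", 3]):
        print(t, "=", evaluate(t))
```

For 1 + 2 * 3 it finds exactly two trees, evaluating to 7 and 9.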


Can We Get Rid of Ambiguity?
• Ambiguity is a function of the grammar, not the language!
• A context-free language L is inherently ambiguous if all grammars for L are ambiguous
• Every deterministic CFL has an unambiguous grammar
  - So, no deterministic CFL is inherently ambiguous
  - No inherently ambiguous programming languages have been invented
• To construct a useful parser, we must devise an unambiguous grammar


Eliminating Ambiguity
• Often we can eliminate ambiguity by adding non-terminals and allowing recursion only on the right or the left
    S → S + T | T
    T → T * num | num
• The T non-terminal enforces precedence
• Left recursion gives left associativity

Parse tree for 1 + 2 * 3 under this grammar:
S
  S
    T
      1
  +
  T
    T
      2
    *
    3
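A sketch of a recursive-descent parser for this grammar in Python (names hypothetical). Calling S directly at the start of S → S + T would recurse forever, so each left-recursive rule is written as a loop that folds new operands onto the tree built so far. This keeps left associativity, and because T handles * one level below +, * binds tighter.

```python
import re


def parse_unambiguous(text: str):
    """Parse with  S -> S + T | T,  T -> T * num | num."""
    tokens = re.findall(r"\d+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def num():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected number, got {tok!r}")
        pos += 1
        return int(tok)

    def T():                            # T -> T * num | num, as a loop
        nonlocal pos
        tree = num()
        while peek() == "*":
            pos += 1
            tree = ("*", tree, num())
        return tree

    def S():                            # S -> S + T | T, as a loop
        nonlocal pos
        tree = T()
        while peek() == "+":
            pos += 1
            tree = ("+", tree, T())
        return tree

    tree = S()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse_unambiguous("1 + 2 * 3"))
    print(parse_unambiguous("1 + 2 + 3"))
```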


A Closer Look at Eliminating Ambiguity
• Precedence is enforced by:
  - Introducing a distinct non-terminal for each precedence level
  - Specifying the operators for a given precedence level as the RHS of that level's production
  - Accessing higher-precedence operators by referencing the next-higher precedence non-terminal


Associativity
• An operator is either left-, right-, or non-associative
  - Left: a + b + c = (a + b) + c
  - Right: a ^ b ^ c = a ^ (b ^ c)
  - Non: a < b < c is illegal (thus undefined)
• The position of the recursion relative to the operator dictates the associativity
  - Left (right) recursion ⇒ left (right) associativity
  - Non: don't be recursive; simply reference the next-higher precedence non-terminal on both sides of the operator
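A small Python sketch (hypothetical grammar and names) of how the position of the recursion fixes associativity: "-" uses a left-recursive rule implemented as a loop, "^" uses a right-recursive rule implemented as a recursive call, and "^" sits one precedence level above "-".

```python
import re


def parse_assoc(text: str):
    """Hypothetical two-level grammar:
         E -> E - P | P        (left recursive: '-' is left associative)
         P -> num ^ P | num    (right recursive: '^' is right associative)
    Returns an AST of nested (op, left, right) tuples with int leaves."""
    tokens = re.findall(r"\d+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def num():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected number, got {tok!r}")
        pos += 1
        return int(tok)

    def P():
        nonlocal pos
        base = num()
        if peek() == "^":              # recursion to the right of the operator
            pos += 1
            return ("^", base, P())
        return base

    def E():
        nonlocal pos
        tree = P()
        while peek() == "-":           # left recursion E -> E - P, written as a loop
            pos += 1
            tree = ("-", tree, P())
        return tree

    tree = E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a ** b
```

Here 8 - 2 - 1 groups as (8 - 2) - 1 = 5, while 2 ^ 3 ^ 2 groups as 2 ^ (3 ^ 2) = 512.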


Class Problem (Tough)

S → S + S | S - S | S * S | S / S | (S) | -S | S ^ S | number

Enforce the standard arithmetic precedence rules and remove all ambiguity from the above grammar.

Precedence (high to low): (), unary -, ^, * and /, + and -
Associativity: ^ = right; the rest are left