Languages & Strings


String Operations

Language Definitions

Strings

• A string x (over alphabet A) is a finite sequence x = x1x2 .. xn where xi ∈ A.

• Length – the length of x is the number of characters, n, in the sequence.

• Empty String – λ denotes the empty string of length 0.

• Recursive definition of the set of strings A* over alphabet A

– Basis : The empty string λ ∈ A*

– Recursive Step : If x ∈ A* and a ∈ A, then xa ∈ A*

– Closure : A* contains no other strings
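
The recursive definition can be read as a procedure for listing A*. The following is a minimal Python sketch under that reading; the function name strings_over is hypothetical, and strings are produced in order of length purely as a convenience.

```python
from itertools import islice

def strings_over(alphabet):
    """Yield the strings of A* in order of length, following the recursive definition."""
    level = [""]                 # basis: the empty string (lambda) is in A*
    while level:                 # stops after lambda if the alphabet is empty
        yield from level
        # recursive step: if x is in A* and a is in A, then xa is in A*
        level = [x + a for x in level for a in alphabet]

first = list(islice(strings_over("ab"), 7))
print(first)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Because A* is infinite for any nonempty alphabet, the generator never finishes on its own; islice takes a finite prefix.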

Languages and String Operations

• Languages

– A language L over alphabet A is any subset of A*

– Concatenation : The concatenation of two strings x, y is xy, a string of length (length of x) + (length of y).

– The concatenation of two languages : The concatenation of two languages L and M is LM, where LM = { z | z = xy where x ∈ L, y ∈ M }.

• Example: T = D* and O = {“+”, “-”} where D = {0,1,..,9}. Then TOT is the language {“1+1”, “12+24”, . . .}
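
Language concatenation can be tried out on finite sets. The sketch below is illustrative only: concat and up_to are hypothetical helper names, and D* is cut off at length 2 (an assumption made here so the set stays finite).

```python
from itertools import product

def concat(L, M):
    """Concatenation of two finite languages: LM = { xy | x in L, y in M }."""
    return {x + y for x in L for y in M}

def up_to(alphabet, n):
    """Finite stand-in for A*: all strings over the alphabet of length at most n."""
    return {"".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)}

D = "0123456789"
T = up_to(D, 2)            # truncation of D*; it includes the empty string
O = {"+", "-"}
TOT = concat(concat(T, O), T)

print("1+1" in TOT, "12+24" in TOT)  # True True
print("+" in TOT)                    # True
```

The last line shows a consequence of the definition: since λ ∈ D*, the language TOT also contains strings with an empty operand, such as “+”.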

Recursive Definition of Regular Sets

• Let A be an alphabet. The regular sets over A are:

– Basis : ∅, {λ} and {a}, for each a ∈ A, are regular sets

– Recursive Step : If X, Y are regular sets, so are

• X ∪ Y

• XY

• X*

– Closure : X is a regular set over A iff it can be obtained from the basis sets by a finite number of applications of the recursive step

Regular Set Examples

• Signed and unsigned integers

• Unparenthesized expressions with variable operands and binary operators, where a variable is formed as l (a letter) followed by a string of l, d, with d a digit

• English sentences with structure <noun phrase><verb phrase><noun phrase>

– Lexical categories : d – determiner, a – adjective, n – noun, x – adverb, v – verb

Regular Set Examples

• Signed and unsigned integers

– ({λ} ∪ {+} ∪ {-}){d}{d}*

• Expressions without parentheses

– ({l}({l} ∪ {d})*)(({+} ∪ {*})({l}({l} ∪ {d})*))*

• Sentences

– ({d}{a}*{n})(({λ} ∪ {x}){v})({d}{a}*{n})
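
These three regular sets can be checked with Python's re module. This is a sketch under stated assumptions: d is read as the character class [0-9] and l as [A-Za-z] for the first two patterns, while the sentence pattern works directly on strings of category symbols (d, a, n, x, v). The names INTEGER, VARIABLE, EXPRESSION and SENTENCE are hypothetical.

```python
import re

# ({lambda} U {+} U {-}){d}{d}*  : optional sign, then one or more digits
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")

# {l}({l} U {d})*  : a letter followed by letters or digits
VARIABLE = r"[A-Za-z][A-Za-z0-9]*"

# variable, then zero or more (operator, variable) pairs
EXPRESSION = re.compile(rf"{VARIABLE}([+*]{VARIABLE})*")

# noun phrase, optional adverb plus verb, noun phrase (over category symbols)
SENTENCE = re.compile(r"da*nx?vda*n")

for pattern, text in [(INTEGER, "-42"), (INTEGER, "+"),
                      (EXPRESSION, "x1+y*z2"), (EXPRESSION, "x+"),
                      (SENTENCE, "danxvdn"), (SENTENCE, "nvdn")]:
    print(text, bool(pattern.fullmatch(text)))
```

fullmatch is used so that the whole string, not just a prefix, must belong to the set.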

Regular Expressions

• The set of strings which begin with an “a” and end with a “b” is a regular set over {a,b} since it equals {a}({a} ∪ {b})*{b}.

• Regular expressions represent regular sets as follows: ∅, λ and a represent ∅, {λ} and {a}.

– If u and v are regular expressions (representing regular sets) then (u ∪ v), (uv) and (u*) are regular expressions representing their union, concatenation and Kleene closure.

– Dropping superfluous parentheses, a(a ∪ b)*b represents the regular set of all strings starting with a and ending with b.
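
The claim that a(a ∪ b)*b describes exactly the strings starting with a and ending with b can be spot-checked by brute force. The sketch below compares the two descriptions on every string over {a,b} of length at most 6; the bound 6 and the names PATTERN and starts_a_ends_b are choices made here, and Python writes union as | rather than ∪.

```python
import re
from itertools import product

PATTERN = re.compile(r"a(a|b)*b")

def starts_a_ends_b(s):
    """Direct statement of the property, independent of the regular expression."""
    return s.startswith("a") and s.endswith("b")

mismatches = [
    s
    for n in range(7)
    for s in ("".join(p) for p in product("ab", repeat=n))
    if bool(PATTERN.fullmatch(s)) != starts_a_ends_b(s)
]
print(mismatches)  # []
```

An empty list is evidence for the claim on short strings, not a proof of it.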

Grammars

A context free grammar G is a 4-tuple : G = (V, Σ, P, S) where

1. V is a set of nonterminals (or string variables), each representing a sublanguage from which the variable takes its values. Examples are <noun phrase>, which can take on values such as “the big box”, and T, which can take on string values used to represent products in an algebraic expression.

2. Σ is a finite alphabet. Examples are the English vocabulary (consisting of over a hundred thousand words, each treated as an atomic symbol) and the printable ASCII character set. The binary alphabet is {0,1}. The alphabet contains the symbols from which language strings are formed.

Grammars Continued

3. P is a finite set of productions or rules used to define the sublanguages represented by the nonterminals. In a context free grammar, a rule has the format A → X where A ∈ V and X ∈ (V ∪ Σ)*. The interpretation is that the strings in the sublanguage represented by A can be constructed according to the format indicated by X: each terminal character in X appears as itself in the A string, and each variable in X is replaced by a string from that variable's sublanguage. Examples are

<noun phrase> → <determiner> <adj-list> <noun> and

T → a * T.

4. S is a designated variable (referred to as the start symbol or the head of the language). It represents the language being defined by the grammar G.

Grammar Examples

• Signed and unsigned integers

– I → SD, S → + | - | λ, D → dD, D → d

• Unparenthesized expressions with variable operands and binary operators, where a variable is formed as l (a letter) followed by a string of l, d, with d a digit

– E → V+E, E → V*E, E → V, V → lU, U → lU, U → dU, U → λ

• English sentences with structure <noun phrase><verb phrase><noun phrase>

– Lexical categories : d – determiner, a – adjective, n – noun, x – adverb, v – verb

Grammars and Derivations

Derivations If u, v are strings in (V ∪ Σ)*, A is in V and A → X is in P, then uAv ⇒ uXv, read as uAv “derives” uXv by application of the rule A → X. For repeated applications of 0 or more rules, the symbol ⇒* is used.

Language Definition The language L(G) defined by G is

{ x | x ∈ Σ*, S ⇒* x }
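
A derivation step and the set L(G) can be sketched in a few lines of Python. The example uses the signed-integer grammar, read here as I → SD, S → + | - | λ, D → dD | d. The names P, step and language are hypothetical, only the leftmost nonterminal is rewritten (enough to reach every terminal string of a context free grammar), and the length bound is an assumption added so the search terminates.

```python
from collections import deque

# Productions: nonterminal -> list of right-hand sides ("" stands for lambda).
P = {
    "I": ["SD"],
    "S": ["+", "-", ""],
    "D": ["dD", "d"],
}

def step(form):
    """All forms obtained from `form` by rewriting its leftmost nonterminal once."""
    for i, symbol in enumerate(form):
        if symbol in P:
            return [form[:i] + rhs + form[i + 1:] for rhs in P[symbol]]
    return []  # no nonterminal left: `form` is a terminal string

def language(start, max_len):
    """Terminal strings x with start =>* x and len(x) <= max_len."""
    found, queue, seen = set(), deque([start]), {start}
    while queue:
        form = queue.popleft()
        successors = step(form)
        if not successors:
            found.add(form)
        for nxt in successors:
            # prune: terminals already present in a form never disappear
            if sum(c not in P for c in nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found

print(step("SD"))                 # ['+D', '-D', 'D']
print(sorted(language("I", 3)))   # ['+d', '+dd', '-d', '-dd', 'd', 'dd', 'ddd']
```

Each call to step corresponds to one application of ⇒, and language collects the terminal strings reachable by ⇒*.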

Language Definition

Language definition is a means of specifying which strings belong to the language. Two approaches to language definition are

• Acceptive – Given a string, a device specifies whether or not it belongs to the language.

– An automaton A which processes a language string x accepts x as belonging to the language if its final state belongs to the set of legal final states.

– A parser constructed from the grammar defining the language accepts the string if it can parse it.

• Generative – Given an alphabet, a generative device tells how strings in the language are formed.

– A language manual which tells how strings are formed can be used to generate language strings.

– A grammar is a generative means of specification. Any string which can be derived from the start symbol by applying grammar rules is in the language.

Finite state automata and language recognition

 

[State diagram: states S, I, F and D, with arcs labeled d and • as given in the table below]

The finite state automaton has Σ = {d, •}, start state S and legal final states I and D. The transition function is represented by the diagram or by the table below:

      d    •
S     I    F
I     I    D
F     D    -
D     D    -

Accepts : ddd, d.dd, .ddd

Rejects : d.dd.d
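
The transition table translates directly into a lookup table. The following is a minimal Python sketch of the automaton as an acceptor; DELTA, FINAL and accepts are hypothetical names, "." stands in for the • symbol, and a missing table entry is treated as an immediate rejection.

```python
# Transition table: (state, input symbol) -> next state.
DELTA = {
    ("S", "d"): "I", ("S", "."): "F",
    ("I", "d"): "I", ("I", "."): "D",
    ("F", "d"): "D",
    ("D", "d"): "D",
}
FINAL = {"I", "D"}   # legal final states

def accepts(string, start="S"):
    """Run the automaton on `string` and report whether it ends in a final state."""
    state = start
    for ch in string:
        state = DELTA.get((state, ch))
        if state is None:        # no entry in the table: reject
            return False
    return state in FINAL

for s in ["ddd", "d.dd", ".ddd", "d.dd.d"]:
    print(s, accepts(s))
```

The first three strings are reported as accepted and the last as rejected, matching the lists above.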

Automata as Acceptors

[State diagram: the same automaton, with states S, I, F and D]

• The string ddd.d produces the state sequence SIIIDD and is accepted because the last state D is a legal final state.

• The string .dd produces the state sequence SFDD and is accepted because D is a legal final state.

• The string ddd produces the state sequence SIII and is accepted because I is a legal final state.
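
The state sequences can be reproduced mechanically. This sketch records every state visited instead of only the last one; the function name trace is hypothetical, "." stands in for the • symbol, and the table is keyed by state and then by input symbol.

```python
# Transition table: state -> {input symbol -> next state}.
DELTA = {
    "S": {"d": "I", ".": "F"},
    "I": {"d": "I", ".": "D"},
    "F": {"d": "D"},
    "D": {"d": "D"},
}
FINAL = {"I", "D"}

def trace(string, start="S"):
    """Return (state sequence, accepted?) for the input string."""
    states = [start]
    for ch in string:
        nxt = DELTA[states[-1]].get(ch)
        if nxt is None:                      # no arc for this symbol: stop and reject
            return "".join(states), False
        states.append(nxt)
    return "".join(states), states[-1] in FINAL

for s in ["ddd.d", ".dd", "ddd", "d.dd.d"]:
    print(s, trace(s))
```

A string of n symbols produces n + 1 states when every move is defined, so .dd gives a four-state sequence.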

Parsing

• Given a grammar G with distinguished nonterminal S and a string X over the alphabet, does S ⇒* X?

• Parsing attempts to find a sequence of rules by which S ⇒* X

Grammar for Decimal Numbers

I → d I

I → d

I → • D

D → d D

D → d

Parse tree for d d • d d d

I
├─ d
└─ I
   ├─ d
   └─ I
      ├─ •
      └─ D
         ├─ d
         └─ D
            ├─ d
            └─ D
               └─ d

A parse tree has an interior node for each nonterminal, a child node for each right-hand-side symbol of the production used to replace that nonterminal, and a leaf node for each character of the language string produced by the derivation. The language is the set of strings for which parse trees exist.
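
For the decimal-number grammar, a parser that builds such a tree can be written as one function per nonterminal. This is an illustrative recursive-descent sketch, not a general parsing method: parse_I, parse_D and leaves are hypothetical names, "." stands in for the • symbol, and a tree is a nested tuple whose first element is the nonterminal.

```python
def parse_I(s, i=0):
    """Parse s[i:] as an I; return a parse tree, or None if no derivation exists."""
    if i < len(s) and s[i] == "d":
        if i + 1 == len(s):
            return ("I", "d")                        # I -> d
        rest = parse_I(s, i + 1)
        return ("I", "d", rest) if rest else None    # I -> d I
    if i < len(s) and s[i] == ".":
        rest = parse_D(s, i + 1)
        return ("I", ".", rest) if rest else None    # I -> . D
    return None

def parse_D(s, i):
    """Parse s[i:] as a D; return a parse tree, or None if no derivation exists."""
    if i < len(s) and s[i] == "d":
        if i + 1 == len(s):
            return ("D", "d")                        # D -> d
        rest = parse_D(s, i + 1)
        return ("D", "d", rest) if rest else None    # D -> d D
    return None

def leaves(tree):
    """Read the derived string back off the leaves, left to right."""
    return "".join(c if isinstance(c, str) else leaves(c) for c in tree[1:])

tree = parse_I("dd.ddd")
print(tree)
print(leaves(tree))      # dd.ddd
print(parse_I("dd."))    # None
```

The string dd. has no parse tree under this grammar because D must derive at least one d.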