
Languages & Strings

Uploaded by cathal

Page 1: Languages & Strings

Languages & Strings

String Operations

Language Definitions

Page 2: Languages & Strings

Strings

• A string x (over alphabet A) is a finite sequence x = x₁x₂…xₙ where each xᵢ ∈ A.

• Length – the length of x is the number of characters, n, in the sequence.

• Empty String – λ denotes the empty string of length 0.

• Recursive definition of the set of strings A* over alphabet A

– Basis : The empty string λ ∈ A*

– Recursive Step : If x ∈ A* and a ∈ A, then xa ∈ A*

– Closure : A* contains no other strings
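The recursive definition can be read as a procedure for enumerating A*. The following is a minimal Python sketch, assuming the alphabet is given as a string of single characters; the function name `strings_up_to` is only illustrative. It starts from the basis (λ) and applies the recursive step once per level, so level k holds exactly the strings of length k.

```python
def strings_up_to(alphabet, max_len):
    """Enumerate the strings of A* with length <= max_len, shortest first."""
    level = [""]              # basis: the empty string (lambda) is in A*
    result = list(level)
    for _ in range(max_len):
        # recursive step: if x is in A* and a is in A, then xa is in A*
        level = [x + a for x in level for a in alphabet]
        result.extend(level)
    return result


print(strings_up_to("ab", 2))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The closure clause corresponds to the fact that nothing is added to `result` except through these two steps.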

Page 3: Languages & Strings

Languages and String Operations

• Languages

– A language L over alphabet A is any subset of A*

– Concatenation : The concatenation of two strings x and y is xy, a string whose length is the length of x plus the length of y.

– Concatenation of two languages : The concatenation of languages L and M is LM = { z | z = xy where x ∈ L, y ∈ M }.

• Example: T = D* and O = {"+", "-"} where D = {0,1,..,9}. Then TOT is the language {"1+1", "12+24", . . .}. Because D* contains λ, TOT as defined also contains strings such as "+" and "1+"; taking T = DD* instead requires at least one digit on each side.
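Language concatenation can be tried out directly on finite sets. This is a small sketch, assuming finite samples stand in for the infinite language T; the helper name `concat` and the sample strings are illustrative choices.

```python
def concat(L, M):
    """Concatenation of two finite languages: { xy | x in L, y in M }."""
    return {x + y for x in L for y in M}


# Finite samples standing in for the infinite language T
T = {"1", "12", "24"}
O = {"+", "-"}

# Concatenation is associative, so TOT can be built as (TO)T
TOT = concat(concat(T, O), T)

print("1+1" in TOT, "12+24" in TOT, "1+" in TOT)
# True True False
```

Here "1+" is excluded only because the sample T does not contain the empty string.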

Page 4: Languages & Strings

Recursive Definition of Regular Sets

• Let A be an alphabet. The regular sets over A are:

– Basis : ∅, {λ} and {a}, for each a ∈ A, are regular sets

– Recursive Step : If X, Y are regular sets, so are

• X ∪ Y
• XY
• X*

– Closure : X is a regular set over A iff it can be obtained from the basis sets by a finite number of applications of the recursive step

Page 5: Languages & Strings

Regular Set Examples

• Signed and unsigned integers

• Unparenthesized expressions with variable operands and binary operators, where a variable is a letter (l) followed by a string of letters and digits (d)

• English sentences with structure <noun phrase><verb phrase><noun phrase>

– Lexical categories : d – determiner, a – adjective, n – noun, x – adverb, v – verb

Page 6: Languages & Strings

Regular Set Examples

• Signed and unsigned integers

– ({λ} ∪ {+} ∪ {-}){d}{d}*

• Expressions without parentheses

– ({l}({l} ∪ {d})*)(({+} ∪ {*})({l}({l} ∪ {d})*))*

• Sentences

– ({d}{a}*{n})(({λ} ∪ {x}){v})({d}{a}*{n})
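The three regular sets above can be checked with Python's `re` module. This is a sketch under stated assumptions: d is taken to be a decimal digit and l a lowercase ASCII letter in the first two patterns, while the sentence pattern works over the category symbols d, a, n, x, v themselves. Union becomes a character class or `?`, and `fullmatch` is used so the whole string must belong to the set.

```python
import re

# ({lambda} U {+} U {-}){d}{d}*  with d = decimal digit (assumption)
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")

# variable (operator variable)*  with l = lowercase letter (assumption)
EXPRESSION = re.compile(r"[a-z][a-z0-9]*([+*][a-z][a-z0-9]*)*")

# noun phrase, optional adverb + verb, noun phrase, over category symbols
SENTENCE = re.compile(r"da*nx?vda*n")


def in_set(pattern, s):
    """True if the whole string s belongs to the regular set."""
    return pattern.fullmatch(s) is not None


print(in_set(INTEGER, "-42"), in_set(INTEGER, "+"))
print(in_set(EXPRESSION, "x1+y*z2"), in_set(EXPRESSION, "x+"))
print(in_set(SENTENCE, "danvdn"), in_set(SENTENCE, "nvn"))
```

For example, "danvdn" stands for a sentence shaped like determiner, adjective, noun, verb, determiner, noun.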

Page 7: Languages & Strings

Regular Expressions

• The set of strings which begin with an "a" and end with a "b" is a regular set over {a,b} since it equals {a}({a} ∪ {b})*{b}.

• Regular expressions represent regular sets as follows: ∅, λ and a represent ∅, {λ} and {a}.

– If u and v are regular expressions (representing regular sets) then (u ∪ v), (uv) and (u*) are regular expressions representing their union, concatenation and Kleene closure.

– Dropping superfluous parentheses, a(a ∪ b)*b represents the regular set of all strings starting with a and ending with b.
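To see that a regular expression really denotes a set, the three operations can be evaluated on finite sets, cut off at a maximum string length. The sketch below assumes a bound N = 4 and uses illustrative helper names (`union`, `concat`, `star`); it then compares the result with a brute-force listing of all strings over {a,b} that start with a and end with b.

```python
from itertools import product


def union(X, Y):
    return X | Y


def concat(X, Y, n):
    """XY restricted to strings of length <= n."""
    return {x + y for x in X for y in Y if len(x + y) <= n}


def star(X, n):
    """X* restricted to strings of length <= n."""
    result = {""}
    frontier = {""}
    while frontier:
        # extend the newest strings by one more factor from X
        frontier = concat(frontier, X, n) - result
        result |= frontier
    return result


N = 4
a, b = {"a"}, {"b"}

# the set denoted by a(a U b)*b, truncated at length N
denoted = concat(concat(a, star(union(a, b), N), N), b, N)

# brute force: every string of length 2..N that starts with a and ends with b
expected = {
    "".join(p)
    for k in range(2, N + 1)
    for p in product("ab", repeat=k)
    if p[0] == "a" and p[-1] == "b"
}

print(denoted == expected, sorted(denoted, key=lambda s: (len(s), s)))
```

Truncation is safe here because any string of length at most N in XY is built from factors that are themselves no longer than N.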

Page 8: Languages & Strings

Grammars

A context free grammar G is a 4-tuple : G = (V, Σ, P, S) where

1. V is a set of nonterminals (or string variables), each representing a sublanguage from which the variable takes its values. Examples are <noun phrase>, which can take on values such as "the big box", and T, which can take on string values used to represent products in an algebraic expression.

2. Σ is a finite alphabet containing the symbols from which language strings are formed. Examples are the English vocabulary (over a hundred thousand words, each treated as an atomic symbol), the printable ASCII character set, and the binary alphabet {0,1}.

Page 9: Languages & Strings

Grammars Continued

3. P is a finite set of productions, or rules, used to define the sublanguages represented by the nonterminals. In a context free grammar, a rule has the format A → X where A ∈ V and X ∈ (V ∪ Σ)*. The interpretation is that the strings in the sublanguage represented by A can be constructed according to the format indicated by X: each terminal character in X appears as is in the A string, and each variable in X is replaced by a string from that variable's sublanguage. Examples are

<noun phrase> → <determiner> <adj-list> <noun> and

T → a * T.

4. S is a designated variable (referred to as the start symbol or the head of the language). It represents the language being defined by the grammar G.
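The 4-tuple maps naturally onto a small data structure. The sketch below is one possible encoding, not a standard API: the class name `Grammar`, the field names and the check `is_context_free` are all illustrative. It also assumes, as is usual but not stated above, that V and the alphabet are disjoint. The example grammar uses the rule T → a * T from above plus a hypothetical rule T → a so that derivations can end.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    V: frozenset        # nonterminals
    sigma: frozenset    # terminal alphabet
    P: tuple            # productions as (A, X) pairs, X a tuple of symbols
    S: str              # start symbol


def is_context_free(g):
    """Check that S is in V, that V and the alphabet are disjoint (assumed),
    and that every rule A -> X has A in V and X in (V U alphabet)*."""
    symbols = g.V | g.sigma
    return (
        g.S in g.V
        and not (g.V & g.sigma)
        and all(A in g.V and all(s in symbols for s in X) for A, X in g.P)
    )


g = Grammar(
    V=frozenset({"T"}),
    sigma=frozenset({"a", "*"}),
    P=(("T", ("a", "*", "T")),   # T -> a * T (from the text)
       ("T", ("a",))),           # T -> a     (hypothetical, ends the recursion)
    S="T",
)

print(is_context_free(g))
# True
```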

Page 11: Languages & Strings

Grammar Examples

• Signed and unsigned integers

– I → SD, S → + | - | λ, D → dD, D → d

• Unparenthesized expressions with variable operands and binary operators, where a variable is a letter (l) followed by a string of letters and digits (d)

– E → VOE, E → V, O → + | *, V → lU, U → lU, U → dU, U → λ

• English sentences with structure <noun phrase><verb phrase><noun phrase>

– Lexical categories : d – determiner, a – adjective, n – noun, x – adverb, v – verb

Page 12: Languages & Strings

Grammars and Derivations

Derivations If u, v are strings in (V ∪ Σ)*, A is in V and A → X is in P, then uAv ⇒ uXv, read as uAv "derives" uXv by application of the rule A → X. For zero or more successive rule applications, the symbol ⇒* is used.

Language Definition The language L(G) defined by G is

{ x | x ∈ Σ*, S ⇒* x }
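The definition of L(G) can be explored by applying the one-step derivation relation repeatedly. Below is a rough sketch for the signed-integer grammar (I → SD, S → + | - | λ, D → dD | d), assuming every symbol is a single character so that sentential forms can be plain strings; `RULES`, `step` and `language` are illustrative names. A breadth-first search collects every terminal string derivable from the start symbol up to a length bound.

```python
from collections import deque

# Productions of the signed-integer grammar; "" plays the role of lambda
RULES = {"I": ["SD"], "S": ["+", "-", ""], "D": ["dD", "d"]}


def step(form):
    """All forms uXv reachable from form = uAv by applying one rule A -> X."""
    out = []
    for i, sym in enumerate(form):
        for rhs in RULES.get(sym, []):
            out.append(form[:i] + rhs + form[i + 1:])
    return out


def language(start, max_len):
    """Terminal strings x with start =>* x and len(x) <= max_len."""
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if not any(c in RULES for c in form):
            found.add(form)          # no nonterminals left: a language string
            continue
        for nxt in step(form):
            # S may vanish (S -> lambda); every other symbol in this grammar
            # yields at least one terminal, so longer forms can be pruned
            if len(nxt.replace("S", "")) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


print(step("SD"))
# ['+D', '-D', 'D', 'SdD', 'Sd']
print(sorted(language("I", 2)))
# ['+d', '-d', 'd', 'dd']
```

The pruning rule is specific to this grammar; a general grammar would need its own bound on how much a form can shrink.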

Page 13: Languages & Strings

Language Definition

Language Definition is a means of specifying which strings belong to the language. Two approaches to language definition are

• Acceptive – Given a string, a device specifies whether or not it belongs to the language.

– An automaton A which processes a language string x accepts x as belonging to the language if its final state belongs to the set of legal final states.

– A parser constructed from the grammar defining the language accepts the string if it can parse it.

• Generative – Given an alphabet, a generative device tells how strings in the language are formed.

– A language manual which tells how strings are formed can be used to generate language strings.

– A grammar is a generative means of specification. Any string which can be derived from the start symbol by applying grammar rules is in the language.

Page 15: Languages & Strings

Finite state automata and language recognition

 

[State diagram: states S, I, F and D, with the transitions given in the table below]

The finite state automaton has Σ = {d, •}, start state S and legal final states I and D. The transition function is represented by the diagram above or the table below:

      d    •
S     I    F
I     I    D
F     D    -
D     D    -

Accepts : ddd, d.dd, .ddd

Rejects : d.dd.d

Page 16: Languages & Strings

Automata as Acceptors


• The string ddd.d produces the state sequence S I I I D D and is accepted because the last state, D, is a legal final state.

• The string .dd produces the state sequence S F D D and is accepted because D is a legal final state.

• The string ddd produces the state sequence S I I I and is accepted because I is a legal final state.
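The state sequences above can be reproduced by simulating the automaton. This is a minimal sketch, assuming the transition table for states S, I, F, D with final states I and D, and writing "." for the decimal-point symbol; `DELTA`, `trace` and `accepts` are illustrative names. A missing table entry means the automaton has no move, so the string is rejected.

```python
# Transition function as a dictionary: (state, symbol) -> next state
DELTA = {
    ("S", "d"): "I", ("S", "."): "F",
    ("I", "d"): "I", ("I", "."): "D",
    ("F", "d"): "D",
    ("D", "d"): "D",
}
FINAL = {"I", "D"}


def trace(s, start="S"):
    """Return the list of states visited, or None if a move is undefined."""
    states = [start]
    for ch in s:
        nxt = DELTA.get((states[-1], ch))
        if nxt is None:
            return None
        states.append(nxt)
    return states


def accepts(s):
    """Accept if the run completes and ends in a legal final state."""
    states = trace(s)
    return states is not None and states[-1] in FINAL


for s in ["ddd.d", ".dd", "ddd", "d.dd.d"]:
    print(s, trace(s), accepts(s))
```

The empty string and a lone "." are both rejected, because S and F are not legal final states.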

Page 17: Languages & Strings

Parsing

• Given a grammar G with distinguished nonterminal S and a string X over the alphabet, does S ⇒* X?

• Parsing attempts to find a sequence of rules by which

– S ⇒* X

Page 18: Languages & Strings

Grammar for Decimal Numbers

I → d I

I → d

I → • D

D → d D

D → d

Parse tree for d d . d d d

I
├─ d
└─ I
   ├─ d
   └─ I
      ├─ •
      └─ D
         ├─ d
         └─ D
            ├─ d
            └─ D
               └─ d

A parse tree has an intermediate node for each nonterminal, a child node for each RHS symbol in the production used to replace that nonterminal, and a leaf node for each character in the language string produced by the derivation. The language is the set of strings for which a parse tree exists.
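A parse tree for the decimal-number grammar can be built with a small recursive-descent parser. The sketch below is one way to do it, not the only one: it writes "." for the decimal point, represents a tree as a nested tuple (nonterminal, child, child, ...), and chooses between I → d I and I → d (likewise for D) by checking whether any input remains. The names `parse_I`, `parse_D` and `leaves` are illustrative.

```python
def parse_I(s, i=0):
    """Parse s[i:] as an I; return a nested-tuple tree, or None on failure."""
    if i >= len(s):
        return None
    if s[i] == ".":                      # I -> . D
        sub = parse_D(s, i + 1)
        return ("I", ".", sub) if sub else None
    if s[i] == "d":
        if i == len(s) - 1:
            return ("I", "d")            # I -> d   (last character)
        sub = parse_I(s, i + 1)          # I -> d I (more input follows)
        return ("I", "d", sub) if sub else None
    return None


def parse_D(s, i):
    """Parse s[i:] as a D; return a nested-tuple tree, or None on failure."""
    if i >= len(s) or s[i] != "d":
        return None
    if i == len(s) - 1:
        return ("D", "d")                # D -> d
    sub = parse_D(s, i + 1)              # D -> d D
    return ("D", "d", sub) if sub else None


def leaves(tree):
    """Read the leaf characters left to right: the string the tree derives."""
    out = ""
    for child in tree[1:]:
        out += leaves(child) if isinstance(child, tuple) else child
    return out


tree = parse_I("dd.ddd")
print(tree)
print(leaves(tree))
# dd.ddd
```

A string is in the language exactly when `parse_I` returns a tree, so `parse_I("d.d.d")` and `parse_I("d.")` both return None.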