Upload
truongkiet
View
222
Download
0
Embed Size (px)
Citation preview
Regular Languages are closed
under all of the following operations:
Union: A ∪∪∪∪ B = { w | w ∈∈∈∈ A or w ∈∈∈∈ B }
Intersection: A ∩∩∩∩ B = { w | w ∈∈∈∈ A and w ∈∈∈∈ B }
Complement: ¬¬¬¬A = { w ∈ Σ* | w ∉∉∉∉ A }
Reverse: AR = { w1 …wk | wk …w1 ∈∈∈∈ A }
Concatenation: A ⋅⋅⋅⋅ B = { vw | v ∈∈∈∈ A and w ∈∈∈∈ B }
Star: A* = { w1 …wk | k ≥ 0 and each wi ∈∈∈∈ A }
Regular Expressions
Computation as simple, logical description
A totally different way of thinking about computation:
What is the complexity of
describing the strings in the language?
Inductive Definition of Regexp
For all σσσσ ∊∊∊∊ Σ, σσσσ is a regexp
ε is a regexp
∅∅∅∅ is a regexp
If R1 and R2 are both regexps, then
(R1R2), (R1 + R2), and (R1)* are regexps
Let Σ be an alphabet. We define the regular
expressions over Σ inductively:
Definition: Regexps Represent Languages
The regexp σσσσ ∊∊∊∊ Σ represents the language {σσσσ}
The regexp ε represents {ε}
The regexp ∅∅∅∅ represents ∅∅∅∅
If R1 and R2 are regular expressions
representing L1 and L2 then:
(R1R2) represents L1 ⋅⋅⋅⋅ L2
(R1 + R2) represents L1 ∪∪∪∪ L2
(R1)* represents L1*
Example: (10 + 0*1) represents {0k1 | k ≥ 0} ∪∪∪∪ {10}
Regexps Represent Languages
For every regexp R, define L(R) to be the
language that R represents
A string w ∊∊∊∊ Σ* is accepted by R
(or, w matches R) if w ∊ ∊ ∊ ∊ L(R)
Example: 01010 matches the regexp (01)*0
Given any regexp R, we will construct an NFA N s.t.
N accepts exactly the strings accepted by R
Proof by induction on the length of the regexp R:
Base Cases (R has length 1):
R = σσσσσσσσ
R = ε
R = ∅∅∅∅
Induction Step: Suppose every regexp of length < k
represents some regular language.
Three possibilities for R:
R = R1 + R2
R = R1 R2
R = (R1)*
Consider a regexp R of length k > 1
Induction Step: Suppose every regexp of length < k
represents some regular language.
Three possibilities for R:
R = R1 + R2
R = R1 R2
R = (R1)*
Consider a regexp R of length k > 1
By induction, R1 and R2 represent
some regular languages, L1 and L2
But L(R) = L(R1 + R2) = L1 ∪∪∪∪ L2
so L(R) is regular, by the union theorem!
Induction Step: Suppose every regexp of length < k
represents some regular language.
Three possibilities for R:
R = R1 + R2
R = R1 R2
R = (R1)*
Consider a regexp R of length k > 1
By induction, R1 and R2 represent
some regular languages, L1 and L2
But L(R) = L(R1····R2) = L1····L2
so L(R) is regular by the concatenation
theorem
Induction Step: Suppose every regexp of length < k
represents some regular language.
Three possibilities for R:
R = R1 + R2
R = R1 R2
R = (R1)*
Consider a regexp R of length k > 1
By induction, R1 and R2 represent
some regular languages, L1 and L2
But L(R) = L(R1*) = L1*
so L(R) is regular, by the star theorem
Induction Step: Suppose every regexp of length < k
represents some regular language.
Three possibilities for R:
R = R1 + R2
R = R1 R2
R = (R1)*
Consider a regexp R of length k > 1
By induction, R1 and R2 represent
some regular languages, L1 and L2
But L(R) = L(R1*) = L1*
so L(R) is regular, by the star theorem
Therefore: If L is represented by a regexp,
then L is regular
Give an NFA that accepts the language
represented by (1(0 + 1))*
1ε 1,0
ε
1 (0+1)( )*Regular expression:
L can be represented by a regexp
⇒⇒⇒⇒
L is a regular language
⇐⇐⇐⇐
Idea: Transform an NFA for L into a regular
expression by removing states and
re-labeling the arcs with regular expressions
Generalized NFAs (GNFA)
Rather than reading in just 0 or 1 letters from the
string on a step, we can read in entire substrings
Q, Σ are states and alphabet
R : (Q-{qaccept}) ×××× (Q-{qstart}) → RRRR
is the transition function
qstart ∈∈∈∈ Q is the start state
R R R R = set of all regular expressions over Σ
A GNFA is a 5-tuple G = (Q, Σ, R, qstart, qaccept)
qaccept ∈∈∈∈ Q is the (unique) accept state
Let w ∈ Σ* and let G be a GNFA.
G accepts w if w can be written as w = w1 LLLL wk
where wi ∈∈∈∈ Σ* and there is a sequence
r0, r1, ..., rk ∈∈∈∈ Q such that
• r0 = qstart
• wi matches R(ri-1, ri) for all i = 1, ..., k, and
• rk ==== qaccept
A GNFA is a 5-tuple G = (Q, Σ, R, qstart, qaccept)
L(G) = set of all strings that G accepts
= “the language recognized by G”
q1
cb
aa*bq0 q2
This GNFA recognizes L(a*b(cb)*a)
Is aaabcbcba accepted or rejected?
Is bba accepted or rejected?
Is bcba accepted or rejected?
Generalized NFA (GNFA)
NFAε
ε
ε
ε
ε
While the machine has more than 2 states:
Pick an internal state, rip it out and
re-label the arrows with regexps,
to account for paths through the missing state
0
1
0
NFAε
ε
ε
ε
ε
While the machine has more than 2 states:
Pick an internal state, rip it out and
re-label the arrows with regexps,
to account for paths through the missing state
01*0
NFAε
ε
ε
ε
ε
In general:
R(q1,q2)
R(q2,q2)
R(q2,q3)q1
q2 q3
G
R(q1,q3)
While the machine has more than 2 states:
NFAε
ε
ε
ε
ε
In general:
R(q1,q2)R(q2,q2)*R(q2,q3) + R(q1,q3)
q1q3
G
While the machine has more than 2 states:
Formally: Given a DFA, add qstart and qacc to create G
CONVERT(G): (Takes a GNFA, outputs a regexp)
If #states = 2 return R(qstart, qacc)
If #states > 2
select qrip∈∈∈∈Q different from qstart and qacc
define Q′′′′ = Q – {qrip} defines a
new GNFA G′′′′define R′′′′ on Q’-{qacc} x Q’-{qstart} as:
R′′′′(qi,qj) = R(qi,qrip)R(qrip,qrip)*R(qrip,qj) + R(qi,qj)
For all q,q’, define R(q,q’) to be σ if δ(q,σ) = q’, else ∅
return CONVERT(G′′′′) Claim:
L(G’) = L(G)
Theorem: Let R = CONVERT(G). Then L(R) = L(G).
Proof by induction on k, the number of states in G
Base Case: k = 2
Inductive Step:
Assume theorem is true for k-1 state GNFAs
Let G have k states. Let G’ be the k-1 state GNFA
obtained by ripping out a state.
CONVERT outputs R(qstart, qacc) �
G’ has k-1 states, so by induction,
L(G′′′′) = L(CONVERT(G’)) = L(R)
Therefore L(R)=L(G). QED
We already claimed L(G) = L(G′′′′) [Sipser, p.73--74]
Convert the NFA to a regular expression
q1
ε
ε
(a + b)b*b(bb*b)*
(a + b)b*b(bb*b)*a
((a + b)b*b(bb*b)*a)*(ε + (a + b)b*b(bb*b)*)
Regular or Not?
C = { w | w has equal number of 1s and 0s}
D = { w | w has equal number of
occurrences of 01 and 10 }
NOT REGULAR!
REGULAR!
1 + 0 + ε + 0(0+1)*0 + 1(0+1)*1
{ w | w has equal number of
occurrences of 01 and 10}
= { w | w = 1, w = 0, or w = ε, or
w starts with a 0 and ends with a 0, or
w starts with a 1 and ends with a 1 }
Claim:
A string w has equal occurrences of 01 and 10
� w starts and ends with the same bit.
The Pumping Lemma:
Structure in Regular Languages
Let L be a regular language
Then there is a positive integer P s.t.
1. |y| > 0 (that is, y ≠≠≠≠ ε)
2. |xy| ≤ P
3. For all i ≥ 0, xyiz ∈∈∈∈ L
for all strings w ∈∈∈∈ L with |w| ≥ P
there is a way to write w = xyz, where:
Why is it called the pumping lemma? The word w gets
pumped into longer and longer strings…