Chapter 2. Regular Expressions and Automata 2.1 Regular Expressions

March 30, 2007

Minho Kim, Artificial Intelligence Laboratory, Pusan National University

Text: Speech and Language Processing, pp. 21-33

Outline

- Introduction
- Basic Regular Expression Patterns
- Disjunction, Grouping, and Precedence
- A Simple Example
- A More Complex Example
- Advanced Operators
- Regular Expression Substitution, Memory, and ELIZA

Introduction

Regular expressions: one of the unsung successes in standardization in computer science
- a language for specifying text search strings
- an algebraic notation for characterizing a set of strings

Regular expression search
- requires a pattern that we want to search for
- a search function searches through the corpus, returning all texts that contain the pattern
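
A minimal sketch of such a search, using Python's `re` module. The three-sentence corpus and the helper name `search` are made up for illustration and are not from the slides.

```python
import re

# Hypothetical mini-corpus; each string stands in for one text.
corpus = [
    "Woodchucks are also called groundhogs.",
    "How much wood would a woodchuck chuck?",
    "The quick brown fox.",
]

def search(pattern, texts):
    """Return every text that contains a match for the pattern."""
    regex = re.compile(pattern)
    return [text for text in texts if regex.search(text)]

# The search is case sensitive, so "Woodchucks" in the first text is not a hit.
hits = search(r"woodchuck", corpus)
```

Only the second text is returned here, which is why the later slides introduce patterns such as /[wW]oodchuck/.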

Basic Regular Expression Patterns (1/6)

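- the slash / delimits a regular expression (Perl notation); the slashes are not part of the pattern, e.g. /woodchuck/
- the square brackets [ ] specify a disjunction of characters, e.g. /[wW]oodchuck/ matches Woodchuck or woodchuck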

Basic Regular Expression Patterns (2/6)

metacharacter: the dash - specifies a range inside square brackets
- /[123456789]/ can be written as /[1-9]/
- /[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/ can be written as /[A-Z]/
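
The bracket and range patterns can be tried out in Python as a quick check. The sample sentence is an assumption chosen for illustration.

```python
import re

# Made-up sample sentence, not taken from the slides.
text = "Chapter 2 covers Woodchucks; page 21 mentions a woodchuck."

# Brackets: either character at that position.
either_case = re.findall(r"[wW]oodchuck", text)

# A range is shorthand for listing every character in the set.
digits_range = re.findall(r"[1-9]", text)
digits_listed = re.findall(r"[123456789]", text)

capitals = re.findall(r"[A-Z]", text)
```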

Basic Regular Expression Patterns (3/6)

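metacharacter: the caret ^
- as the first symbol after the open square bracket, it negates the set: /[^A-Z]/ matches any single character that is not an uppercase letter
- anywhere else inside the brackets it simply stands for a caret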
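
A small sketch of the caret inside square brackets, where a leading ^ negates the set and a ^ in any other position is a literal caret. The test strings are illustrative assumptions.

```python
import re

# Leading caret: match the first character that is NOT an uppercase letter.
first_non_upper = re.search(r"[^A-Z]", "Oyfn pripetchik").group()

# Caret not in first position: it is just one more member of the set.
literal_caret = re.findall(r"[e^]", "look up ^ now")
```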

Basic Regular Expression Patterns (4/6)

metacharacter: the question mark ?
- zero or one occurrence of the previous character, e.g. /woodchucks?/ matches woodchuck or woodchucks

Kleene *
- zero or more occurrences of the immediately previous character or regular expression
- /a*/ means 'any string of zero or more as'
- /aa*/ means 'one or more as'
- /[ab]*/ means 'zero or more as or bs'
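
The optional and Kleene star operators can be checked against whole strings. This is a sketch in Python; the helper `matches_whole` is a hypothetical name introduced here.

```python
import re

def matches_whole(pattern, string):
    """True if the pattern matches the entire string."""
    return re.fullmatch(pattern, string) is not None

# ? makes the previous character optional.
optional_s = [matches_whole(r"woodchucks?", w)
              for w in ("woodchuck", "woodchucks", "woodchuckss")]

# * allows zero occurrences, so a* matches the empty string but aa* does not.
star_empty = matches_whole(r"a*", "")
one_or_more_empty = matches_whole(r"aa*", "")

# [ab]* accepts any mix of a and b, and nothing else.
mixed = [matches_whole(r"[ab]*", s) for s in ("abba", "bbb", "abc")]
```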

Basic Regular Expression Patterns (5/6)

Kleene +
- one or more of the previous character
- /baaa*!/ = /baa+!/

metacharacter: the period . (wildcard expression)
- /beg.n/ matches any character between beg and n: begin, beg'n, begun
- /.*/ matches any string of characters
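
A quick check that /baaa*!/ and /baa+!/ accept the same strings, plus the wildcard in action. The candidate strings are assumptions chosen for illustration.

```python
import re

candidates = ["ba!", "baa!", "baaaa!", "b!"]

# Both patterns require at least two a's before the exclamation mark.
with_star = [bool(re.fullmatch(r"baaa*!", s)) for s in candidates]
with_plus = [bool(re.fullmatch(r"baa+!", s)) for s in candidates]

# The period matches any single character (except a newline by default).
wildcard = re.findall(r"beg.n", "begin beg'n begun bgn")
```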

Basic Regular Expression Patterns (6/6)

Anchors: special metacharacters that tie a pattern to a particular place in a string
- the caret ^ matches the start of a line
- the dollar sign $ matches the end of a line
- /^The dog\.$/ matches a line that contains only the phrase The dog.
- \b matches a word boundary
- /the/ vs. /\bthe\b/: only /the/ matches inside there

/ ^ $/
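
The anchors can be sketched in Python as follows. The sample strings are made up, and the counts simply show how \b excludes matches inside longer words.

```python
import re

# ^ and $ tie the pattern to the start and end of the line.
only_phrase = re.search(r"^The dog\.$", "The dog.") is not None
with_extra = re.search(r"^The dog\.$", "The dog. Barks") is not None

# Without \b, "the" is also found inside "there" and "other".
sample = "there is the other"
unanchored_count = len(re.findall(r"the", sample))
word_count = len(re.findall(r"\bthe\b", sample))
```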

Disjunction, Grouping, and Precedence

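We can't use the square brackets [ ] to search for "cat or dog"
- metacharacter: the pipe symbol | (disjunction)
- /cat|dog/ matches either the string cat or the string dog

How can I specify both guppy and guppies?
- /guppy|ies/ matches only guppy or ies, because sequences like guppy take precedence over |
- /gupp(y|ies)/ uses parentheses so that | applies only to the suffixes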
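
A short Python sketch of why the parentheses matter. The word list is an assumption for illustration.

```python
import re

words = ["guppy", "guppies", "ies"]

# Without grouping, the alternatives are the whole sequences "guppy" and "ies".
ungrouped = [bool(re.fullmatch(r"guppy|ies", w)) for w in words]

# With grouping, the disjunction applies only to the suffix.
grouped = [bool(re.fullmatch(r"gupp(y|ies)", w)) for w in words]

pets = re.findall(r"cat|dog", "the cat chased the dog")
```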

Disjunction, Grouping, and Precedence

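operator precedence hierarchy (highest to lowest)
- parentheses: ( )
- counters: * + ? { }
- sequences and anchors: the ^my end$
- disjunction: |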

A Simple Example

Goal: write an RE to find cases of the English article the
- /the/
- this pattern will miss the word when it begins a sentence and hence is capitalized (i.e., The)
- /[tT]he/
- this pattern still matches the embedded in other words (e.g., other or theology)
- /\b[tT]he\b/
- without \b: /[^a-zA-Z][tT]he[^a-zA-Z]/
- this misses the at the start of a line, so: /(^|[^a-zA-Z])[tT]he[^a-zA-Z]/
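
The successive refinements can be compared by counting matches on one sentence. This is a sketch; the sentence and the helper `count` are assumptions introduced here.

```python
import re

# Made-up test sentence containing The, the (twice), other and theology.
text = "The other day the theology student read the book."

def count(pattern):
    """Number of non-overlapping matches of the pattern in the test sentence."""
    return sum(1 for _ in re.finditer(pattern, text))

plain = count(r"the")                 # misses "The", hits "other" and "theology"
either_case = count(r"[tT]he")        # adds "The", still hits embedded cases
bounded = count(r"\b[tT]he\b")        # only the article itself
no_boundary = count(r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]")  # same result without \b
```

On this sentence the last two patterns agree; they differ in that the final pattern also consumes the neighbouring non-letter characters.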

A More Complex Example (1/2)

"any PC with more than 500 MHz and 32 Gb of disk space for less than $1000"

regular expression for prices (e.g., $999.99)
- simple regular expression for prices: /$[0-9]+/
- to deal with fractions of dollars: /$[0-9]+\.[0-9][0-9]/
- this pattern only allows $199.99 but not $199
- make the cents optional and add word boundaries: /\b$[0-9]+(\.[0-9][0-9])?\b/
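
A hedged Python version of the price pattern. Two adjustments are assumptions made for Python's `re` module and are not on the slide: the dollar sign is escaped as \$ (an unescaped $ is the end-of-line anchor), and the leading \b is dropped, because \b before $ would require a word character immediately before the dollar sign.

```python
import re

# Assumed adaptation of the slide's pattern for Python: escaped $, no leading \b.
price = re.compile(r"\$[0-9]+(\.[0-9][0-9])?\b")

# Made-up sample text.
text = "PCs from $199 to $999.99, all above 500 MHz"
prices = [m.group() for m in price.finditer(text)]

# The intermediate pattern with mandatory cents rejects whole-dollar prices.
cents_required = re.fullmatch(r"\$[0-9]+\.[0-9][0-9]", "$199")
```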

A More Complex Example (2/2)

regular expression for processor speed

regular expression for operating systems and vendors

Advanced Operators (1/3)

Aliases for common sets of characters
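
The slide's table of aliases is not reproduced in this text. As an illustration only, the sketch below uses the aliases \d, \w, \s and \D as supported by Python's `re` module; on plain ASCII input such as this made-up string, \d behaves like [0-9].

```python
import re

text = "500 MHz and 32 Gb"

numbers = re.findall(r"\d+", text)       # runs of digits
tokens = re.findall(r"\w+", text)        # runs of alphanumerics or underscore
pieces = re.split(r"\s+", text)          # split on whitespace
non_digits = re.findall(r"\D+", "a1b2")  # runs of anything that is not a digit
```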

Advanced Operators (2/3)

Regular expression operators for counting
- /a\.{24}z/ matches a followed by exactly 24 dots followed by z
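
A small check of the counting operator in Python. The {2,4} range form is included as an additional example of the same family of operators; the test strings are assumptions.

```python
import re

dots = re.compile(r"a\.{24}z")

exact = dots.fullmatch("a" + "." * 24 + "z") is not None
too_few = dots.fullmatch("a" + "." * 23 + "z") is not None

# {n,m} allows between n and m occurrences.
two_to_four = [bool(re.fullmatch(r"[0-9]{2,4}", s)) for s in ("1", "12", "1234", "12345")]
```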

Advanced Operators (3/3)

Some characters that need to be backslashed
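
The slide's table is not reproduced in this text. As a sketch in Python, a backslash turns a metacharacter into a literal, and `re.escape` does the escaping automatically. The sample strings are assumptions.

```python
import re

# Escaped period: only literal dots. Unescaped period: every character.
literal_dots = re.findall(r"\.", "The dog. The cat.")
any_chars = re.findall(r".", "a.b")

# Escaped question mark and a tab written as \t.
question = re.search(r"really\?", "Oh really?") is not None
columns = re.split(r"\t", "col1\tcol2")

# re.escape backslashes whatever needs it, so the result matches the input literally.
raw = "$5.00?"
round_trip = re.fullmatch(re.escape(raw), raw) is not None
```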

Regular Expression Substitution, Memory, and ELIZA (1/3)

Perl substitution operator
- s/regexp1/regexp2/
- s/colour/color/

number operator (using memory)
- changing the 35 boxes to the <35> boxes: s/([0-9]+)/<\1>/
- /the (.*)er they were, the \1er they will be/
- will match The bigger they were, the bigger they will be
- but not The bigger they were, the faster they will be
- these numbered memories are called registers
- an "extended" feature of regular expressions
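
A sketch of substitution and memory in Python, where `re.sub` plays the role of Perl's s/.../.../ operator. The input sentences are written in lowercase here (an assumption for this example) because the pattern's the is case sensitive.

```python
import re

# Plain substitution.
american = re.sub(r"colour", "color", "the colour of the box")

# \1 in the replacement refers back to what the first parenthesized group matched.
bracketed = re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes")

# \1 inside the pattern forces the same string to appear again.
same_adjective = re.compile(r"the (.*)er they were, the \1er they will be")
hit = same_adjective.search("the bigger they were, the bigger they will be")
miss = same_adjective.search("the bigger they were, the faster they will be")
```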

Regular Expression Substitution, Memory, and ELIZA (2/3)

number operator (cont'd)
- /the (.*)er they (.*), the \1er they \2/
- will match The bigger they were, the bigger they were
- but not The bigger they were, the bigger they will be
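
The two-register pattern can be checked the same way. The sentences are lowercased for this sketch because the pattern's the is case sensitive.

```python
import re

two_registers = re.compile(r"the (.*)er they (.*), the \1er they \2")

# Both the adjective stem (\1) and the verb phrase (\2) must repeat.
repeated = two_registers.fullmatch("the bigger they were, the bigger they were")
changed = two_registers.fullmatch("the bigger they were, the bigger they will be")
```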

ELIZA
- a simple natural-language understanding program (1966)
- works by substitution using memory
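
A toy sketch of substitution using memory in the spirit of ELIZA. The three rules, the fallback reply, and the function name `respond` are hypothetical and are not ELIZA's actual script; the first matching rule wins.

```python
import re

# Hypothetical rules: (pattern, response template). \1 reuses the matched word.
RULES = [
    (re.compile(r".*\bI am (depressed|sad)\b.*", re.IGNORECASE),
     r"WHY DO YOU THINK YOU ARE \1"),
    (re.compile(r".*\ball\b.*", re.IGNORECASE), "IN WHAT WAY"),
    (re.compile(r".*\balways\b.*", re.IGNORECASE),
     "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]

def respond(utterance):
    """Return the response of the first rule whose pattern matches the utterance."""
    for pattern, template in RULES:
        match = pattern.fullmatch(utterance)
        if match:
            # expand() fills \1 from the match; upper() mimics the all-caps replies.
            return match.expand(template).upper()
    return "PLEASE GO ON"  # assumed fallback when no rule applies

reply_all = respond("Men are all alike.")
reply_sad = respond("I am depressed much of the time.")
reply_none = respond("Hello")
```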

Regular Expression Substitution, Memory, and ELIZA (3/3)
