brice-darcy-simpson
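String
Characters joined together in order form a string.
A string normally has finite length: it may contain one character, two, ..., or zero (ε, "", the empty string). For example:
"abcdaaab"
In ES (ECMAScript), string literals are enclosed in single or double quotes.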
Algebra of Set<string>
Alternation: |
Alternation is set union; the result is still a Set<string>.
Ф = the union of zero Set<string> (the empty set)
{"a","bc","e"} | {"a","1"} = {"a","bc","e","1"}
• Many authors write ∪, +, or ∨ for alternation.
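Concatenation
Concatenation resembles the Cartesian product, with each pair joined into one string; the result is still a Set<string>. For example:
{"a","bc","e"} {"a","1"} = {"aa","a1","bca","bc1","ea","e1"}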
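The two operations above can be sketched on JavaScript `Set` objects. This is a minimal illustration; the helper names `alt` and `concat` are hypothetical and do not come from the slides.

```javascript
// Alternation: plain set union of two sets of strings.
function alt(r, s) {
  return new Set([...r, ...s]);
}

// Concatenation: every string of r followed by every string of s.
function concat(r, s) {
  const out = new Set();
  for (const x of r) {
    for (const y of s) {
      out.add(x + y);
    }
  }
  return out;
}

const A = new Set(["a", "bc", "e"]);
const B = new Set(["a", "1"]);

console.log([...alt(A, B)]);    // [ 'a', 'bc', 'e', '1' ]
console.log([...concat(A, B)]); // [ 'aa', 'a1', 'bca', 'bc1', 'ea', 'e1' ]
```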
Power
{"a","1"}^2 = {"a","1"} {"a","1"} = {"aa","a1","1a","11"}
{"a","1"}^3 = {"a","1"} {"a","1"} {"a","1"} = {"aaa","aa1","a1a","a11","1aa","1a1","11a","111"}
Define {"a","1"}^1 = {"a","1"},
so that {"a","1"}^2 = {"a","1"}^(1+1) = {"a","1"} {"a","1"}.
Define {"a","1"}^0 = {ε},
so that {"a","1"}^1 = {"a","1"}^(0+1) = {ε} {"a","1"}.
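The power operation can be sketched as repeated concatenation starting from {ε}, which is why the zeroth power is defined as {ε}. The helper names `concat` and `power` are hypothetical, chosen only for this illustration.

```javascript
// Concatenation of two sets of strings.
function concat(r, s) {
  const out = new Set();
  for (const x of r) {
    for (const y of s) {
      out.add(x + y);
    }
  }
  return out;
}

// n-th power: start from {ε} (the set holding the empty string)
// and concatenate with s exactly n times.
function power(s, n) {
  let out = new Set([""]);
  for (let i = 0; i < n; i++) {
    out = concat(out, s);
  }
  return out;
}

const S = new Set(["a", "1"]);
console.log([...power(S, 0)]);  // [ '' ]
console.log([...power(S, 2)]);  // [ 'aa', 'a1', '1a', '11' ]
console.log(power(S, 3).size);  // 8
```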
Combining concatenation and alternation
S{m,n} = S^m | S^(m+1) | S^(m+2) | … | S^n
S{m,} = S^m | S^(m+1) | S^(m+2) | …
S? = S^0 | S
S+ = S^1 | S^2 | S^3 | …
S* = S^0 | S^1 | S^2 | S^3 | …
* is called the Kleene star.
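ECMAScript regular expressions use the same quantifier notation, so these definitions can be checked directly. The sketch below takes S = {"a","1"}, written as the pattern `(?:a|1)`; the helper `inLanguage` is a hypothetical name.

```javascript
// Pattern for the set S = {"a", "1"}.
const S = "(?:a|1)";

// True when str belongs to S followed by the given quantifier.
function inLanguage(quantifier, str) {
  return new RegExp("^" + S + quantifier + "$").test(str);
}

console.log(inLanguage("{2,3}", "a1"));   // true  (in S^2)
console.log(inLanguage("{2,3}", "a"));    // false (only in S^1)
console.log(inLanguage("{2,3}", "a1a1")); // false (only in S^4)
console.log(inLanguage("{2,}", "a1a1"));  // true
console.log(inLanguage("?", ""));         // true  (S^0 = {ε})
console.log(inLanguage("+", ""));         // false (S+ starts at S^1)
console.log(inLanguage("*", ""));         // true
```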
Priority of ops
* has the highest priority, then concatenation, then alternation. Parentheses may be omitted where the priority already fixes the grouping. For example,
(ab)c can be written as abc, and a|(b(c*)) can be written as a|bc*.
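A quick way to see the priority rule is to compare a pattern with and without its redundant parentheses on a few sample strings. The sample list below is arbitrary and only illustrative.

```javascript
// a|bc* with the grouping left implicit, and with it spelled out.
const implicit = /^(?:a|bc*)$/;
const explicit = /^(?:a|(?:b(?:c*)))$/;
// A different grouping, (a|b)c*, for contrast.
const regrouped = /^(?:(?:a|b)c*)$/;

const samples = ["a", "b", "bc", "bccc", "ac", "bcbc", ""];

// The first two patterns agree on every sample.
const agree = samples.every((s) => implicit.test(s) === explicit.test(s));
console.log(agree);                // true

// "ac" separates a|bc* from (a|b)c*.
console.log(implicit.test("ac"));  // false
console.log(regrouped.test("ac")); // true
```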
Regular Expression
Certain Set<string> values are called regular expressions (RE, RegExp, or Regex). The following are RegExps:
{"a"} is a RegExp, for any character a in the alphabet
R|S is a RegExp, if R and S are both RegExps
RS is a RegExp, if R and S are both RegExps
R* is a RegExp, if R is a RegExp
So {ε} is a RegExp: taking Ф as the alternation of zero RegExps, Ф* = {ε}.
If a Set<string> cannot be built by the process above, it is not a RegExp.
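\c followed by a lowercase or uppercase letter denotes a control character.
\a = a, because a has no special meaning after a backslash; the same holds for some other letters.
\u002F is a Unicode escape (here the character /); \0 is the NUL character.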
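These escapes can be tried directly in a regular expression literal. The sketch assumes a web-compatible engine such as Node.js and patterns without the `u` flag, where an escaped letter with no special meaning stands for itself.

```javascript
// \cJ is control-J, i.e. the line feed character U+000A.
console.log(/\cJ/.test("\n"));     // true
console.log(/\cj/.test("\n"));     // true (lowercase letter also allowed)

// \a has no special meaning, so it matches the letter a itself.
console.log(/^\a$/.test("a"));     // true

// \u002F is the Unicode escape for "/".
console.log(/^\u002F$/.test("/")); // true

// \0 matches the NUL character U+0000.
console.log(/\0/.test("\u0000"));  // true
```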
Character Class []
[abc] = {"a","b","c"}
[a-c] = {"a","b","c"}
[-ca], where - is literal
[ac-], where - is literal
[^a-c] = alphabet \ [a-c] (every character not in [a-c])
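The character-class forms above can be checked with a few one-character tests. The helper `only` is a hypothetical name; it anchors the class so the whole string must be a single member of the set.

```javascript
// Build an anchored regex from a character-class source string.
function only(cls) {
  return new RegExp("^" + cls + "$");
}

console.log(only("[abc]").test("b"));  // true
console.log(only("[a-c]").test("b"));  // true  (range a..c)
console.log(only("[-ca]").test("-"));  // true  (leading - is literal)
console.log(only("[-ca]").test("b"));  // false (no range is formed)
console.log(only("[ac-]").test("-"));  // true  (trailing - is literal)
console.log(only("[^a-c]").test("d")); // true  (complement)
console.log(only("[^a-c]").test("a")); // false
```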
. any char except newline
\d digit
\D not digit
\w word char
\W not word char
\s whitespace
\S not whitespace
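A short check of each shorthand class, as a sketch with arbitrarily chosen sample characters:

```javascript
console.log(/./.test("\n"));  // false (newline is excluded)
console.log(/./.test("x"));   // true
console.log(/\d/.test("7"));  // true
console.log(/\D/.test("7"));  // false
console.log(/\w/.test("_"));  // true  (underscore is a word char)
console.log(/\W/.test("-"));  // true
console.log(/\s/.test("\t")); // true
console.log(/\S/.test(" "));  // false
```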
RegExp
RegExp is a function.
It can construct regular expressions, called either way:
RegExp(pattern, flags)
new RegExp(pattern, flags)
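Both call forms can be compared side by side. The pattern and flag values below are arbitrary examples.

```javascript
// Called as a plain function and as a constructor.
const r1 = RegExp("ab+c", "i");
const r2 = new RegExp("ab+c", "i");

console.log(r1 instanceof RegExp);     // true
console.log(r2 instanceof RegExp);     // true
console.log(r1.source === r2.source);  // true
console.log(r1.test("xABBC"));         // true (i flag ignores case)
console.log(r2.test("ac"));            // false (b+ needs at least one b)
```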
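RegExp.prototype
RegExp.prototype.constructor
exec
Returns the matches as an array (null when there is no match), ordered by where each ( appears in the pattern. There is one implicit () around the whole pattern, so element 0 is the full match.
test
Returns a boolean.
toString
Returns a string.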
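The ordering of the array returned by exec is easiest to see with nested groups. The pattern and input below are made up for illustration.

```javascript
// Groups are numbered by the position of their opening parenthesis:
//   1: (\d+)   2: ((\w)\w*)   3: (\w)
const re = /(\d+)-((\w)\w*)/;
const m = re.exec("id 42-abc");

console.log(m[0]);    // 42-abc  (the implicit group around the whole pattern)
console.log(m[1]);    // 42
console.log(m[2]);    // abc
console.log(m[3]);    // a
console.log(m.index); // 3

console.log(re.exec("no match")); // null
console.log(re.test("7-x"));      // true
console.log(re.toString());       // /(\d+)-((\w)\w*)/
```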
Members of a RegExp instance
source
global
ignoreCase
multiline
lastIndex: an integer, with attributes { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }.
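The instance members can be read directly, and lastIndex can be watched moving as a global regex is applied repeatedly. The pattern and input are arbitrary; note that later editions of the language expose some of these members as prototype accessors, which does not change how they read.

```javascript
const re = /a/gi;

console.log(re.source);     // a
console.log(re.global);     // true
console.log(re.ignoreCase); // true
console.log(re.multiline);  // false
console.log(re.lastIndex);  // 0

// With the g flag, exec resumes from lastIndex and updates it.
const first = re.exec("xAxa");
console.log(first.index, re.lastIndex);  // 1 2
const second = re.exec("xAxa");
console.log(second.index, re.lastIndex); // 3 4
console.log(re.exec("xAxa"));            // null
console.log(re.lastIndex);               // 0 (reset after a failed match)

// lastIndex is an own, writable, non-enumerable, non-configurable property.
const desc = Object.getOwnPropertyDescriptor(re, "lastIndex");
console.log(desc.writable, desc.enumerable, desc.configurable); // true false false
```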
<ZWNJ> and <ZWJ> are format-control characters that are used to make necessary distinctions when forming words or phrases in certain languages.
The Unicode format-control characters (i.e., the characters in category "Cf" in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).
All format control characters may be used within comments, and within string literals and regular expression literals.
In ECMAScript source text, <ZWNJ> and <ZWJ> may also be used in an identifier after the first character.
<BOM> is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <BOM> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. <BOM> characters are treated as white space characters (see 7.2).
The special treatment of certain format-control characters outside of comments, string literals, and regular expression literals is summarised in Table 1.
Table 1 — Format-Control Character Usage

Code Unit Value   Name                    Formal Name   Usage
\u200C            Zero width non-joiner   <ZWNJ>        IdentifierPart
\u200D            Zero width joiner       <ZWJ>         IdentifierPart
\uFEFF            Byte Order Mark         <BOM>         Whitespace
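The usages in Table 1 can be observed from script code. The sketch below builds source text at run time with `new Function` so that the invisible characters are written as visible escapes; the identifier names are made up for the example.

```javascript
// <ZWNJ> (\u200C) and <ZWJ> (\u200D) are accepted inside an identifier
// after its first character.
const withZwnj = new Function("var a\u200Cb = 1; return a\u200Cb;");
const withZwj = new Function("var c\u200Dd = 2; return c\u200Dd;");
console.log(withZwnj()); // 1
console.log(withZwj());  // 2

// <BOM> (\uFEFF) is treated as white space: it separates tokens,
// is stripped by trim, and matches \s.
const withBom = new Function("return\uFEFF3;");
console.log(withBom());               // 3
console.log("\uFEFFabc".trim());      // abc
console.log(/\s/.test("\uFEFF"));     // true
```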