Appel, 2nd edition, page 484, describes comments in MiniJava as being nestable. This is an interesting exercise for the scanner, but is not required.
http://www.cnblogs.com/RicCC/archive/2008/03/17/antlr-notepad.html
http://llf.javaeye.com/blog/157507
http://llf.javaeye.com/blog/156170
Goal ::= MainClass ( ClassDeclaration )* EOF
MainClass ::= "class" Identifier "{" "public" "static" "void" "main" "(" "String" "[" "]" Identifier ")" "{" Statement "}" "}"
ClassDeclaration ::= "class" Identifier ( "extends" Identifier )? "{" ( VarDeclaration )* ( MethodDeclaration )* "}"
VarDeclaration ::= Type Identifier ";"
MethodDeclaration ::= "public" Type Identifier "(" ( Type Identifier ( "," Type Identifier )* )? ")" "{" ( VarDeclaration )* ( Statement )* "return" Expression ";" "}"
Type ::= "int" "[" "]"
     | "boolean"
     | "int"
     | Identifier
Statement ::= "{" ( Statement )* "}"
     | "if" "(" Expression ")" Statement "else" Statement
     | "while" "(" Expression ")" Statement
     | "System.out.println" "(" Expression ")" ";"
     | Identifier "=" Expression ";"
     | Identifier "[" Expression "]" "=" Expression ";"
Expression ::= Expression ( "&&" | "<" | "+" | "-" | "*" ) Expression
     | Expression "[" Expression "]"
     | Expression "." "length"
     | Expression "." Identifier "(" ( Expression ( "," Expression )* )? ")"
     | IntegerLiteral
     | "true"
     | "false"
     | Identifier
     | "this"
     | "new" "int" "[" Expression "]"
     | "new" Identifier "(" ")"
     | "!" Expression
     | "(" Expression ")"
Identifier: one or more letters, digits, and underscores, starting with a letter
IntegerLiteral: one or more decimal digits
EOF: a distinguished token returned by the scanner at end-of-file
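The Identifier and IntegerLiteral rules are simple enough to check by hand. Below is a minimal C sketch; the function names are hypothetical. Reserved words such as class also pass is_identifier. A real scanner would therefore look each identifier up in a keyword table afterwards.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

// Identifier: one or more letters, digits, and underscores, starting with a letter.
bool is_identifier(const char *s)
{
    if (!isalpha((unsigned char)s[0]))
        return false;                         // must start with a letter
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!isalnum((unsigned char)s[i]) && s[i] != '_')
            return false;                     // only letters, digits, '_' after that
    return true;
}

// IntegerLiteral: one or more decimal digits.
bool is_integer_literal(const char *s)
{
    if (s[0] == '\0')
        return false;                         // at least one digit is required
    for (size_t i = 0; s[i] != '\0'; i++)
        if (!isdigit((unsigned char)s[i]))
            return false;
    return true;
}
```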
Comments
Comments are // to end of line and /* ... */, just as in Java. The /* ... */ comments do not nest in Java. For example,
/*
One comment
/* Nested comment */
Bad things will happen
*/
The second /* is ignored (it is inside a comment), and the first */ terminates the comment. "Bad things will happen" and the final */ are then scanned as ordinary program text, not as a comment.
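The difference between Java-style and nestable comments comes down to whether the scanner counts openers. Below is a minimal sketch; skip_block_comment is a hypothetical name, and it assumes the scanner holds the source in a NUL-terminated string.

- With nesting off, the first closer ends the comment, as in the example above.
- With nesting on, a depth counter implements the optional variant from Appel.

```c
#include <stddef.h>

// Skips a block comment; `s` must point at the opening slash-star.
// Returns a pointer just past the comment's closing star-slash, or NULL if
// the input ends first. With nested == 0 the first closing delimiter ends
// the comment (Java rules); otherwise each inner opener needs its own closer.
const char *skip_block_comment(const char *s, int nested)
{
    int depth = 0;
    while (*s != '\0') {
        if (s[0] == '/' && s[1] == '*') {
            if (depth == 0 || nested)
                depth++;          // inner openers only count when nesting is on
            s += 2;
        } else if (s[0] == '*' && s[1] == '/') {
            depth--;
            s += 2;
            if (depth == 0)
                return s;         // matching closer found
        } else {
            s++;
        }
    }
    return NULL;                  // unterminated comment
}
```

On the example above, the Java mode stops right before " Bad things will happen". The nested mode consumes the whole text.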
EBNF
ISO/IEC 14977: 1996(E)
SET_TYPES = { "bool", "char", "float", "int", "string" }
SET_LITERALS = { <boolean_literal>, <char_literal>, <float_literal>, <int_literal>, <string_literal> }
SET_INSTRUCTIONS = { "label", "break", "continue", "if", "goto", "while", "do", "switch", "return", ";" }
SET_UNARIES = { "&", "*", "~", "+", "-", "!" }
module: "module" <identifier> ";" globals.
globals: e.
globals: global globals.
globals: "extern" global globals.
global: function.
global: declaration.
function: functionheader functionrest.
functionheader: modifiers <identifier> ":" paramlist "->" returntype.
functionrest: ";".
functionrest: block.
modifiers: e.
modifiers: "start".
paramlist: "void".
paramlist: paramblock moreparamblocks.
moreparamblocks: e.
moreparamblocks: ";" paramblock moreparamblocks.
paramblock: type param moreparams.
moreparams: e.
moreparams: "," param moreparams.
param: reference <identifier> dimensionblock.
returntype: type reference dimensionblock.
reference: e.
reference: "*" reference.
dimensionblock: e.
dimensionblock: "[" "]" dimensionblock.
block: "{" code "}".
code: e.
code: block code.
code: statement code.
statement: "label" <identifier> ";".
statement: ";".
statement: "break" ";".
statement: "continue" ";".
statement: expression ";".
statement: declarationblock ";".
statement: "if" "(" expression ")" block elseblock.
statement: "goto" <identifier> ";".
statement: "while" "(" expression ")" "do" block.
statement: "do" block "while" "(" expression ")" ";".
statement: "switch" "(" expression ")" "{" switchcases "default" block "}".
statement: "return" returnarg ";".
returnarg: "(" expression ")".
returnarg: e.
elseblock: e.
elseblock: "else" block.
switchcases: e.
switchcases: "case" <int_literal> block switchcases.
declarationblock: type declaration restdeclarations.
restdeclarations: e.
restdeclarations: "," declaration restdeclarations.
declaration: reference <identifier> indexblock initializer.
indexblock: e.
indexblock: "[" <int_literal> "]" indexblock.
initializer: e.
initializer: "=" expression.
expression: logicalor restexpression.
restexpression: e.
restexpression: "=" logicalor restexpression.
logicalor: logicaland restlogicalor.
restlogicalor: e.
restlogicalor: "||" logicaland restlogicalor.
logicaland: bitwiseor restlogicaland.
restlogicaland: e.
restlogicaland: "&&" bitwiseor restlogicaland.
bitwiseor: bitwisexor restbitwiseor.
restbitwiseor: e.
restbitwiseor: "|" bitwisexor restbitwiseor.
bitwisexor: bitwiseand restbitwisexor.
restbitwisexor: e.
restbitwisexor: "^" bitwiseand restbitwisexor.
bitwiseand: equality restbitwiseand.
restbitwiseand: e.
restbitwiseand: "&" equality restbitwiseand.
equality: relation restequality.
restequality: e.
restequality: equalityoperator relation restequality.
equalityoperator: "==".
equalityoperator: "!=".
relation: shift restrelation.
restrelation: e.
restrelation: relationoperator shift restrelation.
relationoperator: "<".
relationoperator: "<=".
relationoperator: ">".
relationoperator: ">=".
shift: addition restshift.
restshift: e.
restshift: shiftoperator addition restshift.
shiftoperator: "<<".
shiftoperator: ">>".
addition: multiplication restaddition.
restaddition: e.
restaddition: additionoperator multiplication restaddition.
additionoperator: "+".
additionoperator: "-".
multiplication: unary3 restmultiplication.
restmultiplication: e.
restmultiplication: multiplicationoperator unary3 restmultiplication.
multiplicationoperator: "*".
multiplicationoperator: "/".
multiplicationoperator: "%".
unary3: unary2.
unary3: unary3operator unary3.
unary3operator: "&".
unary3operator: "*".
unary3operator: "~".
unary2: factor.
unary2: unary2operator unary2.
unary2operator: "+".
unary2operator: "-".
unary2operator: "!".
factor: <identifier> application.
factor: immediate.
factor: "(" expression ")".
application: e.
application: "[" expression "]" application.
application: "(" expression moreexpressions ")".
moreexpressions: e.
moreexpressions: "," expression moreexpressions.
type: "bool".
type: "char".
type: "float".
type: "int".
type: "string".
immediate: <boolean_literal>.
immediate: <char_literal>.
immediate: <float_literal>.
immediate: <int_literal>.
immediate: <string_literal>.
module: "module" <identifier> ";" {[extern] global}.
global: function | declaration.
function: functionheader [ ";" | block ].
functionheader: ["start"] <identifier> ":" paramlist "->" returntype.
paramlist: "void" | paramblock {";" paramblock}.
paramblock: type param {"," param}.
param: {"*"} <identifier> {"[" "]"}.
returntype: type {"*"} {"[" "]"}.
block: "{" { statement | block } "}".
statement: "label" <identifier> ";".
statement: ";".
statement: "break" ";".
statement: "continue" ";".
statement: expression ";".
statement: declarationblock ";".
statement: "if" "(" expression ")" block [ "else" block ].
statement: "goto" <identifier> ";".
statement: "while" "(" expression ")" "do" block.
statement: "do" block "while" "(" expression ")" ";".
statement: "switch" "(" expression ")" "{" { "case" <int_literal> block } "default" block "}".
statement: "return" [ "(" expression ")" ] ";".
declarationblock: type declaration {"," declaration}.
declaration: {"*"} <identifier> {"[" <int_literal> "]"} [ "=" expression ].
expression: logicalor {"=" logicalor}.
logicalor: logicaland {"||" logicaland}.
logicaland: bitwiseor {"&&" bitwiseor}.
bitwiseor: bitwisexor {"|" bitwisexor}.
bitwisexor: bitwiseand {"^" bitwiseand}.
bitwiseand: equality {"&" equality}.
equality: relation {("==" | "!=") relation}.
relation: shift {("<" | "<=" | ">" | ">=") shift}.
shift: addition {("<<" | ">>") addition}.
addition: multiplication {("+" | "-") multiplication}.
multiplication: unary3 {("*" | "/" | "%") unary3}.
unary3: {("&" | "*" | "~")} unary2.
unary2: {("+" | "-" | "!")} factor.
factor: <identifier> [application] | immediate | "(" expression ")".
application: "[" expression "]" [application] | "(" expression {"," expression} ")".
type: "bool" | "char" | "float" | "int" | "string".
immediate: <boolean_literal> | <char_literal> | <float_literal> | <int_literal> | <string_literal>.
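Each EBNF repetition {...} maps naturally onto a loop in a recursive-descent parser. The loop also makes the binary operators left-associative.

As an illustration, here is a minimal C sketch that evaluates integer expressions. It uses only the addition, multiplication and unary2 levels, with factor cut down to integer literals and parentheses. The function names mirror the grammar, but the code is hypothetical, not from the Inger compiler, and it assumes well-formed input.

```c
#include <ctype.h>

// Toy evaluator for three levels of the Inger EBNF (simplified, integers only):
//   addition:       multiplication {("+" | "-") multiplication}.
//   multiplication: unary2 {("*" | "/" | "%") unary2}.
//   unary2:         {("+" | "-" | "!")} factor.
//   factor:         <int_literal> | "(" addition ")".
// Input is assumed well-formed; there is no error recovery.

static const char *pos;                 // current position in the input

static void skip_spaces(void)
{
    while (isspace((unsigned char)*pos))
        pos++;
}

static long addition(void);

static long factor(void)
{
    skip_spaces();
    if (*pos == '(') {                  // "(" addition ")"
        pos++;
        long v = addition();
        skip_spaces();
        pos++;                          // consume ")"
        return v;
    }
    long v = 0;                         // <int_literal>
    while (isdigit((unsigned char)*pos))
        v = v * 10 + (*pos++ - '0');
    return v;
}

static long unary2(void)
{
    skip_spaces();                      // prefix operators apply right to left
    if (*pos == '+') { pos++; return unary2(); }
    if (*pos == '-') { pos++; return -unary2(); }
    if (*pos == '!') { pos++; return !unary2(); }
    return factor();
}

static long multiplication(void)
{
    long v = unary2();
    for (;;) {                          // {("*" | "/" | "%") unary2}
        skip_spaces();
        char op = *pos;
        if (op != '*' && op != '/' && op != '%')
            return v;
        pos++;
        long rhs = unary2();
        v = (op == '*') ? v * rhs : (op == '/') ? v / rhs : v % rhs;
    }
}

static long addition(void)
{
    long v = multiplication();
    for (;;) {                          // {("+" | "-") multiplication}
        skip_spaces();
        char op = *pos;
        if (op != '+' && op != '-')
            return v;
        pos++;
        long rhs = multiplication();
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

long evaluate(const char *text)
{
    pos = text;
    return addition();
}
```

For example, evaluate("10 - 4 - 3") yields 3, because the loop folds the operands from the left.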
/* Call the preprocessor. It will store its result in
* preprocessorFilename. If the preprocessor could not
* open the input file, skip this file.
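 * Creates a new file whose name is the original .i file name with _p
 * appended: if the input file is while.i, the preprocessed file is named
 * while.i_p. Preprocessing here works the same way as in C: the
 * corresponding header files are copied in. For example, #import "printint.ih"
 * is replaced by the contents of the file printint.ih.
 */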
result = Preprocess( argv[optind], preprocessorFilename );
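The naming rule described in the comment above can be written as a small helper. This is a hypothetical sketch; PreprocessedName is not part of the Inger sources shown here. It only derives the output name and does not perform the #import substitution.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: derive the preprocessor's output file name by
// appending "_p" to the input name, e.g. "while.i" -> "while.i_p".
// The caller must free() the result; returns NULL if allocation fails.
char *PreprocessedName(const char *inputFilename)
{
    size_t len = strlen(inputFilename);
    char *name = malloc(len + 3);          // room for "_p" and the terminating '\0'
    if (name == NULL)
        return NULL;
    memcpy(name, inputFilename, len);
    memcpy(name + len, "_p", 3);           // copies '_', 'p', and '\0'
    return name;
}
```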
Lexical analysis: keyword definitions:
/* This enum contains all the keywords and operators
* used in the language.
*/
enum
{
/* Keywords */
KW_BREAK = 1000, /* "break" keyword */
KW_CASE, /* "case" keyword */
KW_CONTINUE, /* "continue" keyword */
KW_DEFAULT, /* "default" keyword */
KW_DO, /* "do" keyword */
KW_ELSE, /* "else" keyword */
KW_EXTERN, /* "extern" keyword */
KW_GOTO, /* "goto" keyword */
KW_IF, /* "if" keyword */
KW_LABEL, /* "label" keyword */
KW_MODULE, /* "module" keyword */
KW_RETURN, /* "return" keyword */
KW_START, /* "start" keyword */
KW_SWITCH, /* "switch" keyword */
KW_WHILE, /* "while" keyword */
/* Type identifiers */
KW_BOOL, /* "bool" identifier */
KW_CHAR, /* "char" identifier */
KW_FLOAT, /* "float" identifier */
KW_INT, /* "int" identifier */
KW_UNTYPED, /* "untyped" identifier */
KW_VOID, /* "void" identifier */
/* Variable lexer tokens */
LIT_BOOL, /* bool constant */
LIT_CHAR, /* character constant */
LIT_FLOAT, /* floating point constant */
LIT_INT, /* integer constant */
LIT_STRING, /* string constant */
IDENTIFIER, /* identifier */
/* Operators */
OP_ADD, /* "+" */
OP_ASSIGN, /* "=" */
OP_BITWISE_AND, /* "&" */
OP_BITWISE_COMPLEMENT, /* "~" */
OP_BITWISE_LSHIFT, /* "<<" */
OP_BITWISE_OR, /* "|" */
OP_BITWISE_RSHIFT, /* ">>" */
OP_BITWISE_XOR, /* "^" */
OP_DIVIDE, /* "/" */
OP_EQUAL, /* "==" */
OP_GREATER, /* ">" */
OP_GREATEREQUAL, /* ">=" */
OP_LESS, /* "<" */
OP_LESSEQUAL, /* "<=" */
OP_LOGICAL_AND, /* "&&" */
OP_LOGICAL_OR, /* "||" */
OP_MODULUS, /* "%" */
OP_MULTIPLY, /* "*" */
OP_NOT, /* "!" */
OP_NOTEQUAL, /* "!=" */
OP_SUBTRACT, /* "-" */
OP_TERNARY_IF, /* "?" */
/* Delimiters */
ARROW, /* "->" */
LBRACE, /* "{" */
RBRACE, /* "}" */
LBRACKET, /* "[" */
RBRACKET, /* "]" */
COLON, /* ":" */
COMMA, /* "," */
LPAREN, /* "(" */
RPAREN, /* ")" */
SEMICOLON /* ";" */
}
tokens;
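After reading a word, the scanner must decide whether it is a keyword or an IDENTIFIER. One common way is a lookup table. The sketch below illustrates that approach and is not Inger's actual code; LookupKeyword is a hypothetical name.

The sketch repeats the first part of the enum above so it compiles on its own, with the same values (KW_BREAK = 1000, and so on). A linear scan is fine for 21 entries. A larger language might use a hash table, or a sorted array with bsearch.

```c
#include <stddef.h>
#include <string.h>

// First part of the token enum above, repeated with identical values.
enum {
    KW_BREAK = 1000, KW_CASE, KW_CONTINUE, KW_DEFAULT, KW_DO, KW_ELSE,
    KW_EXTERN, KW_GOTO, KW_IF, KW_LABEL, KW_MODULE, KW_RETURN, KW_START,
    KW_SWITCH, KW_WHILE,
    KW_BOOL, KW_CHAR, KW_FLOAT, KW_INT, KW_UNTYPED, KW_VOID,
    LIT_BOOL, LIT_CHAR, LIT_FLOAT, LIT_INT, LIT_STRING,
    IDENTIFIER
};

static const struct { const char *text; int token; } keywords[] = {
    { "break", KW_BREAK },     { "case", KW_CASE },     { "continue", KW_CONTINUE },
    { "default", KW_DEFAULT }, { "do", KW_DO },         { "else", KW_ELSE },
    { "extern", KW_EXTERN },   { "goto", KW_GOTO },     { "if", KW_IF },
    { "label", KW_LABEL },     { "module", KW_MODULE }, { "return", KW_RETURN },
    { "start", KW_START },     { "switch", KW_SWITCH }, { "while", KW_WHILE },
    { "bool", KW_BOOL },       { "char", KW_CHAR },     { "float", KW_FLOAT },
    { "int", KW_INT },         { "untyped", KW_UNTYPED }, { "void", KW_VOID },
};

// Returns the keyword token for `word`, or IDENTIFIER if it is not reserved.
int LookupKeyword(const char *word)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(keywords[i].text, word) == 0)
            return keywords[i].token;
    return IDENTIFIER;
}
```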
A union holding the values of Inger's various data types and identifiers, such as BOOL, unsigned long, float, char*, and identifiers:
typedef union
{
unsigned long uintvalue;
BOOL boolvalue;
char *stringvalue;
char charvalue;
float floatvalue;
char *identifier;
} Tokenvalue;
Tree node structure (note: a tree node is not the same as an abstract syntax tree node):
typedef struct TreeNode
{
void *data;
int screenX;
struct TreeNode *parent;
List *children; // list of children
} TreeNode;
Abstract syntax tree node structure:
typedef struct AstNode
{
int id; // node type, e.g. a while node or a module node
Tokenvalue val;
Type *type;
int lineno;
} AstNode;
The AstNode of the abstract syntax tree is stored as the data member of a TreeNode; see the following function:
// id is the node name, e.g. NODE_MODULE, NODE_GLOBAL, etc.; see nodenames.h
TreeNode *CreateAstNode( int id, int lineno )
{
TreeNode *treeNode;
AstNode *astNode;
astNode = (AstNode *) MallocEx( sizeof( AstNode ) );
astNode->id = id;
astNode->lineno = lineno;
astNode->val.uintvalue = 0;
astNode->type = NULL;
treeNode = CreateTreeNode( astNode );
return( treeNode );
}
Or:
TreeNode *CreateAstNodeVal( int id, Tokenvalue val, int lineno )
{
TreeNode *treeNode;
AstNode *astNode;
astNode = (AstNode *) MallocEx( sizeof( AstNode ) );
astNode->id = id;
astNode->lineno = lineno;
astNode->val = val;
astNode->type = NULL;
treeNode = CreateTreeNode( astNode );
return( treeNode );
}
// AST node names
enum NodeNames
{
NODE_MODULE = 0,
NODE_START,
NODE_EXTERN,
NODE_GLOBAL,
NODE_FUNCTION,
NODE_FUNCTIONHEADER,
NODE_MODIFIERS,
NODE_PARAMLIST,
NODE_PARAMBLOCK,
NODE_PARAM,
NODE_RETURNTYPE,
NODE_DIMENSION,
NODE_DIMENSIONBLOCK,
NODE_BLOCK,
NODE_STATEMENT,
NODE_SWITCH,
NODE_CASES,
NODE_CASE,
NODE_WHILE,
NODE_GOTO,
NODE_LABEL,
NODE_IF,
NODE_IDENT,
NODE_RETURN,
NODE_CONTINUE,
NODE_BREAK,
NODE_DECLBLOCK,
NODE_DECLARATION,
NODE_INITIALIZER,
NODE_INDEXBLOCK,
NODE_REFERENCE,
NODE_INDEX,
NODE_EXPRESSION,
NODE_LOGICAL_OR,
NODE_LOGICAL_AND,
NODE_BITWISE_OR,
NODE_BITWISE_XOR,
NODE_BITWISE_AND,
NODE_EQUAL,
NODE_NOTEQUAL,
NODE_GREATER,
NODE_GREATEREQUAL,
NODE_LESS,
NODE_LESSEQUAL,
NODE_BITWISE_LSHIFT,
NODE_BITWISE_RSHIFT,
NODE_ASSIGN,
NODE_BINARY_ADD,
NODE_BINARY_SUBTRACT,
NODE_UNARY_ADD,
NODE_UNARY_SUBTRACT,
NODE_MULTIPLY,
NODE_DIVIDE,
NODE_MODULUS,
NODE_BITWISE_COMPLEMENT,
NODE_ADDRESS,
NODE_DEREFERENCE,
NODE_NOT,
NODE_APPLICATION,
NODE_INDEXER,
NODE_ARGUMENTS,
NODE_FACTOR,
NODE_BOOL,
NODE_CHAR,
NODE_FLOAT,
NODE_INT,
NODE_UNTYPED,
NODE_VOID,
NODE_LIT_BOOL,
NODE_LIT_CHAR,
NODE_LIT_FLOAT,
NODE_LIT_INT,
NODE_LIT_STRING,
NODE_LIT_IDENTIFIER,
NODE_INT_TO_FLOAT,
NODE_CHAR_TO_INT,
NODE_CHAR_TO_FLOAT,
NODE_UNKNOWN = -1
};
Printing the abstract syntax tree:
/*
* PRINTING ROUTINES
*/
void PrintAst( TreeNode *source )
{
PrintTree( source, GetAstNodeData, 4 );
}
PrintTree is implemented as follows; the levels argument passed here is 4:
void PrintTree( TreeNode *source, DataFunction dataFunction, int levels )
{
int printDepth = 0;
BOOL loop;
char *str;
int i;
/* TODO: We're going to have to make a new macro.
* Don't use DEBUG for this.
*/
DEBUG( "Called\n" );
/* If tree is empty, abort. */
if( source == NULL )
{
return;
}
/* Walk through tree to determine x-offsets for
* each node.
*/
LayoutTree( source, LEFT_OFFSET );
/* Print nodes. */
/* Call the function that dataFunction points to (in practice GetAstNodeData)
 * to print each node's name, token value, type name, and line number;
 * see GetAstNodeData. */
for( i = 0; i < levels; i++ )
{
str = dataFunction( source, i );
PrintChars( source->screenX - strlen( str ) / 2, ' ' );
printf( "%s\n", str );
}
PrintChars( source->screenX, ' ' );
printf( "%c", VERTBAR );
printDepth = 0;
do
{
currentX = 0;
printf("\n");
PrintNode( source, 0, printDepth, 0, dataFunction, 0 );
currentX = 0;
printf("\n");
PrintNode( source, 0, printDepth, 1, dataFunction, 0 );
currentX = 0;
printf("\n");
for( i = 0; i < levels; i++ )
{
PrintNode( source, 0, printDepth, 2, dataFunction, i );
currentX = 0;
printf("\n");
}
loop = PrintNode( source, 0, printDepth, 3, dataFunction, 0 );
printDepth++;
}
while( loop );
}
Adding a child only requires appending the new node to the end of the children list:
void AddTreeChild( TreeNode *parentnode, TreeNode *node )
{
/* Do not act on an empty node. */
if( node == NULL ) return;
/* If the tree is empty, add the first root node. */
if( parentnode == NULL )
{
node->parent = NULL;
}
else
/* Tree is not empty. Add the new node to [parentnode]'s
* children list. */
{
node->parent = parentnode;
ListAppend( parentnode->children, node );
}
}
RemoveAstNode, by contrast, deletes an entire subtree:
/* Remove node [node] from ast. The node contents
* and its children get deleted.
*
* Pre: [node] is not NULL.
*/
void RemoveAstNode( TreeNode *node );
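AddTreeChild and RemoveAstNode depend on the List type and on CreateTreeNode, whose code is not shown. The sketch below is a self-contained stand-in that uses a growable array instead of List; Node, CreateNode, AddChild and FreeSubtree are hypothetical names.

- AddChild appends each child at the end of the parent's children.
- FreeSubtree deletes a subtree post-order: the children are freed before their parent.
- Unlike RemoveAstNode, it neither frees the node's data nor unlinks the node from its parent.

```c
#include <stdlib.h>

// Stand-in for TreeNode: a growable array replaces the original List type.
typedef struct Node {
    void *data;
    struct Node *parent;
    struct Node **children;
    size_t count, capacity;
} Node;

Node *CreateNode(void *data)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = data;
    n->parent = NULL;
    n->children = NULL;
    n->count = n->capacity = 0;
    return n;
}

// Like AddTreeChild: append node to the end of parent's children.
// Returns 0 on success, -1 if the children array cannot grow.
int AddChild(Node *parent, Node *node)
{
    if (node == NULL)
        return 0;                              // nothing to add
    node->parent = parent;
    if (parent == NULL)
        return 0;                              // node becomes a root
    if (parent->count == parent->capacity) {
        size_t cap = parent->capacity ? parent->capacity * 2 : 4;
        Node **grown = realloc(parent->children, cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        parent->children = grown;
        parent->capacity = cap;
    }
    parent->children[parent->count++] = node;
    return 0;
}

// Post-order deletion of a whole subtree; returns the number of nodes freed.
size_t FreeSubtree(Node *node)
{
    if (node == NULL)
        return 0;
    size_t freed = 1;
    for (size_t i = 0; i < node->count; i++)
        freed += FreeSubtree(node->children[i]);  // children first
    free(node->children);
    free(node);
    return freed;
}
```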
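Inger compilation process:
1. Preprocessing; key function: Preprocess.
2. Syntax analysis, which builds the abstract syntax tree; key function: Parse, which is also the entry point for AST construction. If parsing finds no errors, continue with step 3.
3. Build the symbol table from the AST; key function: CreateSymbolTable( ast );
4. Semantic analysis; key functions: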
a) CheckLeftValues( ast );
b) CheckArgCount( ast );
c) CheckSwitchStatements( ast );
d) CheckFunctionReturns( ast );
5. Generate code from the AST; key function: GenerateCode( ast );
"^\d+$" // non-negative integer (positive integers + 0)
"^[0-9]*[1-9][0-9]*$" // positive integer
"^((-\d+)|(0+))$" // non-positive integer (negative integers + 0)
"^-[0-9]*[1-9][0-9]*$" // negative integer
"^-?\d+$" // integer
"^\d+(\.\d+)?$" // non-negative floating-point number (positive floats + 0)
"^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$" // positive floating-point number
"^((-\d+(\.\d+)?)|(0+(\.0+)?))$" // non-positive floating-point number (negative floats + 0)
"^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$" // negative floating-point number
"^(-?\d+)(\.\d+)?$" // floating-point number
"^[A-Za-z]+$" // string made of the 26 English letters
"^[A-Z]+$" // string made of uppercase English letters
"^[a-z]+$" // string made of lowercase English letters
"^[A-Za-z0-9]+$" // string made of digits and English letters
"^\w+$" // string made of digits, English letters, or underscores
"^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$" // email address
"^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$" // URL
/^(\d{2}|\d{4})-((0[1-9])|(1[0-2]))-((0[1-9])|([12][0-9])|(3[01]))$/ // year-month-day
/^((0[1-9])|(1[0-2]))\/((0[1-9])|([12][0-9])|(3[01]))\/(\d{2}|\d{4})$/ // month/day/year
"^([\w.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$" // email
"(\d+-)?(\d{4}-?\d{7}|\d{3}-?\d{8}|^\d{7,8})(-\d+)?" // telephone number
"^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$" // IP address
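Regex matching Chinese characters: [\u4e00-\u9fa5]
Regex matching double-byte characters (including Chinese characters): [^\x00-\xff]
Regex matching blank lines: \n[\s| ]*\r
Regex matching HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/
Regex matching leading and trailing whitespace: (^\s*)|(\s*$)
Regex matching email addresses: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Regex matching URLs: ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$
Regex checking whether an account name is valid (starts with a letter, 5-16 characters, letters, digits, and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Regex matching domestic (mainland China) phone numbers: (\d{3}-|\d{4}-)?(\d{8}|\d{7})?
Regex matching Tencent QQ numbers: ^[1-9]*[1-9][0-9]*$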
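The patterns above are written for JavaScript. Below is a minimal sketch of trying them from C with the POSIX regcomp/regexec API, which is available on POSIX systems but is not part of ISO C; matches is a hypothetical helper name. POSIX extended regular expressions have no \d or \w shorthand, so the tests spell those classes as [0-9] and [A-Za-z0-9_].

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

// Returns true if `text` matches the POSIX extended regex `pattern`.
// An invalid pattern is treated as "no match".
bool matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    int rc = regexec(&re, text, 0, NULL, 0);   // 0 means the text matched
    regfree(&re);
    return rc == 0;
}
```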
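The following table is a complete list of metacharacters and their behavior in the context of regular expressions:
\ Marks the next character as a special character, a literal, a backreference, or an octal escape.
^ Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position after '\n' or '\r'.
$ Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before '\n' or '\r'.
* Matches the preceding subexpression zero or more times.
+ Matches the preceding subexpression one or more times. Equivalent to {1,}.
? Matches the preceding subexpression zero or one time. Equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times.
{n,} n is a non-negative integer. Matches at least n times.
{n,m} m and n are non-negative integers with n <= m. Matches at least n and at most m times. There must be no spaces between the comma and the two numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible.
. Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern) Matches pattern and captures the match.
(?:pattern) Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use.
(?=pattern) Positive lookahead: matches at any position where a string matching pattern begins. This is a non-capturing match; the match is not stored for later use.
(?!pattern) Negative lookahead: the opposite of (?=pattern).
x|y Matches either x or y.
[xyz] Character set: matches any one of the enclosed characters.
[^xyz] Negated character set: matches any character that is not enclosed.
[a-z] Character range: matches any character in the specified range.
[^a-z] Negated character range: matches any character not in the specified range.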
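\b Matches a word boundary, that is, the position between a word and a space.
\B Matches a non-word boundary.
\cx Matches the control character indicated by x.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form-feed character. Equivalent to \x0c and \cL.
\n Matches a newline character. Equivalent to \x0a and \cJ.
\r Matches a carriage-return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab character. Equivalent to \x0b and \cK.
\w Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long.
\num Matches num, where num is a positive integer: a reference back to a captured match.
\n Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml If n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape value nml.
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits.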
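Regex matching Chinese characters: [\u4e00-\u9fa5]
Regex matching double-byte characters (including Chinese characters): [^\x00-\xff]
Application: computing the length of a string, where a double-byte character counts as 2 and an ASCII character as 1:
String.prototype.len = function () { return this.replace(/[^\x00-\xff]/g, "aa").length; };
Regex matching blank lines: \n[\s| ]*\r
Regex matching HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/
Regex matching leading and trailing whitespace: (^\s*)|(\s*$)
Application: JavaScript before ES5 has no trim function like VBScript's, so this expression can be used to implement one, as follows: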
String.prototype.trim = function()
{
return this.replace(/(^\s*)|(\s*$)/g, "");
}
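In C the same trimming is usually done without a regex. Below is a minimal in-place sketch; trim is a hypothetical name. It treats isspace characters as whitespace, which is narrower than JavaScript's Unicode-aware \s.

```c
#include <ctype.h>
#include <string.h>

// In-place trim, same effect as replace(/(^\s*)|(\s*$)/g, "") for ASCII whitespace.
// Returns s, now without leading or trailing whitespace.
char *trim(char *s)
{
    char *start = s;
    while (isspace((unsigned char)*start))
        start++;                                   // skip leading whitespace
    size_t len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        len--;                                     // drop trailing whitespace
    memmove(s, start, len);                        // regions may overlap
    s[len] = '\0';
    return s;
}
```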
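Using a regular expression to split and convert an IP address:
The following JavaScript program uses a regular expression to match an IP address and convert it to its numeric value: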
function IP2V(ip)
{
var re = /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/ // regex matching an IP address
var m = re.exec(ip)
if (m)
{
// each octet is one byte, so the place values are powers of 256
return m[1]*Math.pow(256,3) + m[2]*Math.pow(256,2) + m[3]*256 + m[4]*1
}
else
{
throw new Error("Not a valid IP address!")
}
}
However, the program above would probably be simpler without a regular expression, using the split function directly, as follows:
var ip="10.100.20.168"
ip=ip.split(".")
alert("IP value: "+(ip[0]*256*256*256+ip[1]*256*256+ip[2]*256+ip[3]*1))
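Here is a C version of the same conversion, as a sketch; ip_to_uint32 is a hypothetical name. Each octet is one byte, so the place values are powers of 256, and shifting by 24, 16 and 8 bits computes the same sum. sscanf's %u is lenient (it accepts leading spaces, for example), so a strict validator would need extra checks.

```c
#include <stdint.h>
#include <stdio.h>

// Converts dotted-quad "a.b.c.d" into a*256^3 + b*256^2 + c*256 + d.
// Returns 0 and stores the value in *out, or -1 if the text is not a valid IPv4 address.
int ip_to_uint32(const char *ip, uint32_t *out)
{
    unsigned a, b, c, d;
    char extra;
    // The trailing %c catches garbage after the last number, e.g. "1.2.3.4x".
    if (sscanf(ip, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;                                 // each octet must fit in one byte
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
    return 0;
}
```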
Regex matching email addresses: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Regex matching URLs: http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
An algorithm that uses regular expressions to remove repeated characters from a string:
var s="abacabefgeeii"
var s1=s.replace(/(.).*\1/g,"$1")
var re=new RegExp("["+s1+"]","g")
var s2=s.replace(re,"")
alert(s1+s2) //result: abeicfg
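I originally posted on CSDN asking for an expression that removes repeated characters, but never found one; this is the simplest implementation I could think of. The idea is to use a backreference to extract the characters that repeat, build a second expression from those repeated characters to get the characters that do not repeat, and concatenate the two. This method may not be suitable for strings where character order matters.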
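If the order of first occurrence matters, a different technique avoids the problem: remember which characters have already been seen, and copy each one only the first time. This is not the author's regex method. Below is a minimal C sketch for single-byte characters; remove_duplicates is a hypothetical name.

```c
#include <stdbool.h>

// Keeps only the first occurrence of each byte, preserving order,
// e.g. "abacabefgeeii" -> "abcefgi". Works in place and returns s.
char *remove_duplicates(char *s)
{
    bool seen[256] = { false };          // one flag per possible byte value
    char *out = s;
    for (const char *in = s; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (!seen[c]) {
            seen[c] = true;
            *out++ = *in;                // first time: keep it
        }
    }
    *out = '\0';
    return s;
}
```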
A JavaScript program that uses a regular expression to extract the file name from a URL; the result below is page1:
s="http://www.9499.net/page1.htm"
s=s.replace(/(.*\/){0,}([^.]+).*/ig,"$2")
alert(s)
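Without regular expressions, the same extraction is a search for the last '/' followed by a cut at the first '.'. Below is a minimal C sketch; url_basename is a hypothetical name. It copies the result into a buffer supplied by the caller.

```c
#include <stddef.h>
#include <string.h>

// Copies the file name without extension from a URL into out,
// e.g. "http://www.9499.net/page1.htm" -> "page1".
// out holds outSize bytes; the result is truncated to fit.
void url_basename(const char *url, char *out, size_t outSize)
{
    if (outSize == 0)
        return;
    const char *slash = strrchr(url, '/');         // last '/' in the URL
    const char *name = slash ? slash + 1 : url;
    size_t len = strcspn(name, ".");               // length up to the first '.'
    if (len >= outSize)
        len = outSize - 1;
    memcpy(out, name, len);
    out[len] = '\0';
}
```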
Using regular expressions to restrict what can be typed into a web form text box:
Restrict input to Chinese characters only: onkeyup="value=value.replace(/[^\u4E00-\u9FA5]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\u4E00-\u9FA5]/g,''))"
Restrict input to full-width characters only: onkeyup="value=value.replace(/[^\uFF00-\uFFFF]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\uFF00-\uFFFF]/g,''))"
Restrict input to digits only: onkeyup="value=value.replace(/[^\d]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\d]/g,''))"
Restrict input to digits and English letters (\W also lets underscores through): onkeyup="value=value.replace(/[\W]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[\W]/g,''))"
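When the lexical analyzer matches one of the extended regular expressions in the rules section of the specification file, it executes the action that corresponds to that extended regular expression. If there are not enough rules to match every string in the input stream, the lexical analyzer copies the input to standard output. Therefore, do not create a rule that only copies the input to the output. The default output can help you find gaps in the rules.
When using the lex command to process input for a parser that the yacc command produces, provide rules that match all input strings. Those rules must generate output that the yacc command can interpret.
To ignore the input associated with an extended regular expression, use ; (the C language null statement) as the action. The following example ignores the three spacing characters (blank, tab, and newline):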
[ \t\n] ;
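To avoid writing the same action repeatedly, use | (the pipe symbol). This character indicates that the action for this rule is the same as the action for the next rule. For example, the previous example that ignores blank, tab, and newline characters can also be written as:
" " |
"\t" |
"\n" ;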
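To find out which text matched an expression in the rules section of the specification file, you can include a C language printf subroutine call as one of the actions for that expression. When the lexical analyzer finds a match in the input stream, the program puts the matched string in an external character (char) array and a wide character (wchar_t) array, called yytext and yywtext respectively. For example, you can use the following rule to print the matched string: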
[a-z]+ printf("%s",yytext);
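The C language printf subroutine accepts a format argument and the data to be printed. In this example, the arguments to the printf subroutine have the following meanings:
%s | A symbol that converts the data to type string before printing
%S | A symbol that converts the data to a wide character string (wchar_t) before printing
yytext | The name of the array that contains the data to be printed
yywtext | The name of the array that contains the multibyte (wchar_t) data to be printed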
The lex command defines ECHO; as a special action that prints the contents of yytext. For example, the following two rules are equivalent:
[a-z]+ ECHO;
[a-z]+ printf("%s",yytext);
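You can change how yytext is declared by using either %array or %pointer in the definitions section of the lex specification file, as follows:
%array | Defines yytext as a null-terminated character array. This is the default.
%pointer | Defines yytext as a pointer to a null-terminated character string.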
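To find the number of characters that the lexical analyzer matched for a particular extended regular expression, use the yyleng or yywleng external variable.
yyleng | Tracks the number of bytes that are matched.
yywleng | Tracks the number of wide characters in the matched string. Multibyte characters have a length greater than 1.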
To count both the words in the input and the characters in those words, use the following action:
[a-zA-Z]+ {words++;chars += yyleng;}
This action totals the number of characters in the matched words and assigns that number to chars.
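To see what the rule computes, here is a plain-C sketch of the same counting, outside lex; count_words is a hypothetical name. The inner loop takes the longest run of letters, as lex's longest-match rule would, and the run's length plays the role of yyleng. Unlike lex's default action, unmatched characters are simply skipped, not copied to the output.

```c
#include <ctype.h>
#include <stddef.h>

// Plain-C equivalent of the rule  [a-zA-Z]+ {words++; chars += yyleng;}
void count_words(const char *text, size_t *words, size_t *chars)
{
    *words = 0;
    *chars = 0;
    while (*text != '\0') {
        size_t len = 0;
        while (isalpha((unsigned char)text[len]))   // longest run of letters
            len++;
        if (len > 0) {
            (*words)++;
            *chars += len;                          // plays the role of yyleng
            text += len;
        } else {
            text++;                                 // not part of a word: skip
        }
    }
}
```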
The following expression finds the last character in the matched string:
yytext[yyleng-1]
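The lex command partitions the input stream; it does not search for all possible matches of each expression. Each character is accounted for only once. To override this and search for items that may overlap or include each other, use the REJECT action. For example, to count all instances of she and he, including the instances of he contained in she, use the following actions:
she {s++; REJECT;}
he {h++;}
\n |
. ;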
After counting an occurrence of she, the lex command rejects the input string and then counts the occurrence of he. Because he does not include she, no REJECT action is needed on he.
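For a word pair like she and he, the totals that REJECT produces in this example match a simpler count: a match at every starting position, overlaps included. Below is a plain-C sketch of that counting; count_overlapping is a hypothetical name, not part of lex.

```c
#include <string.h>

// Counts occurrences of `word` in `text`, including overlapping ones,
// by trying a match at every starting position.
size_t count_overlapping(const char *text, const char *word)
{
    size_t n = 0, len = strlen(word);
    if (len == 0)
        return 0;
    for (; *text != '\0'; text++)
        if (strncmp(text, word, len) == 0)
            n++;
    return n;
}
```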
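Normally, the next string from the input stream overwrites the current entry in the yytext array. If you use the yymore subroutine, the next string from the input stream is appended to the end of the current entry in the yytext array. For example, the following rules scan a double-quoted string: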
%s instring
%%
<INITIAL>\" {  /* start of string */
        BEGIN instring;
        yymore();
}
<instring>\" {  /* end of string */
        printf("matched %s\n", yytext);
        BEGIN INITIAL;
}
<instring>. {
        yymore();
}
<instring>\n {
        printf("Error, new line in string\n");
        BEGIN INITIAL;
}
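Even though a string may be recognized by matching several rules, repeated calls to the yymore subroutine ensure that the yytext array contains the entire string.
To return characters to the input stream, use the following call: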
yyless(n)
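where n is the number of characters of the current string to keep. Characters in the string beyond this number are returned to the input stream. The yyless subroutine provides the same kind of look-ahead function that the / (slash) operator uses, but gives more control over its use.
Use the yyless subroutine to process text more than once. For example, when parsing a C language program, an expression such as x=-a is hard to interpret. Does it mean x is equal to minus a, or is it an older form of x -= a, meaning subtract a from x? To treat the expression as x is equal to minus a, but print a warning message, use a rule such as the following:
=-[a-zA-Z] {
        printf("Operator (=-) ambiguous\n");
        yyless(yyleng-1);
        ... action for = ...
}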
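The lex program allows a program to use the following input/output (I/O) subroutines:
input() | Returns the next input character
output(c) | Writes the character c to the output
unput(c) | Pushes the character c back onto the input stream, to be read later by the input subroutine
winput() | Returns the next multibyte input character
woutput(C) | Writes the multibyte character C back to the output stream
wunput(C) | Pushes the multibyte character C back onto the input stream, to be read by the winput subroutine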
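The lex program provides these subroutines as macro definitions. The subroutines are coded in the lex.yy.c file. You can override them and provide other versions.
The winput, wunput, and woutput macros are defined to use the yywinput, yywunput, and yywoutput subroutines. For compatibility, these yy subroutines in turn use the input, unput, and output subroutines to read, write, and replace the number of bytes needed for a complete multibyte character.
These subroutines define the relationship between external files and internal characters. If you change the subroutines, change them all in the same way. These subroutines should follow these rules: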
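The lex.yy.c file allows the lexical analyzer to back up a maximum of 200 characters.
To read a file that contains nulls, create a different version of the input subroutine. In the normal version of the input subroutine, the returned value of 0 (from a null character) indicates the end of the file and terminates the input.
The lexical analyzer that the lex command generates handles character I/O through the input, output, and unput subroutines. Therefore, to return values in yytext, the lex command uses the character representation that these subroutines use. Internally, however, the lex command represents each character as a small integer. When the standard library is used, this integer is the value of the bit pattern the computer uses to represent the character. Normally, the letter a is represented in the same form as the character constant a. If you change this interpretation by using different I/O subroutines, put a translation table in the definitions section of the specification file. The translation table begins and ends with lines that contain only the following entry: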
%T
The translation table contains additional lines that give the value associated with each character. For example:
%T
{integer} {character string}
{integer} {character string}
{integer} {character string}
%T
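When the lexical analyzer reaches the end of a file, it calls the yywrap library subroutine, which returns a value of 1 to indicate that the lexical analyzer should continue with normal wrap-up at the end of input.
However, if the lexical analyzer receives input from more than one source, change the yywrap subroutine. The new function must get the new input and return a value of 0 to the lexical analyzer. A return value of 0 indicates that the program should continue processing.
You can also include code in a new version of the yywrap subroutine to print summary reports and tables when the lexical analyzer ends. The yywrap subroutine is the only way to force the yylex subroutine to recognize the end of input.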
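Strengths of the Java (gjc) lexical analyzer:
1. The whole source file is read into the in-memory buffer buf[] in one go. This simplifies later processing somewhat and also speeds up lexical analysis somewhat.
2. Lexical error locations are reported precisely by line and column: line, col. (I don't think column precision is really necessary.)
3. scanChar() reads one character ahead. From that lookahead character the scanner guesses the likely token type, then calls the matching function to handle it. This is a higher level of abstraction and worth learning from.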
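The three points can be illustrated together with a toy scanner:

- the whole source sits in one buffer;
- scanChar advances the one-character lookahead while keeping the line and column up to date;
- nextToken dispatches on that lookahead.

This is a hypothetical C sketch of the idea, not gjc's code. The token kinds and field names are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>

enum TokenKind { TK_EOF, TK_IDENT, TK_NUMBER, TK_OTHER };

typedef struct {
    const char *buf;   // entire source text, read in one go
    size_t pos;        // index of the lookahead character
    char ch;           // lookahead character, buf[pos]
    int line, col;     // 1-based position of ch, for error reports
} Scanner;

void scanner_init(Scanner *s, const char *source)
{
    s->buf = source;
    s->pos = 0;
    s->ch = source[0];
    s->line = 1;
    s->col = 1;
}

// Advance the lookahead by one character; only called when ch != '\0'.
static void scanChar(Scanner *s)
{
    if (s->ch == '\n') { s->line++; s->col = 1; }
    else               { s->col++; }
    s->ch = s->buf[++s->pos];
}

// Returns the kind of the next token; *startLine/*startCol receive its position.
enum TokenKind nextToken(Scanner *s, int *startLine, int *startCol)
{
    while (s->ch == ' ' || s->ch == '\t' || s->ch == '\n' || s->ch == '\r')
        scanChar(s);                                       // skip whitespace
    *startLine = s->line;
    *startCol = s->col;
    if (s->ch == '\0')
        return TK_EOF;
    if (isalpha((unsigned char)s->ch) || s->ch == '_') {   // lookahead says identifier
        while (isalnum((unsigned char)s->ch) || s->ch == '_')
            scanChar(s);
        return TK_IDENT;
    }
    if (isdigit((unsigned char)s->ch)) {                   // lookahead says number
        while (isdigit((unsigned char)s->ch))
            scanChar(s);
        return TK_NUMBER;
    }
    scanChar(s);                                           // any other single character
    return TK_OTHER;
}
```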
When the scanner receives an end-of-file indication from YY_INPUT, it then checks the `yywrap()' function. If `yywrap()' returns false (zero), then it is assumed that the function has gone ahead and set up yyin
to point to another input file, and scanning continues. If it returns true (non-zero), then the scanner terminates, returning 0 to its caller. Note that in either case, the start condition remains unchanged; it does not revert to INITIAL.
If you do not supply your own version of `yywrap()', then you must either use `%option noyywrap' (in which case the scanner behaves as though `yywrap()' returned 1), or you must link with `-lfl' to obtain the default version of the routine, which always returns 1.