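<?xml version="1.0" encoding="utf-8" standalone="yes"?>
In a DFA, each state corresponds to a set of ε-NFA states, so we arrive at a structure like the one below.</p> As you can see, for ease of debugging the structure defines a unique ID for the state, together with the set of corresponding ε-NFA states and a flag.</p>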
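To make "a DFA state is a set of ε-NFA states" concrete, here is a minimal sketch of an ε-closure computation. It is not the author's code: states are plain integers and `epsilonEdges` is a hypothetical map from a state to the states reachable over one ε edge.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical minimal model: states are ints, epsilonEdges[s] lists the
// states reachable from s via a single ε edge.
using StateSet = std::set<int>;
using EpsilonEdges = std::map<int, std::vector<int>>;

// Returns every state reachable from `seed` using only ε edges
// (the seed states themselves are included).
StateSet epsilonClosure(const StateSet& seed, const EpsilonEdges& epsilonEdges)
{
    StateSet closure(seed);
    std::vector<int> pending(seed.begin(), seed.end());
    while (!pending.empty())
    {
        int s = pending.back();
        pending.pop_back();
        EpsilonEdges::const_iterator it = epsilonEdges.find(s);
        if (it == epsilonEdges.end()) continue;
        for (int next : it->second)
        {
            // insert().second is true only for states not seen before
            if (closure.insert(next).second) pending.push_back(next);
        }
    }
    return closure;
}
```

The resulting set plays the role of the `content` member of a DFA state: two DFA states are equal exactly when their sets are equal.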
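I won't repeat the background on automata here; see <a href="http://zh.wikipedia.org/wiki/自动机" target="_blank">http://zh.wikipedia.org/wiki/自动机</a>.</p>
First, we stipulate that an expression only accepts characters of type Char_Type and String_Type.</p>
For a state, we do not need to know anything about it.</p>
In the code above, for ease of debugging, I added an idx field and assigned each state a unique ID.</p>
Edges in an ε-NFA are directed. For any edge, what matters most is which two endpoints it connects and which character set it corresponds to (if it is not an ε edge).</p>
With these two structures in place, we add three member variables to Rule. pEpsilonStart and pEpsilonEnd denote the start and end states of the state machine for this expression, and epsilonNFA_Edges is a hash table whose key is a state and whose value is the list of edges leading from that state to other states.</p>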
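Now to the main topic. We divide all relations into concatenation, alternation, optional, zero-or-more repetition, one-or-more repetition, and negation. The following sections describe how the ε-NFA for each relation is generated. (<strong>Below, any connecting edge whose type is not stated is an ε edge.</strong>)</p>
As mentioned above, a character set consists of characters of type Char_Type and String_Type, so its state machine is simple to build: create two states, connect them in one direction with an edge labelled with the character set, then mark one state as the start state and the other as the end state.</p>
For concatenation, connect the end state of the first machine to the start state of the second machine with an ε edge, then mark the start state of the first machine as the start state of the new machine and the end state of the second machine as its end state.</p>
For alternation, create one new start state and one new end state for the new machine, then connect the new start state to the start state of each original machine and connect the end state of each original machine to the new end state.</p>
Regular expressions have two forms of repetition, (a)+ and (a)*. The machines generated for them differ only in the end state. For one-or-more repetition, it is enough to add an ε edge from the end state to the start state of the original machine. For zero-or-more repetition, that ε edge is added as well, and in addition the end state of the new machine is set to the start state, because zero repetitions must be accepted without traversing any edge.</p>
These three are the most basic constructions; they are already enough to handle most expressions.</p>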
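The basic constructions described above can be sketched in a few lines. This is a simplified illustration, not the author's Rule class: states are integers, the label `0` stands for ε, and the names `Nfa`, `makeChar`, `concat`, `alternate`, `plus` and `star` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified model: states are ints; label 0 stands for ε.
struct Edge { int from; int to; char label; };

struct Nfa
{
    int start;
    int end;
    std::vector<Edge> edges;
};

// Hands out a fresh state number on every call.
static int newState()
{
    static int next = 0;
    return next++;
}

// Character set: two states joined by one labelled edge.
Nfa makeChar(char c)
{
    Nfa n;
    n.start = newState();
    n.end = newState();
    n.edges.push_back(Edge{n.start, n.end, c});
    return n;
}

// Concatenation: ε edge from a.end to b.start.
Nfa concat(Nfa a, const Nfa& b)
{
    a.edges.insert(a.edges.end(), b.edges.begin(), b.edges.end());
    a.edges.push_back(Edge{a.end, b.start, 0});
    a.end = b.end;
    return a;
}

// Alternation: fresh start/end states wired to both machines with ε edges.
Nfa alternate(Nfa a, const Nfa& b)
{
    a.edges.insert(a.edges.end(), b.edges.begin(), b.edges.end());
    int s = newState(), e = newState();
    a.edges.push_back(Edge{s, a.start, 0});
    a.edges.push_back(Edge{s, b.start, 0});
    a.edges.push_back(Edge{a.end, e, 0});
    a.edges.push_back(Edge{b.end, e, 0});
    a.start = s;
    a.end = e;
    return a;
}

// One-or-more: ε edge from end back to start.
Nfa plus(Nfa a)
{
    a.edges.push_back(Edge{a.end, a.start, 0});
    return a;
}

// Zero-or-more: same back edge, and the end state becomes the start state.
Nfa star(Nfa a)
{
    a.edges.push_back(Edge{a.end, a.start, 0});
    a.end = a.start;
    return a;
}
```

Each function takes its first argument by value, so the operands are left untouched, which mirrors the cloning done by the operators in the article.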
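Now let's look at how the state machines for some extended forms are generated.</p>
For the optional relation, add an ε edge from the start state to the end state of the original machine. Since an ε-NFA has exactly one start state and one end state, when the condition does not hold the end state can be reached directly from the start state through that ε edge.</p>
Since we only handle character sets of type Char_Type and String_Type, negation only needs to negate every edge of type TChar or TString in the current machine. <strong>Note that a double negation cancels out, that is, negating an even number of times is the same as not negating at all.</strong> The so-called char-char relation is the [a-z] expression in regular expressions. It is really an extension of alternation from two original machines to N of them, and it is generated in a similar way.</p>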
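The "even number of negations cancels out" remark follows directly from toggling a bit with XOR. A minimal sketch, assuming a stand-alone copy of the flag values used by EpsilonNFA_Edge (the free functions `negate` and `isNot` here are hypothetical, the article implements them as members):

```cpp
#include <cassert>

// Hypothetical stand-alone copy of the flag layout used by EpsilonNFA_Edge.
enum EdgeFlag : unsigned char { TNot = 1, TChar = 2, TString = 4, TEpsilon = 8 };

// Toggles the TNot bit; calling it twice restores the original value.
unsigned char negate(unsigned char edgeType)
{
    return static_cast<unsigned char>(edgeType ^ TNot);
}

// True when the TNot bit is set.
bool isNot(unsigned char edgeType)
{
    return (edgeType & TNot) == TNot;
}
```

Because only the lowest bit changes, the edge keeps its TChar or TString type no matter how many times it is negated.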
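Let's write a function that prints the whole generated state machine.</p>
Finally, let's write some test code to see how the generation works out. It prints the following. The resulting state machine looks like the figure below. The complete code can be downloaded from <a target="_blank">http://code.google.com/p/qlanguage</a>.</p>
Speaking of a ParserGenerator, BNF has to be mentioned, and QParserGenerator has its own BNF too. Someone may ask what BNF actually is. Simply put, BNF is a way of describing a grammar: for example, that in Basic an If is followed by an expression, then Then, then a statement block, and that it must end with End If, and so on. For a more formal explanation, see <a target="_blank">Wikipedia</a>.<br />
Now that BNF is covered, let's see what QParserGenerator's BNF looks like.<br />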
%start start;
strings -> strings "{String}"
| "{String}"
;
vs -> vs "{Letter}"
| vs "{String}"
| "{Letter}"
| "{String}"
;
option -> "[" vs "]"
;
oneProductionRight -> oneProductionRight option
| oneProductionRight vs
| option
| vs
;
someProductionRight -> someProductionRight "|" oneProductionRight
| oneProductionRight
;
token -> "%" "token" strings ";"
;
someTokens -> someTokens token
| token
;
production -> "{Letter}" "-" ">" someProductionRight ";"
;
someProductions -> someProductions production
| production
;
start -> someTokens "%" "start" "{Letter}" ";" someProductions
| "%" "start" "{Letter}" ";" someProductions
;
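First, at the top there are some strings that begin with %token (in C, a sequence of characters enclosed in double quotes is called a string), followed by a semicolon. These strings are exactly what BNF calls terminals, so we stipulate that every symbol not declared with %token is a nonterminal. Terminals are used for shift operations, and in a particular language they appear as tokens. A nonterminal can be thought of as a pronoun: a nonterminal can usually be expanded into one or more rules (<a target="_blank">productions</a>). As for why every item ends with a semicolon, it is only there to make processing easier (it removes conflicts in the grammar).<br />
Having explained the two terms terminal and nonterminal, next comes a statement that begins with %start followed by a nonterminal. It states where all the rules (<a target="_blank">productions</a>) begin (a beginning without an end, it seems -_-|| what a tragedy).<br />
Finally we reach the main event, so an extra blank line is not too much to ask. There is a big pile of productions here, so how do we read them? As introduced above, one nonterminal marks where all rules begin, so let's find the production it corresponds to.<br />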
start -> someTokens "%" "start" "{Letter}" ";" someProductions
| "%" "start" "{Letter}" ";" someProductions
;
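Next let's see which terminals are predefined. From the code in <a target="_blank">Parser.cpp</a> we can see that the predefined terminals are "{String}", "{Digit}", "{Real}" and "{Letter}".<br />
"{String}": the regular expression \"[^\"]*\"
"{Digit}": the regular expression [0-9]+
"{Real}": the regular expression [0-9]*.[0-9]+
"{Letter}": the regular expression ((_[0-9]+)|([_a-zA-Z]+))[_0-9a-zA-Z]*
From these regular expressions we can see that "{String}" denotes a string in double quotes, "{Digit}" denotes a number, "{Real}" denotes a floating-point number, and "{Letter}" denotes a string without double quotes. These regular expressions are admittedly incomplete; for example, "{String}" does not support escapes.<br />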
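To try the four predefined terminals on sample input, the patterns can be written with `std::regex`. This is only a sketch: the helper name `classify` is hypothetical, and the dot in the "{Real}" pattern is escaped here on the assumption that it is meant to match a literal decimal point.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: classifies a lexeme with std::regex versions of the
// patterns listed above. Returns "" when nothing matches.
std::string classify(const std::string& lexeme)
{
    static const std::regex stringRe("\"[^\"]*\"");
    static const std::regex digitRe("[0-9]+");
    // Assumption: the dot is escaped so that it only matches '.'.
    static const std::regex realRe("[0-9]*\\.[0-9]+");
    static const std::regex letterRe("((_[0-9]+)|([_a-zA-Z]+))[_0-9a-zA-Z]*");

    // regex_match requires the whole lexeme to match the pattern.
    if (std::regex_match(lexeme, stringRe)) return "{String}";
    if (std::regex_match(lexeme, digitRe)) return "{Digit}";
    if (std::regex_match(lexeme, realRe)) return "{Real}";
    if (std::regex_match(lexeme, letterRe)) return "{Letter}";
    return "";
}
```

For instance, `classify("someProductions")` yields "{Letter}", while a lexeme such as `1abc` matches none of the four patterns.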
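Now let's see what syntax each rule supports. From the following grammar rules we can see that optional items can be enclosed in square brackets.<br />
vs -> vs "{Letter}"
| vs "{String}"
| "{Letter}"
| "{String}"
;

option -> "[" vs "]"
;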
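A rule can be described by several productions, and the productions are related to each other by <font>or</font>.<br />
oneProductionRight -> oneProductionRight option
| oneProductionRight vs
| option
| vs
;

someProductionRight -> someProductionRight "|" oneProductionRight
| oneProductionRight
;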
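The remaining rules describe things already mentioned above, such as the token definitions at the beginning. That completes the tour of the structure of a QParserGenerator grammar file. The next article will show how to use QParserGenerator to generate a calculator for the four arithmetic operations with parenthesized precedence; its grammar is in <a target="_blank">Calculator.txt</a>, and the code for the whole QLanguage project is at <a target="_blank">https://github.com/lwch/QLanguage/</a>.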
]]>
Accordingly, the corresponding make function becomes:<br />
{
vector<LALR1Production> v;
v.push_back(inputProductions[begin][0]);
pStart = closure(v);
pStart->idx = Item::inc();
context.states.insert(pStart);
items.push_back(pStart);
queue<Item*> q;
q.push(pStart);
vector<Item*> changes;
bool bContinue = false;
while (!q.empty())
{
Item* pItem = q.front();
vector<Production::Item> s;
symbols(pItem, s);
select_into(s, vts, compare_production_item_is_vt, push_back_unique_vector<Production::Item>);
select_into(s, vns, compare_production_item_is_vn, push_back_unique_vector<Production::Item>);
for (vector<Production::Item>::const_iterator i = s.begin(), m = s.end(); i != m; ++i)
{
Item* pNewItem = NULL;
if (go(pItem, *i, pNewItem))
{
long n = itemIndex(pNewItem);
if (n == -1)
{
pNewItem->idx = Item::inc();
q.push(pNewItem);
items.push_back(pNewItem);
context.states.insert(pNewItem);
}
else
{
items[n]->mergeWildCards(pNewItem, bContinue);
changes.push_back_unique(items[n]);
destruct(pNewItem, has_destruct(*pNewItem));
Item_Alloc::deallocate(pNewItem);
}
edges[pItem].push_back_unique(Edge(pItem, n == -1 ? pNewItem : items[n], *i));
}
}
q.pop();
}
while (bContinue)
{
vector<Item*> v;
v.reserve(changes.size());
bContinue = false;
for (vector<Item*>::const_iterator i = changes.begin(), m = changes.end(); i != m; ++i)
{
vector<Production::Item> s;
symbols(*i, s);
for (vector<Production::Item>::const_iterator j = s.begin(), n = s.end(); j != n; ++j)
{
Item* pNewItem = NULL;
if (go(*i, *j, pNewItem))
{
long n = itemIndex(pNewItem);
if (n == -1) throw error<const char*>("unknown item", __FILE__, __LINE__);
else
{
items[n]->mergeWildCards(pNewItem, bContinue);
v.push_back_unique(items[n]);
destruct(pNewItem, has_destruct(*pNewItem));
Item_Alloc::deallocate(pNewItem);
}
}
}
}
changes = v;
}
}
<strong>An example</strong>
Below we use an example to show how the LALR(1) DFA is generated. First, its grammar is as follows:<br />
S -> L "=" R
| R "+"
| R
;
L -> "*" R
| "id"
;
R -> L
;
First we write out the augmented grammar of this grammar:<br />
wildCards:
#
S -> . L "=" R
wildCards:
#
S -> . R "+"
wildCards:
#
S -> . R
wildCards:
#
L -> . "*" R
wildCards:
"=" "+"
L -> . "id"
wildCards:
"=" "+"
R -> . L
wildCards:
"+" #
First, use the symbol S to compute a new state:<br />
wildCards:
#
Next, use the symbol L to compute a new state:<br />
wildCards:
#
R -> L
wildCards:
"+" #
Then use the symbol R to compute a new state:<br />
wildCards:
#
S -> R
wildCards:
#
Then use the symbol * to compute a new state:<br />
wildCards:
"=" "+"
R -> . L
wildCards:
"+" # "="
L -> . "*" R
wildCards:
"=" "+" #
L -> . "id"
wildCards:
"=" "+" #
Then comes the symbol id:<br />
wildCards:
"=" "+"
With that, the 5 edges leaving the start state have been generated. Now let's see what these newly generated states produce in turn.
wildCards:
#
wildCards:
#
R -> L
wildCards:
"+" #
wildCards:
#
R -> . L
wildCards:
"+" # "="
L -> . "*" R
wildCards:
"=" "+" #
L -> . "id"
wildCards:
"=" "+" #
wildCards:
#
S -> R
wildCards:
#
wildCards:
#
wildCards:
"=" "+"
R -> . L
wildCards:
"+" # "="
L -> . "*" R
wildCards:
"=" "+" #
L -> . "id"
wildCards:
"=" "+" #
1. Via the symbol R it transitions to a new state:<br />
wildCards:
"=" "+"
2. Via the symbol L it transitions to a new state:<br />
wildCards:
"+" # "="
3. Via * it transitions to itself.
4. Via id it transitions to state 5.<br />
The fifth state has no transitions at all.<br />
wildCards:
#
R -> . L
wildCards:
"+" # "="
L -> . "*" R
wildCards:
"=" "+" #
L -> . "id"
wildCards:
"=" "+" #
1. Via the symbol R it can transition to a new state:<br />
wildCards:
#
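2. Via the symbol L it can transition to state ?
3. Via the symbol * it can transition to state ?
4. Via the symbol id it can transition to state ?
States ??? have no transitions.
Now let's look at what the changes list contains. According to the algorithm in the <a href="http://www.shnenglu.com/lwch/archive/2013/05/12/200203.html" target="_blank">previous article</a>, every state that already existed ends up in the changes list, so it should contain the three states ?, ? and ?.<br />
At this point the spontaneous-generation part is complete. Let's draw it as a diagram.
<strong>Next is the propagation part</strong>
In the first propagation pass the changes list holds 3 states. For each of these 3 states we use the go function to compute the new lookahead symbols and merge them into the existing states.<br />
First look at state ?. It has four transitions: R, L, * and id.
1. Via the symbol R it can transition to state ?, with the following lookahead symbols: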
wildCards:
"=" "+" #
2. Via the symbol L it can transition to state ?, with the following lookahead symbols:
wildCards:
"+" # "="
3. Via the symbol * it can transition to itself, with the following lookahead symbols:<br />
wildCards:
"=" "+" #
R -> . L
wildCards:
"+" # "="
L -> . "*" R
wildCards:
"=" "+" #
L -> . "id"
wildCards:
"=" "+" #
4. Via the symbol id it can transition to state ?, with the following lookahead symbols:
wildCards:
"=" "+" #
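Next look at states ? and ?. They have no transitions, so they propagate no lookahead symbols.<br />
The changes list now holds 4 states, ?, ?, ? and ?. Since state ? has produced new lookahead symbols, propagation has to continue.<br />
<strong>Second propagation</strong>
First look at states ? and ?. They have no transitions, so they propagate no lookahead symbols.<br />
Then look at state ?. Likewise it has four transitions: R, L, * and id.<br />
1. Via the symbol R it can transition to state ?, with the following lookahead symbols: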
wildCards:
"=" "+" #
2. Via the symbol L it can transition to state ?, with the following lookahead symbols:
wildCards:
"+" # "="
3. Via the symbol * it can transition to itself, with the following lookahead symbols:<br />
wildCards:
"=" "+" #
R -> . L
wildCards:
"+" # "="
L -> . "*" R
wildCards:
"=" "+" #
L -> . "id"
wildCards:
"=" "+" #
4. Via the symbol id it can transition to state ?, with the following lookahead symbols:
wildCards:
"=" "+" #
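Finally look at state ?. It has no transitions, so it propagates no lookahead symbols.<br />
The changes list again holds 4 states, ?, ?, ? and ?. Since none of them produced new lookahead symbols, propagation stops here.<br />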
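The stopping rule of the walkthrough (keep propagating until a pass adds no new lookahead symbol) is a fixed-point iteration. The sketch below illustrates only that rule under strong simplifications: every state carries a single lookahead set, and `transitions` is a hypothetical map from a state to its successor states. It ignores the per-item closure and go computations that the real make function performs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified model: one lookahead set per state.
using Lookaheads = std::set<std::string>;
using Transitions = std::map<std::size_t, std::vector<std::size_t>>;

// Merges `from` into `into`; returns true when `into` grew.
bool mergeLookaheads(Lookaheads& into, const Lookaheads& from)
{
    if (&into == &from) return false; // a self-transition adds nothing
    std::size_t before = into.size();
    into.insert(from.begin(), from.end());
    return into.size() != before;
}

// Repeats propagation passes until a whole pass adds no lookahead symbol.
// Returns the number of passes performed.
int propagate(std::vector<Lookaheads>& states, const Transitions& transitions)
{
    int passes = 0;
    bool changed = true;
    while (changed)
    {
        changed = false;
        ++passes;
        for (const auto& entry : transitions)
        {
            for (std::size_t target : entry.second)
            {
                assert(entry.first < states.size() && target < states.size());
                if (mergeLookaheads(states[target], states[entry.first]))
                    changed = true;
            }
        }
    }
    return passes;
}
```

As in the example above, the last pass changes nothing; it exists only to confirm that the lookahead sets have stabilized.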
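The DFA for the whole grammar is now complete. Let's revise the earlier diagram to see what the final DFA looks like.<br />
That is all for this example. The next article will use several examples to explain how the closure and go functions work; I hope this coarse-to-fine order of explanation works for readers. The complete code can be downloaded from http://code.google.com/p/qlanguage.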
]]>
]]><h1>DFA states</h1>
struct DFA_State
{
set<EpsilonNFA_State*> content;
bool bFlag;
#ifdef _DEBUG
uint idx;
#endif
DFA_State(const set<EpsilonNFA_State*>& x) : content(x), bFlag(false)
{
#ifdef _DEBUG
idx = inc();
#endif
}
inline const bool operator==(const DFA_State& x)const
{
if (&x == this) return true;
return content == x.content;
}
#ifdef _DEBUG
inline uint inc()
{
static uint i = 0;
return i++;
}
#endif
};
DFA edges
]]><h1>Expressions</h1>
template <typename Char_Type, typename String_Type>
class Rule
{
};
<h1>ε-NFA states</h1>
struct EpsilonNFA_State
{
#ifdef _DEBUG
uint idx;
EpsilonNFA_State()
{
idx = inc();
}
static uint inc()
{
static uint i = 0;
return i++;
}
#else
EpsilonNFA_State() {}
#endif
};
ε-NFA edges
struct EpsilonNFA_Edge
{
struct
{
Char_Type char_value;
String_Type string_value;
}data;
enum Edge_Type
{
TUnknown = 0,
TNot = 1,
TChar = 2,
TString = 4,
TEpsilon = 8
};
uchar edge_type;
EpsilonNFA_State* pFrom;
EpsilonNFA_State* pTo;
EpsilonNFA_Edge(EpsilonNFA_State* pFrom, EpsilonNFA_State* pTo) : edge_type(TEpsilon), pFrom(pFrom), pTo(pTo) {}
EpsilonNFA_Edge(const Char_Type& x, EpsilonNFA_State* pFrom, EpsilonNFA_State* pTo) : edge_type(TChar), pFrom(pFrom), pTo(pTo)
{
data.char_value = x;
}
EpsilonNFA_Edge(const String_Type& x, EpsilonNFA_State* pFrom, EpsilonNFA_State* pTo) : edge_type(TString), pFrom(pFrom), pTo(pTo)
{
data.string_value = x;
}
inline void negate()
{
edge_type ^= TNot;
}
inline const bool isNot()const
{
return (edge_type & TNot) == TNot;
}
inline const bool isEpsilon()const
{
return (edge_type & TEpsilon) == TEpsilon;
}
inline const bool isChar()const
{
return (edge_type & TChar) == TChar;
}
inline const bool isString()const
{
return (edge_type & TString) == TString;
}
const Edge_Type edgeType()const
{
if (isEpsilon()) return TEpsilon;
else if (isChar()) return TChar;
else if (isString()) return TString;
else return TUnknown;
}
};
EpsilonNFA_State *pEpsilonStart, *pEpsilonEnd;
hashmap<EpsilonNFA_State*, vector<EpsilonNFA_Edge>, _hash> epsilonNFA_Edges;
Generating the state machine
<h2>Character sets</h2>
Rule(const Char_Type& x, Context& context) : pDFAStart(NULL), context(context)
{
pEpsilonStart = EpsilonNFA_State_Alloc::allocate();
construct(pEpsilonStart);
pEpsilonEnd = EpsilonNFA_State_Alloc::allocate();
construct(pEpsilonEnd);
epsilonNFA_Edges[pEpsilonStart].push_back(EpsilonNFA_Edge(x, pEpsilonStart, pEpsilonEnd));
context.epsilonNFA_States.insert(pEpsilonStart);
context.epsilonNFA_States.insert(pEpsilonEnd);
}
Rule(const String_Type& x, Context& context) : pDFAStart(NULL), context(context)
{
pEpsilonStart = EpsilonNFA_State_Alloc::allocate();
construct(pEpsilonStart);
pEpsilonEnd = EpsilonNFA_State_Alloc::allocate();
construct(pEpsilonEnd);
epsilonNFA_Edges[pEpsilonStart].push_back(EpsilonNFA_Edge(x, pEpsilonStart, pEpsilonEnd));
context.epsilonNFA_States.insert(pEpsilonStart);
context.epsilonNFA_States.insert(pEpsilonEnd);
}
Concatenation
self operator+(const self& x)
{
self a = cloneEpsilonNFA(*this), b = cloneEpsilonNFA(x);
copyEpsilonNFA_Edges(b, a);
a.epsilonNFA_Edges[a.pEpsilonEnd].push_back(EpsilonNFA_Edge(a.pEpsilonEnd, b.pEpsilonStart));
a.pEpsilonEnd = b.pEpsilonEnd;
return a;
}
Alternation
self operator|(const self& x)
{
self a = cloneEpsilonNFA(*this), b = cloneEpsilonNFA(x);
copyEpsilonNFA_Edges(b, a);
EpsilonNFA_State* _pStart = EpsilonNFA_State_Alloc::allocate();
construct(_pStart);
EpsilonNFA_State* _pEnd = EpsilonNFA_State_Alloc::allocate();
construct(_pEnd);
context.epsilonNFA_States.insert(_pStart);
context.epsilonNFA_States.insert(_pEnd);
a.epsilonNFA_Edges[_pStart].push_back(EpsilonNFA_Edge(_pStart, a.pEpsilonStart));
a.epsilonNFA_Edges[_pStart].push_back(EpsilonNFA_Edge(_pStart, b.pEpsilonStart));
a.epsilonNFA_Edges[a.pEpsilonEnd].push_back(EpsilonNFA_Edge(a.pEpsilonEnd, _pEnd));
a.epsilonNFA_Edges[b.pEpsilonEnd].push_back(EpsilonNFA_Edge(b.pEpsilonEnd, _pEnd));
a.pEpsilonStart = _pStart;
a.pEpsilonEnd = _pEnd;
return a;
}
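Repetition
self operator*()
{
    self a = cloneEpsilonNFA(*this);
    // Append the ε back edge to the edge list of the end state, the same way operator+() does.
    a.epsilonNFA_Edges[a.pEpsilonEnd].push_back(EpsilonNFA_Edge(a.pEpsilonEnd, a.pEpsilonStart));
    // Zero repetitions must be accepted, so the end state becomes the start state.
    a.pEpsilonEnd = a.pEpsilonStart;
    return a;
}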
self operator+()
{
self a = cloneEpsilonNFA(*this);
a.epsilonNFA_Edges[a.pEpsilonEnd].push_back(EpsilonNFA_Edge(a.pEpsilonEnd, a.pEpsilonStart));
return a;
}
<h1>Some extensions</h1>
<h2>Optional</h2>
inline self opt()
{
self a = cloneEpsilonNFA(*this);
a.epsilonNFA_Edges[a.pEpsilonStart].push_back(EpsilonNFA_Edge(a.pEpsilonStart, a.pEpsilonEnd));
return a;
}
Negation
self operator!()
{
self a = cloneEpsilonNFA(*this);
for (typename hashmap<EpsilonNFA_State*, vector<EpsilonNFA_Edge>, _hash>::iterator i = a.epsilonNFA_Edges.begin(), m = a.epsilonNFA_Edges.end(); i != m; ++i)
{
for (typename vector<EpsilonNFA_Edge>::iterator j = i->second.begin(), n = i->second.end(); j != n; ++j)
{
if (j->isChar() || j->isString()) j->negate();
}
}
return a;
}
Char-Char relation
self operator-(const self& x)
{
self a = cloneEpsilonNFA(*this);
if (epsilonNFA_Edges.size() == 1 && x.epsilonNFA_Edges.size() == 1 &&
epsilonNFA_Edges.begin()->second.size() == 1 && x.epsilonNFA_Edges.begin()->second.size() == 1 &&
epsilonNFA_Edges.begin()->second.begin()->edge_type == EpsilonNFA_Edge::TChar && x.epsilonNFA_Edges.begin()->second.begin()->edge_type == EpsilonNFA_Edge::TChar)
{
EpsilonNFA_State* _pStart = EpsilonNFA_State_Alloc::allocate();
construct(_pStart);
EpsilonNFA_State* _pEnd = EpsilonNFA_State_Alloc::allocate();
construct(_pEnd);
context.epsilonNFA_States.insert(_pStart);
context.epsilonNFA_States.insert(_pEnd);
a.epsilonNFA_Edges[_pStart].push_back(EpsilonNFA_Edge(_pStart, a.pEpsilonStart));
a.epsilonNFA_Edges[a.pEpsilonEnd].push_back(EpsilonNFA_Edge(a.pEpsilonEnd, _pEnd));
const Char_Type chStart = epsilonNFA_Edges.begin()->second.begin()->data.char_value;
const Char_Type chEnd = x.epsilonNFA_Edges.begin()->second.begin()->data.char_value;
for (Char_Type ch = chStart + 1; ch < chEnd; ++ch)
{
self y(ch, context);
copyEpsilonNFA_Edges(y, a);
a.epsilonNFA_Edges[_pStart].push_back(EpsilonNFA_Edge(_pStart, y.pEpsilonStart));
a.epsilonNFA_Edges[y.pEpsilonEnd].push_back(EpsilonNFA_Edge(y.pEpsilonEnd, _pEnd));
}
self b = cloneEpsilonNFA(x);
copyEpsilonNFA_Edges(b, a);
a.epsilonNFA_Edges[_pStart].push_back(EpsilonNFA_Edge(_pStart, b.pEpsilonStart));
a.epsilonNFA_Edges[b.pEpsilonEnd].push_back(EpsilonNFA_Edge(b.pEpsilonEnd, _pEnd));
a.pEpsilonStart = _pStart;
a.pEpsilonEnd = _pEnd;
}
else
{
throw error<string>("doesn't support", __FILE__, __LINE__);
}
return a;
}
void printEpsilonNFA()
{
printf("-------- ε- NFA Start --------\n");
for (typename hashmap<EpsilonNFA_State*, vector<EpsilonNFA_Edge>, _hash>::const_iterator i = epsilonNFA_Edges.begin(), m = epsilonNFA_Edges.end(); i != m; ++i)
{
for (typename vector<EpsilonNFA_Edge>::const_iterator j = i->second.begin(), n = i->second.end(); j != n; ++j)
{
printf("%03d -> %03d", j->pFrom->idx, j->pTo->idx);
switch (j->edgeType())
{
case EpsilonNFA_Edge::TEpsilon:
printf("(ε)");
break;
case EpsilonNFA_Edge::TChar:
printf("(%c)", j->data.char_value);
break;
case EpsilonNFA_Edge::TString:
printf("(%s)", j->data.string_value.c_str());
break;
default:
break;
}
if (j->isNot()) printf("(not)");
printf("\n");
}
}
printf("start: %03d -> end: %03d\n", pEpsilonStart->idx, pEpsilonEnd->idx);
printf("--------- ε- NFA End ---------\n");
}
Rule_Type::Context context;
Rule_Type a('a', context), b('b', context), d('d', context);
Rule_Type result = (a - d).opt() + (+b | !(a + b));
#ifdef _DEBUG
result.printEpsilonNFA();
#endif
{
    I temp = input;
    if(O Result = left.Parser(input)) return Result;
    input = temp;
    if(O Result = right.Parser(input)) return Result;
    input = temp;
    O Result(GetMM());
    return Result;
}
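So I added two member variables to CParser_Input that record the current Size of SymbolStack and StringStack. Whenever SymbolStack or StringStack is pushed, the corresponding value in input is increased as well.
Finally, CParser_Input overloads the operator= assignment operator, which uses the previously recorded sizes of SymbolStack and StringStack to pop the corresponding number of duplicated entries.<br />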
{
    LexerTokenList = _value.LexerTokenList;
    index = _value.index;

    if(_value.symbolCount < symbolCount && _value.symbolCount)
    {
        int Count = symbolCount - _value.symbolCount;
        for(int i=0;i<Count;i++) SymbolStack.Pop();
    }

    if(_value.stringCount < stringCount && _value.stringCount)
    {
        int Count = stringCount - _value.stringCount;
        for(int i=0;i<Count;i++) StringStack.Pop();
    }

    symbolCount = SymbolStack.Size();
    stringCount = StringStack.Size();
    return *this;
}
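The idea behind the assignment operator above (remember a stack size, then pop back to it when an alternative fails) can be shown in isolation. A minimal sketch, assuming a `std::vector<int>` as a stand-in for SymbolStack and StringStack; the names `Checkpoint`, `mark` and `rollback` are hypothetical and do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SymbolStack / StringStack.
using Stack = std::vector<int>;

// Remembers how many entries a stack held at a given moment.
struct Checkpoint { std::size_t size; };

Checkpoint mark(const Stack& s)
{
    return Checkpoint{s.size()};
}

// Pops everything that was pushed after the checkpoint was taken.
void rollback(Stack& s, const Checkpoint& c)
{
    assert(c.size <= s.size()); // a checkpoint can only shrink the stack
    s.resize(c.size);
}
```

A backtracking alternative would call `mark` before trying the left branch and `rollback` before trying the right one, so that values pushed by the failed branch do not leak into the result.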
]]>
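1. Each AST node consists of 2 fields, which hold the type of the current node and its additional information.<br />2. Each AST node contains an ordered list of its child nodes.<br />3. Each AST node contains a pointer to the next node.<br />Putting this together, we get the following code for an AST node: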
class CSyntaxTreeNode
{
public:
    CSyntaxTreeNode(int _type,int _value) : type(_type),value(_value){}

    inline List<NAutoPtr<CSyntaxTreeNode>>& Child()
    {
        return child;
    }

    inline NAutoPtr<CSyntaxTreeNode> Next()
    {
        return next;
    }

    inline int& Type()
    {
        return type;
    }

    inline int& Value()
    {
        return value;
    }
protected:
    int type;
    int value;
    List<NAutoPtr<CSyntaxTreeNode>> child;
    NAutoPtr<CSyntaxTreeNode> next;
};
enum TYPE
{
    stNull,
    stDeclare,
    stFunction,
    stParamterList,
    stIf,
    stDo,
    stExp,
};
{
public:
    inline void Push(NAutoPtr<CSyntaxTreeNode>& Node)
    {
        SyntaxTreeStack.Push(Node);
    }

    inline NAutoPtr<CSyntaxTreeNode> Pop()
    {
        return SyntaxTreeStack.Pop();
    }

    inline NAutoPtr<CSyntaxTreeNode> Top()
    {
        return SyntaxTreeStack.Top();
    }

    inline NAutoPtr<CSyntaxTreeNode> Root()
    {
        return SyntaxTreeRoot;
    }
protected:
    NAutoPtr<CSyntaxTreeNode> SyntaxTreeRoot;         // root node of the syntax tree
    Stack<NAutoPtr<CSyntaxTreeNode>> SyntaxTreeStack; // syntax tree stack
};
Here we briefly walk through the parsing process:
Taking the if statement as an example, its combinator code is:<br />
(str_then + stmt_list)[if_desc_second] +
Parser_Combinator_Node::opt((str_else + stmt_list)[if_desc_third]) +
(str_end + str_if)[if_desc_fourth];

declare b as integer
end if
The parsing process is shown in the figures below:
[Figures: parsing steps 1 to 7]
]]>
end class
2. A class can contain (declarations, functions, new classes). All of these except class carry the public, private, protected and static attributes.
    [public] declare a as string // declaration

    [private] [static] function main() // function
    end function

    class b // new class
    end class
end class
end function

stmt_list
[else stmt_list]
end if

stmt_list
while experience end

stmt_list
end while

stmt_list
next

case experience:
[stmt_list]
[case experience:
[stmt_list]]
[default:
[stmt_list]]
end switch

assignment<br />
symbol
string
number
true
false
(+|-)experience
not experience
experience (&|||^|%) experience
experience (>|<|>=|<=|==|!=) experience
experience (+|-|*|/) experience
++symbol
--symbol
symbol++
symbol--
Combinator code<br />
item = declare_desc |
       class_desc |
       function_desc;
property_desc = str_public |
                str_private |
                str_protected;
declare_type = str_integer |
               str_string |
               str_bool |
               str_real |
               type_symbol;
paramter_desc_list = (type_symbol + str_as + declare_type) +
                     *(str_comma + type_symbol + str_as + declare_type);
paramter_value_list = exp_desc + *(str_comma + exp_desc);
declare_desc = str_declare + type_symbol + str_as + declare_type +
               *(str_comma + type_symbol + str_as + declare_type);
class_desc = str_class + type_symbol +
             Parser_Combinator_Node::opt(str_inherit + type_symbol +
                 *(str_comma + (type_symbol & Parser_Combinator_Node::not(str_class | str_function | property_desc | str_static)))
             ) + *class_content_desc + str_end + str_class;
class_content_desc = (Parser_Combinator_Node::opt(property_desc) + Parser_Combinator_Node::opt(str_static) +
                      (declare_desc | function_desc)) |
                     class_desc;
function_desc = (str_function + type_symbol) +
                (str_leftbracket + Parser_Combinator_Node::opt(paramter_desc_list) + str_rightbracket) +
                Parser_Combinator_Node::opt(str_as + declare_type) +
                stmt_list +
                (str_end + str_function);
stmt_list = *(stmt & Parser_Combinator_Node::not(str_end));
stmt = declare_desc |
       if_desc |
       do_desc |
       while_desc |
       for_desc |
       switch_desc |
       exp_desc;
if_desc = (str_if + exp_desc) +
          (str_then + stmt_list) +
          Parser_Combinator_Node::opt(str_else + stmt_list) +
          (str_end + str_if);
do_desc = (str_do + stmt_list) +
          (str_while + exp_desc + str_end);
while_desc = str_while + exp_desc + str_do + stmt_list + str_end + str_while;
for_desc = str_for + stmt_list + str_to + exp_desc + str_by + stmt_list + str_do + stmt_list + str_next;
switch_desc = str_switch + exp_desc + str_do + case_list + str_end + str_switch;
case_list = *case_desc;
case_desc = (str_case + exp_desc + str_colon + stmt_list) |
            (str_default + str_colon + stmt_list);
assign_desc = type_symbol + str_equal + exp_desc;
call_desc = type_symbol + str_leftbracket + Parser_Combinator_Node::opt(paramter_value_list) + str_rightbracket;
logic_desc = (str_not + compare_desc) |
             (compare_desc + *((str_operator_and | str_operator_or | str_xor | str_mod) + compare_desc));
compare_desc = term_desc + *((str_bigger | str_smaller |
                              str_bigger_equal | str_smaller_equal |
                              str_equal_equal | str_not_equal) + term_desc);
term_desc = factor_desc + *((str_add | str_sub) + factor_desc);
factor_desc = self_desc + *((str_mul | str_div) + self_desc);
self_desc = (str_add_add + type_symbol) |
            (str_sub_sub + type_symbol) |
            (type_symbol + str_add_add) |
            (type_symbol + str_sub_sub) |
            value_desc;
value_desc = call_desc |
             assign_desc |
             type_symbol |
             type_string |
             type_number |
             str_true |
             str_false |
             ((str_add | str_sub) + logic_desc) |
             (str_leftbracket + logic_desc + str_rightbracket);
exp_desc = logic_desc;
Any additions will be made in this document.<br />
]]>
Fixed some runtime bugs.
Added support for single-line comments // and multi-line comments /* */
Added support for all forms of function declaration.
ESEngine_Demo5.rar
1. The Samples folder contains a few examples.<br>2. For functions, only the following forms are written so far:<br> "function" "{Symbol}" "{LQ}" "{RQ}" stmt_list "end" "function"
"function" "{Symbol}" "{LQ}" "{RQ}" "end" "function"
"function" "{Symbol}" "{LQ}" paramter_list "{RQ}" "as" var_type stmt_list "end" "function"
that is, these three kinds, so for<br> function mn(integer s)
stmts
end function
an error occurs when the syntax tree is generated.
ESEngine_Demo1_0.rar