
不會飛的鳥 (The Bird That Can't Fly)

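December 10, 2010 ... Ignore them!!! I am going to build my own search engine on the distributed file system, distributed scheduling system and distributed retrieval system that I developed myself!!! Big fish have big ambitions!!! ---楊書童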

A simple regular expression processing module ([Repost] Fast-regular-expressions)

Original source: http://www.codeproject.com/Articles/798/Fast-regular-expressions

Fast regular expressions

By , 29 Oct 2000
 


Introduction

Regular expressions are a well-recognized way of describing string patterns. The following regular expression defines a floating point number with a (possibly empty) integer part, a non-empty fractional part and an optional exponent:

[0-9]* \.[0-9]+ ([Ee](\+|-)?[0-9]+)?

The rules for constructing and interpreting such regular expressions are explained below. A regular expression parser takes a regular expression and a source string as arguments and returns the source position of the first match. Regular expression parsers either interpret the search pattern at runtime or compile the regular expression into an efficient internal form (a deterministic finite automaton, DFA). The parser described here belongs to the second category. Besides being quite fast, it also supports dictionaries of named regular expressions. With the definitions $Int= [0-9], $Frac= \.[0-9]+ and $Exp= ([Ee](\+|-)?[0-9]+), the regular expression for a floating point number shown above can be abbreviated to $Int* $Frac $Exp?.
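The abbreviation works because a named definition behaves like a parenthesized sub-expression. The sketch below shows that idea as plain text substitution. It is an assumption for illustration, not how REXI resolves names internally; `ExpandDefs` is a hypothetical helper, nested definitions are not expanded, and unknown names are left unchanged. Layout spaces are dropped so the result can be fed to an engine such as std::regex, which treats spaces literally.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: replaces every "$Name" with "(definition)" and drops
// unescaped spaces. Escapes such as "\." are copied unchanged.
std::string ExpandDefs(const std::string& pattern,
                       const std::map<std::string, std::string>& defs) {
    std::string out;
    std::size_t i = 0;
    while (i < pattern.size()) {
        char c = pattern[i];
        if (c == ' ') { ++i; continue; }             // layout space: skip
        if (c == '\\' && i + 1 < pattern.size()) {   // escape: copy both chars
            out += pattern.substr(i, 2);
            i += 2;
            continue;
        }
        if (c == '$') {                              // "$Name" reference
            std::size_t j = i + 1;
            while (j < pattern.size() &&
                   (std::isalnum(static_cast<unsigned char>(pattern[j])) ||
                    pattern[j] == '_'))
                ++j;
            std::string name = pattern.substr(i, j - i);   // includes '$'
            auto it = defs.find(name);
            out += (it != defs.end()) ? "(" + it->second + ")" : name;
            i = j;
            continue;
        }
        out += c;
        ++i;
    }
    return out;
}
```

With the three definitions above, `ExpandDefs("$Int* $Frac $Exp?", defs)` yields `([0-9])*(\.[0-9]+)(([Ee](\+|-)?[0-9]+))?`.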

Interface

I separated the algorithm from the interface. The files RexAlgorithm.h and RexAlgorithm.cpp implement the regular expression parser using only standard C++ (relying on the STL), whereas the files RexInterface.h and RexInterface.cpp contain the interface for the end user. Currently there is only one interface, implemented in the class REXI_Search. Interfaces for replace functionality and for programming language scanners are planned for future releases.

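struct REXI_DefErr {
    enum { eNoErr, eErrInName, eErrInRegExp } eErrCode;  // eNoErr on success
    string  strErrMsg;    // error message
    int     nErrOffset;   // offset of the error
};

class REXI_Search : public REXI_Base
{
public:
    // cEos: the character that terminates the source text
    REXI_Search(char cEos = '\0');
    // Adds a named definition (e.g. "$Int") usable in later expressions.
    REXI_DefErr         AddRegDef(string strName, string strRegExp);
    // Compiles strRegExp into the internal form used for searching.
    inline REXI_DefErr  SetRegexp(string strRegExp);
    // Tries to match at the current position rpcszSrc.
    bool    MatchHere(const char*& rpcszSrc, int& nMatchLen, bool& bEos);
    // Searches for the next match; on success rpcszSrc points just past the
    // match and nMatchLen holds its length.
    bool    Find(const char*& rpcszSrc, int& nMatchLen, bool& bEos);
private:
    bool    MatchHereImpl();
    int     m_nIdAnswer;
};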

Example usage

#include <cassert>
#include <iostream>
#include <string>
#include "RexInterface.h"   // declares REXI_Search and REXI_DefErr

using namespace std;

int main()
{
    const char szTestSrc[] = "3.1415 is the same as 31415e-4";
    const int  ncOk = REXI_DefErr::eNoErr;
    REXI_Search rexs;
    REXI_DefErr err;

    // Named building blocks of a floating point constant.
    err = rexs.AddRegDef("$Int", "[0-9]+");                assert(err.eErrCode == ncOk);
    err = rexs.AddRegDef("$Frac", "\\.[0-9]+");            assert(err.eErrCode == ncOk);
    err = rexs.AddRegDef("$Exp", "([Ee](\\+|-)?[0-9]+)");  assert(err.eErrCode == ncOk);

    // Compile the complete expression (the expensive step).
    err = rexs.SetRegexp("($Int? $Frac $Exp?|$Int \\. $Exp?|$Int $Exp)[fFlL]?");
    assert(err.eErrCode == ncOk);

    const char* pCur = szTestSrc;
    int         nMatchLen;
    bool        bEosFound = false;
    cout << "Source text is: \"" << szTestSrc << "\"" << endl;
    // Find advances pCur past each match, so the match starts at pCur - nMatchLen.
    while (rexs.Find(pCur, nMatchLen, bEosFound)) {
        cout << "Floating point number found at position "
             << ((pCur - szTestSrc) - nMatchLen)
             << " having length " << nMatchLen << endl;
    }
    return 0;
}
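For comparison, the same search can be sketched with std::regex from the C++ standard library (not REXI, and a backtracking engine rather than a precompiled DFA). The definitions are expanded by hand, (\+|-) is written as [+-], and the layout spaces are dropped because std::regex treats spaces literally. For this input it reports (0, 6) and (22, 8), which is what one would expect the REXI program above to print as well. `FindFloats` is a hypothetical name.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns (position, length) for every floating point constant in `text`.
std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>> FindFloats(const std::string& text) {
    static const std::regex kFloat(
        "(([0-9]+)?\\.[0-9]+([Ee][+-]?[0-9]+)?"   // $Int? $Frac $Exp?
        "|[0-9]+\\.([Ee][+-]?[0-9]+)?"            // $Int \. $Exp?
        "|[0-9]+[Ee][+-]?[0-9]+)[fFlL]?");        // $Int $Exp, optional suffix
    std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>> hits;
    // sregex_iterator resumes searching right after the previous match.
    for (std::sregex_iterator it(text.begin(), text.end(), kFloat), end;
         it != end; ++it)
        hits.emplace_back(it->position(), it->length());
    return hits;
}
```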

Performance issues

A call to the member function REXI_Search::SetRegexp(strRegExp) involves quite a lot of computation: the regular expression strRegExp is analyzed and, in several steps, transformed into a compiled form. An interpreting regular expression parser does not need this preprocessing, so this parser pays off only when it is applied to large input strings or when the same regular expression is searched for again and again. A typical application that profits from this up-front work is a utility that searches all files in a directory.
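The "compile once, search many times" pattern can be sketched as follows. std::regex stands in for REXI_Search here (an assumption; its construction cost differs from REXI's), and `CountMatchingLines` is a hypothetical helper. The point is only where the compilation happens: once, before the loop over all inputs.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Counts the lines that contain at least one match of `pattern`.
std::size_t CountMatchingLines(const std::vector<std::string>& lines,
                               const std::string& pattern) {
    const std::regex re(pattern);            // compiled once, outside the loop
    std::size_t count = 0;
    for (const std::string& line : lines)    // the compiled form is reused
        if (std::regex_search(line, re))
            ++count;
    return count;
}
```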

Limitations

Currently Unicode is not supported. There is no fundamental reason for this limitation, and I expect a later release to remove it; I simply have not yet found an efficient representation of a compiled regular expression that supports Unicode.
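Why the representation matters can be illustrated with one common compiled form, a transition table with one column per input byte. The article does not describe REXI's internal format, so this is an assumption for illustration only. With 8-bit text each state needs 256 entries; indexing by Unicode code point would need 0x110000 entries per state. The hand-built DFA below (the class name is hypothetical) accepts `[0-9]+`.

```cpp
#include <array>
#include <string>

// Table-driven DFA for "[0-9]+": state 0 = start, 1 = accepting, -1 = dead.
class DigitsDfa {
public:
    DigitsDfa() {
        for (auto& row : table_) row.fill(-1);          // default: dead state
        for (char c = '0'; c <= '9'; ++c) {
            table_[0][static_cast<unsigned char>(c)] = 1;
            table_[1][static_cast<unsigned char>(c)] = 1;
        }
    }
    // Returns true if the whole string is accepted.
    bool Matches(const std::string& s) const {
        int state = 0;
        for (unsigned char c : s) {
            state = table_[state][c];                   // one lookup per byte
            if (state < 0) return false;                // no match possible
        }
        return state == 1;
    }
private:
    std::array<std::array<int, 256>, 2> table_;         // 2 states x 256 bytes
};
```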

Constructing regular expressions

Regular expressions are built from characters and special symbols, and they resemble arithmetic expressions in several ways. The most basic elements of arithmetic expressions are numbers and expressions enclosed in parens ( ); the most basic elements of regular expressions are characters, regular expressions enclosed in parens ( ) and character sets. On the next higher level, arithmetic expressions have the '*' and '/' operators, whereas regular expressions have operators indicating the multiplicity of the preceding element.
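The analogy with arithmetic can be made concrete with a recursive-descent parser that has one function per precedence level described in this section: alternation (loosest), catenation, multiplicity, and basic elements (tightest). The sketch below is hypothetical and not part of REXI. It prints the parse tree in prefix form, keeps character sets, escapes and "$Name" references as opaque leaves, and skips spaces as REXI does.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class PrecedenceParser {
public:
    explicit PrecedenceParser(std::string src) : s_(std::move(src)) {}
    std::string Parse() {
        std::string tree = Alternation();
        if (!AtEnd()) throw std::runtime_error("unbalanced ')'");
        return tree;
    }
private:
    std::string s_;
    std::size_t pos_ = 0;

    bool AtEnd() {                 // skips spaces, then tests for end of input
        while (pos_ < s_.size() && s_[pos_] == ' ') ++pos_;
        return pos_ >= s_.size();
    }
    // alternation := catenation ('|' catenation)*
    std::string Alternation() {
        std::string tree = Catenation();
        while (!AtEnd() && s_[pos_] == '|') {
            ++pos_;
            tree = "alt(" + tree + "," + Catenation() + ")";
        }
        return tree;
    }
    // catenation := repetition repetition*   (stops before '|' or ')')
    std::string Catenation() {
        std::string tree = Repetition();
        while (!AtEnd() && s_[pos_] != '|' && s_[pos_] != ')')
            tree = "cat(" + tree + "," + Repetition() + ")";
        return tree;
    }
    // repetition := basic ('*' | '+' | '?' | "\i")*
    std::string Repetition() {
        std::string tree = Basic();
        while (!AtEnd()) {
            char c = s_[pos_];
            if (c == '*')      tree = "star(" + tree + ")";
            else if (c == '+') tree = "plus(" + tree + ")";
            else if (c == '?') tree = "opt(" + tree + ")";
            else if (c == '\\' && pos_ + 1 < s_.size() && s_[pos_ + 1] == 'i') {
                tree = "nocase(" + tree + ")";
                ++pos_;            // consume the 'i' as well
            } else break;
            ++pos_;
        }
        return tree;
    }
    // basic := char | '\' char | '.' | '[' set ']' | '(' alternation ')' | '$' name
    std::string Basic() {
        if (AtEnd() || std::string(")|*+?").find(s_[pos_]) != std::string::npos)
            throw std::runtime_error("missing operand");
        std::size_t start = pos_;
        char c = s_[pos_++];
        if (c == '(') {
            std::string inner = Alternation();
            if (AtEnd() || s_[pos_] != ')') throw std::runtime_error("missing ')'");
            ++pos_;
            return inner;
        }
        if (c == '[') {            // character set kept as one leaf
            while (pos_ < s_.size() && s_[pos_] != ']')
                pos_ += (s_[pos_] == '\\') ? 2 : 1;
            if (pos_ >= s_.size()) throw std::runtime_error("missing ']'");
            ++pos_;
        } else if (c == '\\') {    // escaped character such as "\."
            if (pos_ >= s_.size()) throw std::runtime_error("dangling '\\'");
            ++pos_;
        } else if (c == '$') {     // reference to a named definition
            while (pos_ < s_.size() &&
                   (std::isalnum(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '_'))
                ++pos_;
        }
        return s_.substr(start, pos_ - start);
    }
};
```

For example, `PrecedenceParser("ab*|c").Parse()` returns `alt(cat(a,star(b)),c)`: the `*` binds to `b` alone, just as `*` binds tighter than `+` in arithmetic.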

Most basic elements of regular expressions

  • Individual characters. E.g. "h" is a regular expression. In the string "this home" its first match is the 'h' in 'this'; the next match is the 'h' at the beginning of 'home'. For non-printable characters, use either the notation \xhh, where each h is a hexadecimal digit, or one of the escape sequences \n \r \t \v known from "C". Because the characters * + ? . | [ ] ( ) - $ ^ have a special meaning in regular expressions, escape sequences must also be used to specify these characters literally: \*  \+  \?  \.  \|  \[  \]  \(  \)  \-  \$  \^ . Furthermore, use '\ ' to indicate a space, because this implementation skips spaces in order to support a more readable style.
  • Character sets enclosed in square brackets [ ]. A dash (-) indicates a range, e.g. "[A-Za-z_$]" matches any alphabetic character, the underscore and the dollar sign, such as "B", "b", "_" and "$". A ^ immediately following the [ of a character set means 'form the inverse character set', e.g. "[^0-9A-Za-z]" matches non-alphanumeric characters.
  • Expressions enclosed in round parens ( ). Any regular expression can be used on the lowest level by enclosing it in round brackets.
  • The dot (.). It means 'match any character'.
  • An identifier prefixed by a $. It refers to a previously defined regular expression, e.g. "$Ident" stands for a user-defined regular expression. Think of it as a named regular expression enclosed in round parens.
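A character set can be compiled into a membership table with one bit per byte value, so that testing a character costs a single lookup. The sketch below assumes that representation for illustration (the article does not say how REXI stores sets); `CompileCharSet` is a hypothetical helper that takes the text between the brackets and handles ranges, a leading ^ and backslash escapes.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>
#include <string>

// Compiles the body of a character set, e.g. "A-Za-z_$" or "^0-9A-Za-z".
std::bitset<256> CompileCharSet(const std::string& body) {
    std::bitset<256> set;
    bool invert = !body.empty() && body[0] == '^';     // "[^...]": inverse set
    std::size_t i = invert ? 1 : 0;
    while (i < body.size()) {
        unsigned char lo = body[i];
        if (lo == '\\' && i + 1 < body.size()) lo = body[++i];   // escaped literal
        ++i;
        unsigned char hi = lo;
        if (i + 1 < body.size() && body[i] == '-') {             // range "lo-hi"
            hi = body[i + 1];
            if (hi == '\\' && i + 2 < body.size()) { hi = body[i + 2]; ++i; }
            i += 2;
            if (hi < lo) throw std::invalid_argument("reversed range");
        }
        for (unsigned c = lo; c <= hi; ++c) set.set(c);
    }
    return invert ? ~set : set;
}
```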

Operators indicating the multiplicity of the preceding element

Any of the above five basic regular expressions can be followed by one of the special characters * + ? \i

  • * meaning repetition (possibly zero times); e.g. "[0-9]*" not only matches "8" but also "87576" and even the empty string "".
  • + meaning at least one occurrence; e.g. "[0-9]+" matches "8", "9185278", but not the empty string.
  • ? meaning at most one occurrence; e.g. "[$_A-Z]?" matches "_", "U", "$", ... and ""
  • \i meaning ignore case (match case-insensitively)
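std::regex (ECMAScript grammar) uses the same *, + and ? operators, so the examples in the list above can be checked with it. It has no postfix \i; the closest equivalent is the std::regex::icase flag, which applies to the whole pattern. `WholeMatch` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// True if `pattern` matches all of `text` (optionally ignoring case).
bool WholeMatch(const std::string& pattern, const std::string& text,
                bool ignoreCase = false) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (ignoreCase) flags |= std::regex::icase;   // stands in for a trailing \i
    return std::regex_match(text, std::regex(pattern, flags));
}
```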

Catenation of regular expressions

The regular expressions described above can be catenated to form longer regular expressions. E.g. "[_A-Za-z][_A-Za-z0-9]*" is a regular expression that matches any identifier of the programming language "C": the first character must be alphabetic or an underscore, and the following characters must be alphanumeric or underscores. "[0-9]*\.[0-9]+" describes a floating point number with an arbitrary number of digits before the decimal point and at least one digit after it. (The decimal point must be preceded by a backslash; otherwise the dot would mean 'accept any character at this place'.) "(Hallo (,\ how\ are\ you\?)?)\i" matches "Hallo" as well as "Hallo, how are you?" in a case-insensitive way; the spaces inside the phrase are escaped because this implementation skips unescaped spaces.
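The two catenation examples can be checked with std::regex as well. Two differences from REXI matter here: std::regex treats a space as a literal character (so the spaces in the greeting are written plainly), and case-insensitivity is a flag on the whole pattern rather than a \i suffix. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// A "C" identifier: letter or underscore, then letters, digits or underscores.
bool IsCIdentifier(const std::string& s) {
    static const std::regex kIdent("[_A-Za-z][_A-Za-z0-9]*");
    return std::regex_match(s, kIdent);
}

// "Hallo", optionally followed by ", how are you?", ignoring case.
bool IsGreeting(const std::string& s) {
    static const std::regex kGreeting("Hallo(, how are you\\?)?",
                                      std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(s, kGreeting);
}
```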

Alternative regular expressions

Finally, on the top level, regular expressions can be separated by the | character. The two regular expressions on the left and right of the | are alternatives: either the left or the right expression must match the source text. E.g. "[0-9]+ | [A-Za-z_][A-Za-z_0-9]*" matches either an integer or a "C" identifier.
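As a small application of alternation, the sketch below uses the integer-or-identifier expression with std::regex to pull tokens out of a line. std::regex tries alternatives left to right and takes the first that succeeds (ECMAScript semantics); that is harmless here because the two alternatives cannot start with the same character. `IntsAndIdents` is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every integer and "C" identifier in `text`, in order.
std::vector<std::string> IntsAndIdents(const std::string& text) {
    static const std::regex kToken("[0-9]+|[A-Za-z_][A-Za-z_0-9]*");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(text.begin(), text.end(), kToken), end;
         it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}
```

For `"x1 = 42 + _y;"` this yields `x1`, `42` and `_y`.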

A complex example

The programming language "C" defines a floating point constant as follows: it consists of an integer part, a decimal point, a fractional part, an exponent part beginning with e or E followed by an optional sign and digits, and an optional type suffix formed by one of the characters f, F, l, L. Either the integer part or the fractional part can be absent (but not both). Either the decimal point or the exponent part can be absent (but not both).

The corresponding regular expression is quite complex, but it can be simplified by using the following definitions:

$Int = "[0-9]+"
$Frac= "\.[0-9]+"
$Exp = "([Ee](\+|-)?[0-9]+)"

So we get the following expression for a floating point constant:

($Int? $Frac $Exp?|$Int \. $Exp?|$Int $Exp)[fFlL]?
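The expression can be checked against the rules above (integer or fractional part may be missing but not both; decimal point or exponent may be missing but not both) with std::regex. This is a sketch, not REXI: $Int, $Frac and $Exp are expanded by hand, (\+|-) is written as [+-], and the layout spaces are removed because std::regex treats them literally. `IsCFloatConstant` is a hypothetical name.

```cpp
#include <regex>
#include <string>

// True if the whole of `s` is a "C" floating point constant.
bool IsCFloatConstant(const std::string& s) {
    static const std::regex kFloat(
        "(([0-9]+)?\\.[0-9]+([Ee][+-]?[0-9]+)?"   // $Int? $Frac $Exp?
        "|[0-9]+\\.([Ee][+-]?[0-9]+)?"            // $Int \. $Exp?
        "|[0-9]+[Ee][+-]?[0-9]+)"                 // $Int $Exp
        "[fFlL]?");                               // optional type suffix
    return std::regex_match(s, kFloat);
}
```

Accepted examples include "3.14", ".5", "1.", "1e10" and "2.5E-3f"; "1", "." and "e5" are rejected.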

posted on 2013-01-08 19:45 by 不會飛的鳥, Views (392), Comments (0)
