
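不會飛的鳥 (The Bird That Cannot Fly)

December 10, 2010 ... Never mind them!!! I will build my own search engine on a distributed file system, distributed scheduling system, and distributed retrieval system that I develop myself!!! A big fish has big ambitions!!! ---楊書童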

A simple regular expression processing module ([Repost] Fast-regular-expressions)

Original source: http://www.codeproject.com/Articles/798/Fast-regular-expressions

Fast regular expressions

By , 29 Oct 2000
 

Sample Image - RexSearch.jpg

Introduction

Regular expressions are a well recognized way for describing string patterns. The following regular expression defines a floating point number with a (possibly empty) integer part, a non empty fractional part and an optional exponent:

[0-9]* \.[0-9]+ ([Ee](\+|-)?[0-9]+)?

The rules for interpreting and constructing such regular expressions are explained below. A regular expression parser takes a regular expression and a source string as arguments and returns the source position of the first match. Regular expression parsers either interpret the search pattern at runtime or compile the regular expression into an efficient internal form (a deterministic finite automaton). The regular expression parser described here belongs to the second category. Besides being quite fast, it also supports dictionaries of regular expressions. With the definitions $Int= [0-9], $Frac= \.[0-9]+ and $Exp= ([Ee](\+|-)?[0-9]+), the above regular expression for a floating point number can be abbreviated to $Int* $Frac $Exp?.
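The dictionary idea can be sketched as plain text substitution. The helper below is hypothetical (the name expandDefs and its behaviour are assumptions, not the parser's actual implementation): it replaces each $Name with its parenthesised definition and drops unescaped spaces. It does not expand definitions recursively and does not track character sets beyond treating a lone $ as a literal.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of RexInterface.h: expands $Name references
// by textual substitution and removes unescaped layout spaces.
std::string expandDefs(const std::string& pattern,
                       const std::map<std::string, std::string>& defs) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size()) {
            out += c;                 // copy an escape sequence unchanged
            out += pattern[++i];
        } else if (c == ' ') {
            continue;                 // unescaped spaces are layout only
        } else if (c == '$') {
            std::size_t j = i + 1;
            while (j < pattern.size() &&
                   std::isalnum(static_cast<unsigned char>(pattern[j]))) ++j;
            if (j == i + 1) {         // lone '$', e.g. inside [A-Z_$]
                out += c;
                continue;
            }
            const auto it = defs.find(pattern.substr(i, j - i));
            if (it == defs.end()) throw std::invalid_argument("undefined name");
            out += "(?:" + it->second + ")";   // non-capturing group
            i = j - 1;
        } else {
            out += c;
        }
    }
    return out;
}
```

With the three definitions above stored in the map under "$Int", "$Frac" and "$Exp", expandDefs("$Int* $Frac $Exp?", defs) yields a single pattern string. The (?: ) grouping is ECMAScript syntax, chosen here so the result could be handed to std::regex; the parser described in this article would use its own internal form.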

Interface

I separated algorithmic from interface issues. The files RexAlgorithm.h and RexAlgorithm.cpp implement the regular expression parser using only standard C++ (relying on STL), whereas the files RexInterface.h and RexInterface.cpp contain the interfaces for the end user. Currently there is only one interface, implemented in the class REXI_Search. Interfaces for replace functionality and for programming language scanners are planned for future releases.

struct REXI_DefErr{
    enum{eNoErr,eErrInName,eErrInRegExp} eErrCode;
    string  strErrMsg;
    int     nErrOffset;
};

class REXI_Search : public REXI_Base
{
public:
    REXI_Search(char cEos='\0');

    REXI_DefErr         AddRegDef   (string strName,string strRegExp);
    inline REXI_DefErr  SetRegexp   (string strRegExp);
    bool    MatchHere   (const char*& rpcszSrc, int& nMatchLen,bool& bEos);
    bool    Find        (const char*& rpcszSrc, int& nMatchLen,bool& bEos);
private:
    bool    MatchHereImpl();
    int     m_nIdAnswer;
};

Example usage

#include <cassert>
#include <iostream>
#include "RexInterface.h"   // declares REXI_Search and REXI_DefErr

using namespace std;

int main(int argc, char* argv[])
{
    const char szTestSrc[]= "3.1415 is the same as 31415e-4";
    const int ncOk= REXI_DefErr::eNoErr;
    REXI_Search rexs;
    REXI_DefErr err;

    // Register the named building blocks, then the expression that uses them.
    err= rexs.AddRegDef("$Int","[0-9]+");       assert(err.eErrCode==ncOk);
    err= rexs.AddRegDef("$Frac","\\.[0-9]+");   assert(err.eErrCode==ncOk);
    err= rexs.AddRegDef("$Exp","([Ee](\\+|-)?[0-9]+)");
    assert(err.eErrCode==ncOk);
    err= rexs.SetRegexp("($Int? $Frac $Exp?|$Int \\. $Exp?|$Int $Exp)[fFlL]?");
    assert(err.eErrCode==ncOk);

    const char*     pCur= szTestSrc;
    int             nMatchLen;
    bool            bEosFound= false;
    cout    <<  "Source text is: \""    <<  szTestSrc   << "\"" <<  endl;
    // The start of each match is computed as (pCur - szTestSrc) - nMatchLen.
    while(rexs.Find(pCur,nMatchLen,bEosFound)){
        cout <<  "Floating point number found  at position "
             <<  ((pCur-szTestSrc)-nMatchLen)
             <<  " having length "  <<  nMatchLen  <<  endl;
    }
    int i;
    cin >> i;   // wait for input so the console window stays open
    return 0;
}
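For readers without the REXI sources at hand, the search loop can be approximated with the standard library. The sketch below is a stand-in, not the author's code: findNext and kFloatRe are hypothetical names, std::regex replaces the compiled automaton, and the $Int, $Frac and $Exp definitions are inlined by hand. It assumes, as the position arithmetic in the example suggests, that a successful search leaves the pointer just past the match.

```cpp
#include <regex>

// The float pattern from the example with the named definitions inlined
// (ECMAScript syntax, so layout spaces are removed).
const std::regex kFloatRe(
    R"((([0-9]+)?\.[0-9]+([Ee][+-]?[0-9]+)?|[0-9]+\.([Ee][+-]?[0-9]+)?|[0-9]+[Ee][+-]?[0-9]+)[fFlL]?)");

// Hypothetical stand-in for a Find-style call: on success, moves `cur` just
// past the match and stores the match length in `matchLen`.
bool findNext(const char*& cur, int& matchLen, const std::regex& re) {
    std::cmatch m;
    // Empty matches are treated as "not found" so a caller's loop terminates.
    if (!std::regex_search(cur, m, re) || m.length(0) == 0) return false;
    matchLen = static_cast<int>(m.length(0));
    cur = m[0].second;   // one past the last matched character
    return true;
}
```

Called in a while loop over the same source text, (cur - start) - matchLen gives the start offset of each match, mirroring the output expression in the example.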

Performance issues

A call to the member function REXI_Search::SetRegexp(strRegExp) involves quite a lot of computing. The regular expression strRegExp is analyzed and, after several steps, transformed into a compiled form. This preprocessing is not needed by an interpreting regular expression parser, so this parser shows its efficiency only when you apply it to large input strings or search again and again for the same regular expression. A typical application which profits from the preprocessing is a utility which searches all files in a directory.
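To see why a compiled form pays off on long inputs, consider a hand-built automaton for [0-9]*\.[0-9]+. This is only an illustration under stated assumptions: the state numbering, the table layout and the names kNext, classify and matchHere are invented here and are not the representation used in RexAlgorithm.cpp. Once the table exists, each input character costs one table lookup.

```cpp
// Character classes the automaton distinguishes.
enum CharClass { kDigit = 0, kDot = 1, kOther = 2 };

// States: 0 = integer part, 1 = just after '.', 2 = fraction (accepting),
// 3 = dead state (no match possible any more).
constexpr int kDead = 3;
constexpr int kNext[4][3] = {
    /* state 0 */ {0, 1, kDead},
    /* state 1 */ {2, kDead, kDead},
    /* state 2 */ {2, kDead, kDead},
    /* state 3 */ {kDead, kDead, kDead},
};

inline CharClass classify(char c) {
    if (c >= '0' && c <= '9') return kDigit;
    return c == '.' ? kDot : kOther;
}

// Returns the length of the longest match starting at s, or -1 if none.
int matchHere(const char* s) {
    int state = 0;
    int best = -1;
    for (int i = 0; s[i] != '\0' && state != kDead; ++i) {
        state = kNext[state][classify(s[i])];
        if (state == 2) best = i + 1;   // remember the last accepting position
    }
    return best;
}
```

Building such a table from an arbitrary expression is the expensive step that SetRegexp performs once; the matching loop itself stays this simple regardless of how complex the expression was.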

Limitations

Currently Unicode is not supported. There is no fundamental reason for this limitation and I think that a later release will correct this. I simply have not yet found an efficient representation of a compiled regular expression which supports Unicode.

Constructing regular expressions

Regular expressions can be built from characters and special symbols. There are some similarities between regular expressions and arithmetic expressions. The most basic elements of arithmetic expressions are numbers and expressions enclosed in parens ( ). The most basic elements of regular expressions are characters, regular expressions enclosed in parens ( ) and character sets. On the next higher level, arithmetic expressions have '*' and '/' operators, whereas regular expressions have operators indicating the multiplicity of the preceding element.

Most basic elements of regular expressions

  • Individual characters. e.g. "h" is a regular expression. In the string "this home" it matches the beginning of 'home'. For non printable characters, one has to use either the notation \xhh where h means a hexadecimal digit or one of the escape sequences \n \r \t \v known from "C". Because the characters * + ? . | [ ] ( ) - $ ^ have a special meaning in regular expressions, escape sequences must also be used to specify these characters literally: \*  \+  \?  \.  \|  \[  \]  \(  \)  \-  \$  \^ . Furthermore, use '\ ' to indicate a space, because this implementation skips spaces in order to support a more readable style.
  • Character sets enclosed in square brackets [ ]. The dash (-) indicates a range, e.g. "[A-Za-z_$]" matches any alphabetic character, the underscore and the dollar sign: "B", "b", "_", "$" and so on. A ^ immediately following the [ of a character set means 'form the inverse character set', e.g. "[^0-9A-Za-z]" matches non-alphanumeric characters.
  • Expressions enclosed in round parens ( ). Any regular expression can be used on the lowest level by enclosing it in round parens.
  • The dot . It means 'match any character'.
  • An identifier prefixed by a $. It refers to an already defined regular expression, e.g. "$Ident" stands for a user defined regular expression previously defined. Think of it as a regular expression enclosed in round parens, which has a name.
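The escaping rules above can be collected in a small helper that turns arbitrary text into a pattern matching it literally. This is a sketch with a hypothetical name (escapeLiteral); it covers the special characters and the space listed above plus the \n \r \t \v sequences. Escaping the backslash itself as \\ is an assumption, since the list above does not mention it.

```cpp
#include <string>

// Hypothetical helper: escapes `text` for the pattern dialect described above.
std::string escapeLiteral(const std::string& text) {
    // Special characters from the list above, plus the space (skipped unless
    // escaped) and the backslash (assumption: written as "\\").
    static const std::string kSpecial = "*+?.|[]()-$^ \\";
    std::string out;
    for (const char c : text) {
        switch (c) {
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            case '\v': out += "\\v"; break;
            default:
                if (kSpecial.find(c) != std::string::npos) out += '\\';
                out += c;
        }
    }
    return out;
}
```

For example, the text "how are you?" becomes how\ are\ you\? so that neither the spaces nor the question mark are interpreted by the parser. Other non-printable characters would need the \xhh notation, which this sketch leaves out.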

Operators indicating the multiplicity of the preceding element

Any of the above five basic elements can be followed by one of the special symbols * + ? \i

  • * meaning repetition (possibly zero times); e.g. "[0-9]*" not only matches "8" but also "87576" and even the empty string "".
  • + meaning at least one occurrence; e.g. "[0-9]+" matches "8", "9185278", but not the empty string.
  • ? meaning at most one occurrence; e.g. "[$_A-Z]?" matches "_", "U", "$", .. and ""
  • \i meaning ignore case
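The three counting operators behave the same way in most regular expression engines, so they can be tried out with std::regex. This is a stand-in for experimentation, not this parser: fullMatch is a hypothetical helper, and because \i is specific to the dialect described here, case-insensitive matching is approximated with the std::regex::icase flag.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches all of `text`.
// `ignoreCase` stands in for the \i operator, which std::regex lacks.
bool fullMatch(const std::string& pattern, const std::string& text,
               bool ignoreCase = false) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (ignoreCase) flags |= std::regex::icase;
    return std::regex_match(text, std::regex(pattern, flags));
}
```

With this helper, fullMatch("[0-9]*", "") succeeds while fullMatch("[0-9]+", "") fails, and fullMatch("[$_A-Z]?", "UU") fails because ? allows at most one occurrence.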

Catenation of regular expressions

The regular expressions described above can be catenated to form longer regular expressions. E.g. "[_A-Za-z][_A-Za-z0-9]*" is a regular expression which matches any identifier of the programming language "C", namely the first character must be alphabetic or an underscore and the following characters must be alphanumeric or an underscore. "[0-9]*\.[0-9]+" describes a floating point number with an arbitrary number of digits before the decimal point and at least one digit following the decimal point. (The decimal point must be preceded by a backslash, otherwise the dot would mean 'accept any character at this place'). "(Hallo (,how are you\?)?)\i" matches "Hallo" as well as "Hallo, how are you?" in a case insensitive way.

Alternative regular expressions

Finally - on the top level - regular expressions can be separated by the | character. The two regular expressions on the left and right side of the | are alternatives, meaning that either the left expression or the right expression should match the source text. E.g. "[0-9]+ | [A-Za-z_][A-Za-z_0-9]*" matches either an integer or a "C"-identifier.

A complex example

The programming language "C" defines a floating point constant as having the following parts: an integer part, a decimal point, a fraction, an exponential part beginning with e or E followed by an optional sign and digits, and an optional type suffix formed by one of the characters f, F, l, L. Either the integer part or the fractional part can be absent (but not both). Either the decimal point or the exponential part can be absent (but not both).

The corresponding regular expression is quite complex, but it can be simplified by using the following definitions:

$Int = "[0-9]+"
$Frac= "\.[0-9]+"
$Exp = "([Ee](\+|-)?[0-9]+)"

So we get the following expression for a floating point constant:

($Int? $Frac $Exp?|$Int \. $Exp?|$Int $Exp)[fFlL]?
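The rules for a "C" floating point constant can be checked against this expression with a short test harness. The sketch below uses std::regex as a stand-in engine and a hypothetical function name (isCFloatConstant); the three parts are ordinary strings concatenated by hand, which plays the role of the $Int, $Frac and $Exp definitions.

```cpp
#include <regex>
#include <string>

// Hypothetical checker: true if `text` as a whole is a floating point
// constant according to the expression above (ECMAScript syntax).
bool isCFloatConstant(const std::string& text) {
    static const std::string kInt  = "(?:[0-9]+)";
    static const std::string kFrac = "(?:\\.[0-9]+)";
    static const std::string kExp  = "(?:[Ee][+-]?[0-9]+)";
    static const std::regex re(
        "(?:" + kInt + "?" + kFrac + kExp + "?" +   // $Int? $Frac $Exp?
        "|"   + kInt + "\\." + kExp + "?" +         // $Int \. $Exp?
        "|"   + kInt + kExp +                       // $Int $Exp
        ")[fFlL]?");
    return std::regex_match(text, re);
}
```

Inputs such as "3.1415", ".5", "1.", "1e10" and "1.5e-3f" are accepted, whereas "1" (no decimal point and no exponent) and "." (neither integer nor fractional part) are rejected, in line with the two "but not both" rules.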

Posted on 2013-01-08 19:45 by 不會飛的鳥

