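I had been looking for PCRE learning material for a while and found nothing comprehensive online. Going back and reading carefully through the files under doc/html in the PCRE source tree, I realized that these documents are in fact very good learning material.
Today I looked at how to use PCRE; I have not yet gotten into how PCRE works internally or into its implementation code. The PCRE source can be downloaded from http://www.pcre.org/. The downloaded archive, pcre-x.x.tar.bz2, is easy to build and install on Linux (on x86-family CPUs, at least).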
./configure
make
make install
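Once built and installed, PCRE is made available to user programs as a library. The library provides a set of API functions that implement Perl-style regular expression searching and matching. (The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few differences.)
Using PCRE well requires a good understanding of regular expressions, and PCRE itself has many configuration options that make it support different modes and conventions. This post only gives a brief description of how to use PCRE; it does not cover configuration or regular expression syntax.
Using PCRE mostly comes down to the following four functions. Once you understand them, working with the PCRE library becomes much simpler.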
pcre_compile() /pcre_compile2()
pcre_study()
pcre_exec()
1. pcre_compile() / pcre_compile2(): a regular expression must be compiled before it can be used.
pcre *pcre_compile(const char *pattern, int options, const char **errptr, int *erroffset, const unsigned char *tableptr);
pcre *pcre_compile2(const char *pattern, int options, int *errorcodeptr, const char **errptr, int *erroffset, const unsigned char *tableptr);
The purpose of compilation is to convert the regular expression pattern into a structure that the PCRE engine can work with (struct real_pcre).
I have not yet analyzed the compilation process itself.
2. pcre_study(): analyzes ("studies") the compiled regular expression structure (struct real_pcre). The result of the study is a data structure (struct pcre_extra), which can be passed to pcre_exec together with the compiled pattern (struct real_pcre) for matching.
If a compiled pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. The function pcre_study() takes a pointer to a compiled pattern as its first argument. If studying the pattern produces additional information that will help speed up matching, pcre_study() returns a pointer to a pcre_extra block, in which the study_data field points to the results of the study.
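pcre_study() was introduced mainly to speed up regular expression matching. (Why does studying make matching faster? I have not looked into that yet.) It is quite useful: a regular expression can be compiled and studied once, and the result kept in a file or in memory, so that matching later on is more efficient. Snort does it this way.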
3. pcre_exec(): searches the given subject string for a match against the compiled regular expression and outputs the match result.
The function pcre_exec() is called to match a subject string against a compiled pattern, which is passed in the code argument. If the pattern has been studied, the result of the study should be passed in the extra argument. This function is the main matching facility of the library, and it operates in a Perl-like manner.
4. How does Snort use PCRE? Snort calls PCRE for regular expression matching through a plugin.
1) Initializing the regular expressions.
InitializeDetection --> RegisterRules --> RegisterOneRule --> PCRESetup (only for OPTION_TYPE_PCRE) --> pcre_compile and pcre_study. The results are stored in memory in a structure called PCREInfo.
2) Matching the rules. DetectionCheckRule --> ruleMatch --> ruleMatchInternal --> pcreMatch (OPTION_TYPE_PCRE) --> pcre_test --> pcre_exec.
5. Compiling PCRE on the TILERA platform.
1) tar -xjvf pcre-7.9.tar.bz2
2) Modify config.sub to support tile architecture.
We wish to use HOST=tile, but the tile architecture is not yet standard, so it may not exist in the config.sub file. If necessary, add these lines in the alphabetical list of architectures (typically about 1,100 lines down):
tile*)
basic_machine=tile-tilera
os=-linux-gnu
;;
3) Compile PCRE on tile Linux.
** Start up TILERA card through tile-monitor.
tile-monitor --pci --mount-tile /usr \
--mount-tile /bin --mount-tile /sbin --mount-tile /etc --mount-tile /lib \
--mkdir /mnt/libs --mount /libs-compile /mnt/libs \
--mkdir /mnt/mde --mount $TILERA_ROOT /mnt/mde
* ./configure --build=tile --prefix=/usr lt_cv_sys_max_cmd_len=262144 --disable-cpp
// C++ support was not enabled for this build.
pcre-7.9 configuration summary:
Install prefix .................. : /usr
C preprocessor .................. : gcc -E
C compiler ...................... : gcc
C++ preprocessor ................ : g++ -E
C++ compiler .................... : g++
Linker .......................... : /usr/bin/ld
C preprocessor flags ............ :
C compiler flags ................ : -O2
C++ compiler flags .............. : -O2
Linker flags .................... :
Extra libraries ................. :
Build C++ library ............... : no
Enable UTF-8 support ............ : no
Unicode properties .............. : no
Newline char/sequence ........... : lf
\R matches only ANYCRLF ......... : no
EBCDIC coding ................... : no
Rebuild char tables ............. : no
Use stack recursion ............. : yes
POSIX mem threshold ............. : 10
Internal link size .............. : 2
Match limit ..................... : 10000000
Match limit recursion ........... : MATCH_LIMIT
Build shared libs ............... : yes
Build static libs ............... : yes
Link pcregrep with libz ......... : no
Link pcregrep with libbz2 ....... : no
Link pcretest with libreadline .. : no
* make
* make install
4) Compile the PCRE demo code and test the PCRE library on TILERA Linux. The PCRE source ships two demo programs: pcredemo.c, which is simple and easy to follow, and pcretest.c, which is more comprehensive and exercises the library much more fully. Both demos are very good learning material in their own right.
# gcc -o pcredemo pcredemo.c -lpcre
# ./pcredemo 'cat|dog' 'the cat sat on the mat'
Match succeeded at offset 4
0: cat
No named substrings
# ./pcredemo -g 'cat|dog' 'the dog sat on the cat'
Match succeeded at offset 4
0: dog
No named substrings
Match succeeded again at offset 19
0: cat
No named substrings
// References:
PCRE source documentation: pcre-7.9/doc/html