
Parsing URLs with Regular Expressions

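   When developing HTTP-related programs, you often need to extract the protocol name, server, path, and similar parts from a URL. Doing this with the C/C++ string functions is tedious and produces code that is hard to maintain. For text-parsing work of this kind, a regular expression library is usually the better first choice. C++ has several such libraries, including ATL, PCRE, and Boost. This article uses the regex library in Boost to parse a URL and extract its protocol name, server, and path, as a way of illustrating how the library is used.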

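Protocol name
   Optional. When present, it must be followed by ://; when absent, http is assumed. Other common protocols include https, ssh, ftp, and mailto. The regular expression for the protocol name is therefore (?:(mailto|ssh|ftp|https?)://)?. Note that this expression captures the protocol name itself, but not the :// separator.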
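The default-to-http rule above can be sketched as follows. This is only an illustration: it swaps in the standard library's std::regex (ECMAScript grammar) for Boost so that it compiles on its own, and the helper names scheme_of and check_scheme_of are hypothetical, not part of the article's code.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the protocol name captured by the pattern from the text, or "http"
// when the URL has no "name://" prefix (the default the article assumes).
// Assumption: std::regex::icase stands in for boost::regex::icase.
inline std::string scheme_of(const std::string& url)
{
    // The leading ^ anchors the search at the start of the URL.
    static const std::regex protocol("^(?:(mailto|ssh|ftp|https?)://)?",
                                     std::regex::ECMAScript | std::regex::icase);
    std::smatch what;
    if (std::regex_search(url, what, protocol) && what[1].matched)
        return what[1].str();   // group 1 holds the name without "://"
    return "http";
}

// Usage sketch with inputs taken from the article.
inline void check_scheme_of()
{
    assert(scheme_of("https://192.168.1.1:443/index.html") == "https");
    assert(scheme_of("www.shnenglu.com:80") == "http");
}
```

Because the whole group is optional, the pattern always matches at the start of the input; checking what[1].matched is what distinguishes "no protocol given" from "protocol given".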

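Server
   Either a domain name, such as www.csdn.net, or an IP address, such as 192.168.1.1, optionally followed by a port number, as in 192.168.1.1:8080. The regular expression for a domain name is (?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\.)+(?:com|net|edu|biz|gov|org|in(?:t|fo)|(?-i:[a-z][a-z])). Within it, the alternatives com|net|edu|biz|gov|org|in(?:t|fo) match the common top-level domains com, net, edu, biz, gov, org, int, and info, while (?-i:[a-z][a-z]) matches a two-letter country code and accepts only lowercase letters, as in www.richcomm.com.cn. IP matching should be as precise as practical: each part must be a number in the range 0-255, so the expression is (?:[01]?\d\d?|2[0-4]\d|25[0-5])\.(?:[01]?\d\d?|2[0-4]\d|25[0-5])\.(?:[01]?\d\d?|2[0-4]\d|25[0-5])\.(?:[01]?\d\d?|2[0-4]\d|25[0-5]). Note that neither the domain expression nor the IP expression captures anything on its own; this is deliberate, so that the server can be captured as a whole later.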
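To see how the 0-255 restriction behaves, here is a small sketch that builds the IP expression from a single octet sub-pattern. It uses std::regex rather than Boost so it is self-contained; is_dotted_quad and check_is_dotted_quad are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `text` consists of exactly four dot-separated numbers, each 0-255.
inline bool is_dotted_quad(const std::string& text)
{
    // One octet: 0-199 (with optional leading digit), 200-249, or 250-255.
    static const std::string octet = "(?:[01]?\\d\\d?|2[0-4]\\d|25[0-5])";
    static const std::regex ip(octet + "\\." + octet + "\\." + octet + "\\." + octet);
    return std::regex_match(text, ip);   // regex_match requires the whole string to match
}

// Usage sketch.
inline void check_is_dotted_quad()
{
    assert(is_dotted_quad("192.168.1.1"));
    assert(!is_dotted_quad("256.1.1.1"));   // 256 is outside 0-255
}
```

Composing the pattern from one octet string keeps the four repetitions identical, which is easier to maintain than the fully expanded literal.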
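   The regular expression for the port number is (?::(\d{1,5}))?, which limits the port to between 1 and 5 digits. A stricter match, for example requiring the port to lie in a range such as [1024,65535], can be built along the same lines as the IP pattern above. Putting the pieces together, the regular expression for the server is ((?:(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\.)+(?:com|net|edu|biz|gov|org|in(?:t|fo)|(?-i:[a-z][a-z]))|(?:[01]?\d\d?|2[0-4]\d|25[0-5])\.(?:[01]?\d\d?|2[0-4]\d|25[0-5])\.(?:[01]?\d\d?|2[0-4]\d|25[0-5])\.(?:[01]?\d\d?|2[0-4]\d|25[0-5])))(?::(\d{1,5}))?. It captures the domain name or IP address as a whole, plus the port number if there is one. For www.csdn.net it yields the substrings www.csdn.net and an empty string (no port is given; http defaults to 80 and https to 443); for 192.168.1.1:8080 it yields 192.168.1.1 and 8080.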
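One possible way to express the [1024,65535] restriction purely as a pattern, in the same digit-by-digit style as the IP octet, is sketched below. The pattern is my own construction, not one given in the article; it uses std::regex so it stands alone, and the function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `port` is a decimal number in [1024, 65535], decided by pattern alone.
inline bool is_port_1024_to_65535(const std::string& port)
{
    static const std::regex range(
        // five digits: 65530-65535, 65500-65529, 65000-65499, 60000-64999, 10000-59999
        "6553[0-5]|655[0-2]\\d|65[0-4]\\d{2}|6[0-4]\\d{3}|[1-5]\\d{4}"
        // four digits: 2000-9999, 1100-1999, 1030-1099, 1024-1029
        "|[2-9]\\d{3}|1[1-9]\\d{2}|10[3-9]\\d|102[4-9]");
    return std::regex_match(port, range);
}

// Usage sketch.
inline void check_is_port_1024_to_65535()
{
    assert(is_port_1024_to_65535("8080"));
    assert(!is_port_1024_to_65535("80"));      // below 1024
    assert(!is_port_1024_to_65535("65536"));   // above 65535
}
```

In practice it is often simpler to keep the loose \d{1,5} pattern and compare the captured number against the range in code; the pattern form is shown only because the text suggests it.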

Path
   The simplest form is (/.*)?; a more precise form is /[^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*(?:[.!,?]+[^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+)*.
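The practical effect of the more precise form is that punctuation is accepted inside a path but not at its end, which matters when a URL is embedded in running text. The sketch below illustrates this with std::regex. As a simplification of my own, the \x7F-\xFF range is left out of the character classes, so non-ASCII bytes are not excluded here; path_in_text and check_path_in_text are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first path-like substring of `text`, or "" if there is none.
inline std::string path_in_text(const std::string& text)
{
    // A "/" and a run of ordinary characters, then zero or more groups of
    // (punctuation followed by at least one ordinary character).
    static const std::regex path(
        R"re(/[^.!,?;"'<>()\[\]{}\s]*(?:[.!,?]+[^.!,?;"'<>()\[\]{}\s]+)*)re");
    std::smatch what;
    if (std::regex_search(text, what, path))
        return what[0].str();
    return std::string();
}

// Usage sketch: the sentence-ending period is not part of the path.
inline void check_path_in_text()
{
    assert(path_in_text("see /docs/index.html.") == "/docs/index.html");
}
```

With the simple form (/.*)? the same input would yield everything up to the end of the string, including the final period.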

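   All of the regular expressions above are written as ASCII (narrow) string literals; for the Unicode (wide-character) case, simply prefix each literal with L.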
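As a minimal sketch of the wide-character variant, the protocol pattern can be written with an L prefix and compiled as a wide regex. The example uses std::wregex from the standard library instead of Boost so that it is self-contained, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Wide-character version of the protocol lookup: the pattern text is
// unchanged, only the literal gains an L prefix and the types become wide.
inline std::wstring wide_scheme_of(const std::wstring& url)
{
    static const std::wregex protocol(L"^(?:(mailto|ssh|ftp|https?)://)?",
                                      std::regex::ECMAScript | std::regex::icase);
    std::wsmatch what;
    if (std::regex_search(url, what, protocol) && what[1].matched)
        return what[1].str();
    return L"http";   // default when no protocol is given
}

// Usage sketch.
inline void check_wide_scheme_of()
{
    assert(wide_scheme_of(L"ftp://192.168.1.1") == L"ftp");
    assert(wide_scheme_of(L"www.csdn.net") == L"http");
}
```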

   For convenience, the matching is wrapped in two free function templates, shown below.

#include <boost/regex.hpp>

// Returns true if the whole of text matches pattern.
// If result is non-null, it receives the captured sub-expressions.
template<typename charT>
inline bool boost_match(const charT* pattern, const charT* text,
                        unsigned int flags = boost::regex::normal,
                        boost::match_results<const charT*>* result = NULL)
{
    boost::basic_regex<charT, boost::regex_traits<charT> > expression(pattern, flags);
    if (NULL == result)
        return boost::regex_match(text, expression);
    return boost::regex_match(text, *result, expression);
}

// Returns true if some substring of text matches pattern.
// If result is non-null, it receives the captured sub-expressions.
template<typename charT>
inline bool boost_search(const charT* pattern, const charT* text,
                         unsigned int flags = boost::regex::normal,
                         boost::match_results<const charT*>* result = NULL)
{
    boost::basic_regex<charT, boost::regex_traits<charT> > expression(pattern, flags);
    if (NULL == result)
        return boost::regex_search(text, expression);
    return boost::regex_search(text, *result, expression);
}

A test example follows.

#include <cassert>
#include <string>
#include <tchar.h>
#include <boost/regex.hpp>

using std::string;

static const string protocol = "(?:(mailto|ssh|ftp|https?)://)?";
static const string hostname = "(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\\.)+(?:com|net|edu|biz|gov|org|in(?:t|fo)|(?-i:[a-z][a-z]))";
static const string ip = "(?:[01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.(?:[01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.(?:[01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.(?:[01]?\\d\\d?|2[0-4]\\d|25[0-5])";
static const string port = "(?::(\\d{1,5}))?";
static const string path = "(/.*)?";
// Capture groups: 1 = protocol, 2 = server (name or IP), 3 = port, 4 = path
static const string pattern = protocol + "((?:" + hostname + "|" + ip + "))" + port + path;

int _tmain(int argc, _TCHAR* argv[])
{
    using namespace boost;

    // Case 1: protocol present, server given by name, no port
    bool ret;
    string text = "http://www.shnenglu.com/qinqing1984";
    boost::cmatch what;
    ret = boost_match(pattern.c_str(), text.c_str(), regex::icase|regex::perl, &what);
    assert(ret);
    assert(what[1].str() == "http");
    assert(what[2].str() == "www.shnenglu.com");
    assert(what[3].str() == "");
    assert(what[4].str() == "/qinqing1984");

    // Case 2: no protocol, server given by name, port present
    text = "www.shnenglu.com:80/qinqing1984";
    ret = boost_match(pattern.c_str(), text.c_str(), regex::icase|regex::perl, &what);
    assert(ret);
    assert(what[1].str() == "");
    assert(what[2].str() == "www.shnenglu.com");
    assert(what[3].str() == "80");
    assert(what[4].str() == "/qinqing1984");

    // Case 3: no protocol, server given by name, no path
    text = "www.shnenglu.com:80";
    ret = boost_match(pattern.c_str(), text.c_str(), regex::icase|regex::perl, &what);
    assert(ret);
    assert(what[1].str() == "");
    assert(what[2].str() == "www.shnenglu.com");
    assert(what[3].str() == "80");
    assert(what[4].str() == "");

    // Case 4: protocol is https, server is an IP address, port present
    text = "https://192.168.1.1:443/index.html";
    ret = boost_match(pattern.c_str(), text.c_str(), regex::icase|regex::perl, &what);
    assert(ret);
    assert(what[1].str() == "https");
    assert(what[2].str() == "192.168.1.1");
    assert(what[3].str() == "443");
    assert(what[4].str() == "/index.html");

    // Case 5: port has more than 5 digits
    text = "ftp://192.168.1.1:888888";
    ret = boost_match(pattern.c_str(), text.c_str(), regex::icase|regex::perl, &what);
    assert(!ret);

    // Case 6: no protocol name before the //
    text = "//192.168.1.1/index.html";
    ret = boost_match(pattern.c_str(), text.c_str(), regex::icase|regex::perl, &what);
    assert(!ret);

    // Case 7: no server
    text = "http:///index.html";
    ret = boost_match(pattern.c_str(), text.c_str(), regex::icase|regex::perl, &what);
    assert(!ret);

    // Case 8: invalid server
    text = "cppblog/index.html";
    ret = boost_match(pattern.c_str(), text.c_str(), regex::icase|regex::perl, &what);
    assert(!ret);

    return 0;
}
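For readers without Boost at hand, the same combined pattern can be approximated with the standard library. The sketch below is an adaptation, not the article's code: std::regex has no (?-i:...) group, so the country code becomes a plain [a-z][a-z] and, under icase, also accepts uppercase. UrlParts, parse_url, and check_parse_url are hypothetical names introduced for the illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type: one field per capture group of the combined pattern.
struct UrlParts {
    std::string scheme, host, port, path;
};

// Matches the whole of `url` against protocol + server + port + path.
// On success fills `out` (absent parts become empty strings) and returns true.
inline bool parse_url(const std::string& url, UrlParts& out)
{
    static const std::string octet = "(?:[01]?\\d\\d?|2[0-4]\\d|25[0-5])";
    static const std::string protocol = "(?:(mailto|ssh|ftp|https?)://)?";
    // Assumption: (?-i:[a-z][a-z]) from the article is replaced by [a-z][a-z].
    static const std::string hostname =
        "(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\\.)+"
        "(?:com|net|edu|biz|gov|org|in(?:t|fo)|[a-z][a-z])";
    static const std::string ip = octet + "\\." + octet + "\\." + octet + "\\." + octet;
    static const std::regex pattern(
        protocol + "((?:" + hostname + "|" + ip + "))" + "(?::(\\d{1,5}))?" + "(/.*)?",
        std::regex::ECMAScript | std::regex::icase);

    std::smatch what;
    if (!std::regex_match(url, what, pattern))
        return false;
    out.scheme = what[1].str();
    out.host   = what[2].str();
    out.port   = what[3].str();
    out.path   = what[4].str();
    return true;
}

// Usage sketch mirroring two of the article's test cases.
inline void check_parse_url()
{
    UrlParts parts;
    assert(parse_url("https://192.168.1.1:443/index.html", parts));
    assert(parts.scheme == "https" && parts.host == "192.168.1.1");
    assert(parts.port == "443" && parts.path == "/index.html");
    assert(!parse_url("ftp://192.168.1.1:888888", parts));   // port too long
}
```

Returning the parts in a struct rather than exposing the match object keeps callers independent of the capture-group numbering.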

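Time was limited, so this treatment of URL parsing is far from exhaustive; it is only a brief analysis meant to illustrate the general approach. How much more precise the matching needs to be depends on the requirements of the actual application.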

Posted on 2011-11-27 17:22 by 春秋十二月. Reads: 7952. Comments: 5. Category: Opensrc
