[Repost] URL Encoding
Author: Chandrasekhar Vuppalapati
Translator: eastvc
The purpose of this article is to design a C++ class that performs URL encoding. In one of my earlier projects I needed to POST data from a VC++ 6.0 application, and that data had to be URL-encoded. I searched MSDN for a class or API that would produce the URL-encoded form of a given string but found none, so I had to design my own URLEncode C++ class.
URLEncoder.exe is an MFC dialog-based program that uses the URLEncode class.
Handling Special Characters
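Sending certain special characters over the Internet is tricky. URL encoding gives these characters special treatment so that any character can be transmitted safely.
For example, the carriage return character has the ASCII value 13; when FORM data is sent, it is taken to mark the end of a line of data.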
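A quick illustration of that carriage-return case, written as a sketch in standard C++ rather than the article's MFC code (`escapeLineBreaks` is a hypothetical name). It percent-encodes only CR (13 = 0x0D) and LF (10 = 0x0A), so a line break travels as ordinary text instead of ending the line:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces CR and LF with their percent-encoded
// forms so they cannot be mistaken for the end of a line of form data.
std::string escapeLineBreaks(const std::string& in)
{
    std::string out;
    for (char c : in) {
        if (c == '\r')      out += "%0D";  // carriage return, ASCII 13
        else if (c == '\n') out += "%0A";  // line feed, ASCII 10
        else                out += c;      // everything else unchanged
    }
    return out;
}
```

A real encoder escapes many more characters; this one isolates the single case described above.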
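Applications usually exchange data between client and server over HTTP or HTTPS. There are two basic ways for the server to receive data from the client:
1. In the HTTP request itself, either in a header (as COOKIES) or in the request body (as FORM data)
2. In the query part of the URL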
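Whichever route is used, form data is typically a set of name=value pairs joined by '&'. The sketch below, in standard C++ with hypothetical names `encodeValue` and `buildFormData`, assumes the simplest safe policy (keep only ASCII letters and digits, escape every other byte) and builds a string that could be sent either as a POST body or after the '?' of a URL:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: keeps ASCII letters and digits, writes every other
// byte as %XX with uppercase hex digits.
std::string encodeValue(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        bool alnum = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                     (c >= 'a' && c <= 'z');
        if (alnum) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '%' + two hex digits + terminating NUL
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Encodes each name and value separately, then joins the pairs with '&'.
std::string buildFormData(
    const std::vector<std::pair<std::string, std::string>>& fields)
{
    std::string out;
    for (const auto& f : fields) {
        if (!out.empty()) out += '&';
        out += encodeValue(f.first) + '=' + encodeValue(f.second);
    }
    return out;
}
```

For example, name = "John Smith" and city = "A&B" give `name=John%20Smith&city=A%26B`. Without the %26, the server would split the value and see a stray parameter `B`.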
When the data is part of the URL, it must be encoded according to URL syntax; the web server decodes it automatically. Consider the following URL, in which the data is passed as a query parameter:
For example: http://WebSite/ResourceName?Data=Data
WebSite is the name of the web site (the host)
ResourceName can be the name of an ASP page or a Servlet
Data is the data being sent. If the MIME type is Content-Type: application/x-www-form-urlencoded, the data must be encoded.
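On the server, decoding simply reverses the escapes. Below is a minimal sketch of that step in standard C++; `hexValue` and `urlDecode` are hypothetical names, and the `formData` flag is an assumption that models the application/x-www-form-urlencoded convention of writing a space as '+':

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Turns %XX escapes back into bytes. When formData is true, '+' is also
// turned back into a space. Malformed escapes are copied through as-is.
std::string urlDecode(const std::string& in, bool formData)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        if (c == '+' && formData) out += ' ';
        else                      out += c;
    }
    return out;
}
```

Passing malformed escapes such as a trailing '%' through unchanged is one reasonable choice; a stricter decoder might reject them instead.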
RFC 1738
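RFC 1738 specifies that only a subset of the US-ASCII character set may be used in Uniform Resource Locators (URLs). This is restrictive, because HTML, on the other hand, allows the entire ISO-8859-1 (ISO-Latin) character set in documents. As a result, any character outside that subset in data POSTed from an HTML FORM (or sent as part of a query string) must be encoded.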
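ISO-8859-1 (ISO-Latin) Character Set
The following table covers the complete ISO-8859-1 (ISO-Latin) character set. For each character range (in decimal) it gives the type, the characters it contains, and whether characters in that range are safe.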
Character range (decimal) | Type | Values | Safe/Unsafe |
0-31 | ASCII control characters | Not printable | Unsafe |
32-47 | Reserved characters | (space) !"#$%&'()*+,-./ | Unsafe |
48-57 | ASCII digits | 0-9 | Safe |
58-64 | Reserved characters | :;<=>?@ | Unsafe |
65-90 | ASCII letters | A-Z | Safe |
91-96 | Reserved characters | [\]^_` | Unsafe |
97-122 | ASCII letters | a-z | Safe |
123-126 | Reserved characters | {|}~ | Unsafe |
127 | Control character | DEL (not printable) | Unsafe |
128-255 | Non-ASCII characters | Upper half of ISO-8859-1 | Unsafe |
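The table reduces to a simple rule: only the 62 ASCII digits and letters are safe. A sketch of that rule as a C++ predicate (the name `isSafeByte` is hypothetical), taking an unsigned char so that bytes 128-255 are classified correctly:

```cpp
#include <cassert>

// Returns true only for the "Safe" rows of the table:
// 48-57 (0-9), 65-90 (A-Z) and 97-122 (a-z). Every other value in
// 0-255, including control characters and non-ASCII bytes, is unsafe.
bool isSafeByte(unsigned char c)
{
    return (c >= 48 && c <= 57)    // 0-9
        || (c >= 65 && c <= 90)    // A-Z
        || (c >= 97 && c <= 122);  // a-z
}
```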
All unsafe ASCII characters, that is, those in the ranges 32-47, 58-64, 91-96 and 123-126, must be encoded.
The following table explains why these characters are unsafe.
Character | Why it is unsafe | Encoding |
< | Delimits URLs in free text | %3C |
> | Delimits URLs in free text | %3E |
" | Delimits URLs in some systems | %22 |
# | Used on the World Wide Web and in other systems to separate a URL from a fragment/anchor identifier that may follow it | %23 |
{ | Gateways and other transport agents are known to sometimes modify such characters | %7B |
} | Gateways and other transport agents are known to sometimes modify such characters | %7D |
| (vertical bar) | Gateways and other transport agents are known to sometimes modify such characters | %7C |
\ | Gateways and other transport agents are known to sometimes modify such characters | %5C |
^ | Gateways and other transport agents are known to sometimes modify such characters | %5E |
~ | Gateways and other transport agents are known to sometimes modify such characters | %7E |
[ | Gateways and other transport agents are known to sometimes modify such characters | %5B |
] | Gateways and other transport agents are known to sometimes modify such characters | %5D |
` | Gateways and other transport agents are known to sometimes modify such characters | %60 |
+ | Stands for a space in form data (a space cannot appear literally in a URL), so a literal "+" must be encoded | %2B |
/ | Separates directories and subdirectories | %2F |
? | Separates the URL proper from its parameters | %3F |
& | Separates the parameters in a URL | %26 |
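Each value in the Encoding column is '%' followed by the character's code as two hexadecimal digits, so it can be computed rather than looked up. A sketch in standard C++ that escapes exactly the characters listed in this table and copies everything else (`kTableChars` and `escapeTableChars` are hypothetical names; a complete encoder would also escape the remaining unsafe ranges):

```cpp
#include <cassert>
#include <string>

// The characters listed in the table above, in table order.
const std::string kTableChars = "<>\"#{}|\\^~[]`+/?&";

// Escapes only the table's characters; all others are copied unchanged.
// The two hex digits come from the high and low 4 bits of the byte.
std::string escapeTableChars(const std::string& in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char c : in) {
        if (kTableChars.find(c) != std::string::npos) {
            unsigned char u = static_cast<unsigned char>(c);
            out += '%';
            out += hex[u >> 4];    // high nibble
            out += hex[u & 0x0F];  // low nibble
        } else {
            out += c;
        }
    }
    return out;
}
```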
Implementation
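URL-encoding a character means converting its 8-bit value to two hexadecimal digits and prefixing them with '%'. For example, in the US-ASCII character set a space is 32 in decimal, or 20 in hexadecimal, so its URL encoding is %20.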
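The same arithmetic as a small C++ function: the high hex digit is the value divided by 16 and the low digit is the remainder, so 32 gives 2 and 0, i.e. %20. This is a sketch with a hypothetical name (`encodeChar`); like the article's decToHex(), it first maps a negative char (possible where char is signed) back to the range 0-255:

```cpp
#include <cassert>
#include <string>

// Encodes one character as '%' plus two uppercase hex digits.
std::string encodeChar(char ch)
{
    static const char digits[] = "0123456789ABCDEF";
    int value = static_cast<int>(ch);
    if (value < 0)
        value += 256;            // signed char: e.g. -23 -> 233 (0xE9)
    std::string out = "%";
    out += digits[value / 16];   // high digit: 32 / 16 = 2
    out += digits[value % 16];   // low digit:  32 % 16 = 0
    return out;
}
```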
CURLEncode: CURLEncode is a C++ class that URL-encodes a string. It contains the following functions:
isUnsafe
decToHex
convert
URLEncode
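The URLEncode() function does the actual encoding: it examines each character in turn to see whether it is safe. A safe character is copied as-is; an unsafe one is converted to '%' followed by its hexadecimal value, and the result is appended to the output string.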
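For readers without MFC, here is a sketch of the same loop in standard C++ using std::string instead of CString. The function names mirror the class, but the contents of the unsafe-character list are an assumption, since the text does not spell them out: this version lists every character in the unsafe ranges 33-47, 58-64 and 91-96, and treats anything outside 33-122 as unsafe, as isUnsafe() does.

```cpp
#include <cassert>
#include <string>

// Assumed unsafe list (not given in the article): all characters in the
// unsafe ranges 33-47, 58-64 and 91-96 of the range table.
const std::string kUnsafe = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`";

// Mirrors CURLEncode::isUnsafe: listed characters, and anything outside
// 33..122 (controls, space, {|}~, DEL, bytes >= 128), are unsafe.
bool isUnsafe(char c)
{
    int v = static_cast<unsigned char>(c);
    return kUnsafe.find(c) != std::string::npos || v <= 32 || v >= 123;
}

// Mirrors CURLEncode::convert: "%" followed by two uppercase hex digits.
std::string convert(char c)
{
    static const char hex[] = "0123456789ABCDEF";
    unsigned char u = static_cast<unsigned char>(c);
    return std::string("%") + hex[u >> 4] + hex[u & 0x0F];
}

// Mirrors CURLEncode::URLEncode: safe characters are copied, unsafe ones
// are replaced by their %XX form in the output string.
std::string urlEncode(const std::string& data)
{
    std::string out;
    for (char c : data)
        out += isUnsafe(c) ? convert(c) : std::string(1, c);
    return out;
}
```

With this list, characters such as '-', '.' and '_' are escaped too; that is stricter than necessary but still decodes to the same data.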
Code snippet:
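#include <afx.h>    // MFC CString

// Hexadecimal digits used by decToHex().
static const char hexVals[16] = { '0', '1', '2', '3', '4', '5', '6', '7',
                                  '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' };

class CURLEncode
{
private:
    static CString csUnsafeString;
    CString decToHex(char num, int radix);
    bool isUnsafe(char compareChar);
    CString convert(char val);
public:
    CURLEncode() { }
    virtual ~CURLEncode() { }
    CString URLEncode(CString vData);
};

// Assumed contents (this definition is not part of the original snippet):
// every character in the unsafe ranges 33-47, 58-64 and 91-96 of the table
// above. Values <= 32 and >= 123 are rejected by isUnsafe() itself.
CString CURLEncode::csUnsafeString = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`";

// Returns true if compareChar must be percent-encoded.
bool CURLEncode::isUnsafe(char compareChar)
{
    bool bcharfound = false;
    int m_strLen = csUnsafeString.GetLength();
    for (int ichar_pos = 0; ichar_pos < m_strLen; ichar_pos++)
    {
        if (csUnsafeString.GetAt(ichar_pos) == compareChar)
        {
            bcharfound = true;
            break;
        }
    }
    // char is signed in VC++, so bytes 128-255 arrive as negative values
    // and fail the range test below.
    int char_ascii_value = (int) compareChar;
    if (!bcharfound && char_ascii_value > 32 && char_ascii_value < 123)
        return false;   // not listed and printable: safe
    return true;
}

// Converts num to a two-digit string in the given radix (16 for URLs).
CString CURLEncode::decToHex(char num, int radix)
{
    CString csTmp;
    int num_char = (int) num;
    if (num_char < 0)
        num_char = 256 + num_char;          // map -128..-1 to 128..255
    // Collect digits least significant first.
    while (num_char >= radix)
    {
        csTmp += hexVals[num_char % radix]; // was "=", which lost digits
        num_char = num_char / radix;        // integer division, no floor()
    }
    csTmp += hexVals[num_char];
    if (csTmp.GetLength() < 2)
        csTmp += '0';                       // leading zero after reversal
    csTmp.MakeReverse();                    // most significant digit first
    return csTmp;
}

// Returns "%" followed by the two hex digits of val.
CString CURLEncode::convert(char val)
{
    CString csRet = "%";
    csRet += decToHex(val, 16);
    return csRet;
}

// Body reconstructed from the description above (not shown in the original
// snippet): safe characters are copied, unsafe ones become %XX.
CString CURLEncode::URLEncode(CString vData)
{
    CString csEncoded;
    int m_length = vData.GetLength();
    for (int ichar_pos = 0; ichar_pos < m_length; ichar_pos++)
    {
        char ch = vData.GetAt(ichar_pos);
        if (isUnsafe(ch))
            csEncoded += convert(ch);
        else
            csEncoded += ch;
    }
    return csEncoded;
}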
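The trickiest part of the snippet is decToHex(), which builds the digits in reverse order. A standard-C++ port of that loop is sketched below, with std::string and std::reverse standing in for CString and MakeReverse. It appends each digit with += so that values needing more than two digits (for example in radix 2) are not truncated; the radix range check is an added assumption, since the lookup table only covers digits up to F:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Port of decToHex(): digits are collected least significant first,
// padded to at least two digits, then reversed.
std::string decToHex(char num, int radix)
{
    assert(radix >= 2 && radix <= 16);     // hexVals only has 16 digits
    static const char hexVals[] = "0123456789ABCDEF";
    int value = static_cast<int>(num);
    if (value < 0) value += 256;           // signed char -> 128..255
    std::string digits;
    while (value >= radix) {
        digits += hexVals[value % radix];  // append, do not overwrite
        value /= radix;
    }
    digits += hexVals[value];
    if (digits.size() < 2) digits += '0';  // becomes the leading zero
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```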
Posted on 2007-01-04 13:46 by 永遇樂, category: Networking