I'd like everyone to take a look at my code and give me some comments and suggestions.
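But after several days of thinking, I still haven't found a suitable way to implement it.
Sigh......
Today I just tried writing something to get a feel for it; I'll keep thinking about it later...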
Code point range: | Bit pattern | Hex byte range | Notes |
U+00000000 - U+0000007F: | 0 xxxxxxx | 0x - 7x | |
U+00000080 - U+000007FF: | 110 xxxxx 10 xxxxxx | Cx 8x - Dx Bx | |
U+00000800 - U+0000FFFF: | 1110 xxxx 10 xxxxxx 10 xxxxxx | Ex 8x 8x - Ex Bx Bx | |
U+00010000 - U+001FFFFF: | 11110 xxx 10 xxxxxx 10 xxxxxx 10 xxxxxx | F0 8x 8x 8x - F7 Bx Bx Bx | rarely used |
U+00200000 - U+03FFFFFF: | 111110 xx 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx | F8 8x 8x 8x 8x - FB Bx Bx Bx Bx | no longer allowed (RFC 3629 limits UTF-8 to U+10FFFF) |
U+04000000 - U+7FFFFFFF: | 1111110 x 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx | FC 8x 8x 8x 8x 8x - FD Bx Bx Bx Bx Bx | no longer allowed (RFC 3629 limits UTF-8 to U+10FFFF) |
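The hex column of the table gives a direct lookup from a lead byte to the length of its sequence. Below is a minimal Python sketch of that lookup. The function name `utf8_sequence_length` is hypothetical and is not the author's code. The sketch deliberately follows the table as written, so it still accepts the 5- and 6-byte forms.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with `lead` occupies.

    Follows the table above, including the legacy 5- and 6-byte forms.
    """
    if lead <= 0x7F:   # 0xxxxxxx: single-byte character
        return 1
    if lead <= 0xBF:   # 10xxxxxx: continuation byte, cannot start a character
        raise ValueError(f"0x{lead:02X} is a continuation byte")
    if lead <= 0xDF:   # 110xxxxx
        return 2
    if lead <= 0xEF:   # 1110xxxx
        return 3
    if lead <= 0xF7:   # 11110xxx
        return 4
    if lead <= 0xFB:   # 111110xx
        return 5
    if lead <= 0xFD:   # 1111110x
        return 6
    # 0xFE and 0xFF never appear in UTF-8
    raise ValueError(f"0x{lead:02X} never appears in UTF-8")


# Lead bytes of U+00A9 (C2 A9) and U+2260 (E2 89 A0)
print(utf8_sequence_length(0xC2), utf8_sequence_length(0xE2))  # 2 3
```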
* The bytes 0xFE and 0xFF never appear in UTF-8.
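* In a multibyte character, every byte except the first lies in the range
0x80 to 0xBF. The start of each character can be recognized by its lead byte: 0xC0-0xDF, 0xE0-0xEF, 0xF0-0xF7, and so on (by checking the leading bits). A byte in 0x00-0x7F is a single-byte character. Every byte from 0x80 to 0xBF is a continuation byte and must be skipped when counting characters.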
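* Unicode is a code table that assigns each character a number. Unicode actually does more than that; for example, it also specifies many algorithms for comparison, display, and so on.
UTF-8, by contrast, is an encoding form: it defines the format in which those numbers are represented and stored.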
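* Converting UTF-8 to a Unicode code point: remove all marker bits and concatenate the remaining bits. If there are too few bits, pad with zeros on the high-order side until there are 32 bits.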
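* Converting a Unicode code point to UTF-8: starting from the low-order bits, take 6 bits at a time and prefix each group with the two bits 10. The leftover high-order bits (fewer than 6, not counting leading zeros) go into the lead byte, prefixed with the marker for the sequence length: 0, 110, 1110, and so on. Choose the length from the range table above so that the leftover bits fit.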
[Source: http://icu.sourceforge.net/docs/papers/forms_of_unicode/]
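The counting rule in the list above (skip every byte from 0x80 to 0xBF) can be sketched in Python as follows. The sketch assumes the input is already well-formed UTF-8. The name `count_utf8_chars` is hypothetical and is not taken from the author's code.

```python
def count_utf8_chars(data: bytes) -> int:
    """Count the characters in a well-formed UTF-8 byte string.

    Every byte in 0x80..0xBF (bit pattern 10xxxxxx) is a continuation byte,
    so only the remaining bytes are counted, one per character.
    No validation is performed.
    """
    return sum(1 for b in data if not 0x80 <= b <= 0xBF)


text = "a©≠中文"              # a, copyright sign, not-equal sign, two CJK characters
data = text.encode("utf-8")   # 1 + 2 + 3 + 3 + 3 = 12 bytes
print(len(data), count_utf8_chars(data))  # 12 5
```

This approach only looks at one byte at a time and never decodes anything. In everyday Python code, `len(data.decode("utf-8"))` gives the same count and additionally rejects malformed input.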
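UTF-8 (ISO 10646-1) has the following properties:
The following byte sequences are used to represent a character. Which sequence is used depends on the character's number in Unicode: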
U-00000000 - U-0000007F: | 0xxxxxxx |
U-00000080 - U-000007FF: | 110xxxxx 10xxxxxx |
U-00000800 - U-0000FFFF: | 1110xxxx 10xxxxxx 10xxxxxx |
U-00010000 - U-001FFFFF: | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
U-00200000 - U-03FFFFFF: | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
U-04000000 - U-7FFFFFFF: | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
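The xxx bit positions are filled with the bits of the binary representation of the character's code number. The rightmost x bit is the least significant bit. Only the shortest multibyte sequence that can represent the code number may be used. Note that in a multibyte sequence, the number of leading "1" bits in the first byte equals the number of bytes in the entire sequence.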
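Combining the table with these rules (use the shortest form, fill the x positions starting from the rightmost, least significant bit) gives a small encoder. The following is a Python sketch under stated assumptions. The name `encode_utf8` is hypothetical. The sketch follows the 6-byte table above and, unlike RFC 3629, does not reject surrogates or code numbers above U+10FFFF.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code number with the shortest form from the table above.

    Sketch: accepts values up to U+7FFFFFFF and does not reject surrogates
    or values above U+10FFFF, which RFC 3629 forbids.
    """
    if cp < 0:
        raise ValueError("code numbers are non-negative")
    if cp < 0x80:
        return bytes([cp])                        # 0xxxxxxx
    # (exclusive upper bound, lead-byte marker) for the 2- to 6-byte forms
    forms = [(0x800, 0xC0), (0x10000, 0xE0), (0x200000, 0xF0),
             (0x4000000, 0xF8), (0x80000000, 0xFC)]
    for length, (limit, marker) in enumerate(forms, start=2):
        if cp < limit:                            # shortest form that fits
            tail = []
            for _ in range(length - 1):
                tail.append(0x80 | (cp & 0x3F))   # 10xxxxxx: low 6 bits
                cp >>= 6
            # The leftover high bits fit into the lead byte's x positions.
            return bytes([marker | cp] + tail[::-1])
    raise ValueError("code number above U+7FFFFFFF")


print(encode_utf8(0xA9).hex(" "))    # c2 a9
print(encode_utf8(0x2260).hex(" "))  # e2 89 a0
```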
For example, the Unicode character U+00A9 = 1010 1001 (the copyright sign ©) is encoded in UTF-8 as:
11000010 10101001 = 0xC2 0xA9
and the character U+2260 = 0010 0010 0110 0000 (the not-equal sign ≠) is encoded as:
11100010 10001001 10100000 = 0xE2 0x89 0xA0
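Going in the other direction, the two examples can be decoded by counting the leading 1 bits of the first byte to get the length. The decoder then strips the marker bits and concatenates the remaining payload bits, as in the UTF-8-to-Unicode rule given earlier. The following is a Python sketch. The name `decode_utf8_char` is hypothetical. The sketch does not reject overlong (non-shortest) forms, even though the text above says only the shortest form may be used.

```python
def decode_utf8_char(seq: bytes) -> int:
    """Decode exactly one UTF-8 sequence back to its code number.

    Sketch: checks length and continuation bytes, but does not reject
    overlong (non-shortest) encodings.
    """
    if not seq:
        raise ValueError("empty sequence")
    lead = seq[0]
    ones = 0                                    # leading 1 bits of the lead byte
    while ones < 8 and lead & (0x80 >> ones):
        ones += 1
    length = 1 if ones == 0 else ones           # no leading 1 means ASCII
    if ones == 1 or ones > 6 or len(seq) != length:
        raise ValueError("not exactly one well-formed UTF-8 sequence")
    cp = lead & (0x7F >> ones)                  # payload bits of the lead byte
    for b in seq[1:]:
        if (b & 0xC0) != 0x80:                  # must look like 10xxxxxx
            raise ValueError(f"0x{b:02X} is not a continuation byte")
        cp = (cp << 6) | (b & 0x3F)             # append 6 payload bits
    return cp


print(hex(decode_utf8_char(bytes([0xC2, 0xA9]))))        # 0xa9
print(hex(decode_utf8_char(bytes([0xE2, 0x89, 0xA0]))))  # 0x2260
```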
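The official name of this encoding is spelled UTF-8, where UTF stands for UCS Transformation Format. Please do not refer to UTF-8 by any other name (such as utf8 or UTF_8) in any documentation, unless of course you mean a variable name rather than the encoding itself.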
Most modern programming languages developed after about 1993 have a special data type for Unicode/ISO 10646-1 characters. In Ada95 it is called Wide_Character, and in Java it is called char.
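ISO C also specifies mechanisms for handling multibyte encodings and wide characters,
and more were added when Amendment 1 to ISO C was published in September 1994.
These mechanisms were designed mainly with the various East Asian encodings in mind,
and they are much more robust than what handling UCS requires. UTF-8 is an example
of what the ISO C standard calls a multibyte string encoding, and the wchar_t
type can be used to hold Unicode characters.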
[Source: http://www.linuxforum.net/books/UTF-8-Unicode.html]
Name |
---|
UTF-8 |
UTF-8N |
UTF-16 |
UTF-16BE |
UTF-16LE |
UTF-32 |
UTF-32BE |
UTF-32LE |
Note: In the original table, the names set in italics are not yet registered, but are useful for reference. [Source: http://icu.sourceforge.net/docs/papers/forms_of_unicode/]
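To see how the names in this table differ in practice, one option is to encode the same text with Python's built-in codecs of the same spelling. This is a sketch with a hypothetical helper `encode_in_schemes`. The standard codecs have no "UTF-8N" entry, so that row is skipped. The BE and LE variants fix the byte order and write no BOM. Plain `utf-16` and `utf-32` prepend a BOM in the machine's native byte order, so their output differs between platforms.

```python
def encode_in_schemes(text: str) -> dict:
    """Encode `text` with Python's codecs for the schemes listed in the table.

    "UTF-8N" has no codec of that name in the standard library, so it is
    left out here.
    """
    names = ["utf-8", "utf-16", "utf-16-be", "utf-16-le",
             "utf-32", "utf-32-be", "utf-32-le"]
    return {name: text.encode(name) for name in names}


for name, data in encode_in_schemes("A").items():
    print(f"{name:10} {data.hex(' ')}")
```

For comparison, Python's `utf-8-sig` codec prepends the UTF-8 byte order mark EF BB BF, which plain `utf-8` does not.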