<?xml version="1.0" encoding="utf-8" standalone="yes"?>http://www.shnenglu.com/kb/category/55.html ? is sleeping water...... zh-cn Tue, 20 May 2008 05:11:46 GMT Tue, 20 May 2008 05:11:46 GMT 60 <item><title>Please evaluate my UTF-8 and UNICODE conversion code</title><link>http://www.shnenglu.com/kb/archive/2005/09/29/491.html</link><dc:creator>可冰</dc:creator><author>可冰</author><pubDate>Thu, 29 Sep 2005 12:34:00 GMT</pubDate><guid>http://www.shnenglu.com/kb/archive/2005/09/29/491.html</guid><wfw:comment>http://www.shnenglu.com/kb/comments/491.html</wfw:comment><comments>http://www.shnenglu.com/kb/archive/2005/09/29/491.html#Feedback</comments><slash:comments>8</slash:comments><wfw:commentRss>http://www.shnenglu.com/kb/comments/commentRss/491.html</wfw:commentRss><trackback:ping>http://www.shnenglu.com/kb/services/trackbacks/491.html</trackback:ping><description><![CDATA[<font color="#000000" face="Verdana" size="2">Last week I put a lot of thought into writing a UTF-8/UNICODE conversion facility with templates (see the file </font><a ><font color="#000080" face="Verdana" size="2">code.rar</font></a><font color="#000000" face="Verdana" size="2">). At first it felt fine, but over the past few days I have gradually started to wonder: why not simply provide two functions? Wouldn't that be simpler and more convenient? What extra benefit does my design actually bring? My original goal was code that is convenient to use and easy to extend and maintain, but I now feel it offers little over plain C-style functions. Perhaps a feature this simple does not need code this complex at all. It is just as Eric Raymond said of C++: it "inclines programmers to write complex code".<br>I would like everyone to take a look at my code and give me some feedback.</font><img src ="http://www.shnenglu.com/kb/aggbug/491.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.shnenglu.com/kb/" target="_blank">可冰</a> 2005-09-29 20:34 <a href="http://www.shnenglu.com/kb/archive/2005/09/29/491.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Designing a UTF-8 decoding module</title>http://www.shnenglu.com/kb/archive/2005/09/22/399.html可冰可冰Thu, 22 Sep 2005 15:24:00 
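GMThttp://www.shnenglu.com/kb/archive/2005/09/22/399.htmlhttp://www.shnenglu.com/kb/comments/399.htmlhttp://www.shnenglu.com/kb/archive/2005/09/22/399.html#Feedback1http://www.shnenglu.com/kb/comments/commentRss/399.htmlhttp://www.shnenglu.com/kb/services/trackbacks/399.html I want to write an "engine" that decodes UTF-8 documents into Unicode code values, and it has to be convenient and comfortable to use.
But after several days of thinking I still have not found a suitable way to implement it.
Sigh......
Today I tried writing a bit of code first to get a feel for it, and I will keep thinking afterwards...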



可冰 2005-09-22 23:24 Post a comment
]]>
How does std::wfstream support wide characters?http://www.shnenglu.com/kb/archive/2005/09/22/396.html可冰可冰Thu, 22 Sep 2005 14:47:00 GMThttp://www.shnenglu.com/kb/archive/2005/09/22/396.htmlhttp://www.shnenglu.com/kb/comments/396.htmlhttp://www.shnenglu.com/kb/archive/2005/09/22/396.html#Feedback4http://www.shnenglu.com/kb/comments/commentRss/396.htmlhttp://www.shnenglu.com/kb/services/trackbacks/396.html
std::wfstream is defined as:
typedef basic_fstream<wchar_t, char_traits<wchar_t> > wfstream;
When reading a character:
wfstream wfile( "wcharfile.txt" );
wchar_t wch = wfile.get();
Semantically, this ought to read two bytes of content. Judging by the output, however, it reads only one byte. How is that any different from fstream?
So when processing Unicode-encoded files, how should wide-character streams actually be used?


可冰 2005-09-22 22:47 Post a comment
]]>
<item><title>Several different encoded representations of "This is a UTF-8 format file."</title><link>http://www.shnenglu.com/kb/archive/2005/09/20/343.html</link><dc:creator>可冰</dc:creator><author>可冰</author><pubDate>Tue, 20 Sep 2005 12:39:00 GMT</pubDate><guid>http://www.shnenglu.com/kb/archive/2005/09/20/343.html</guid><wfw:comment>http://www.shnenglu.com/kb/comments/343.html</wfw:comment><comments>http://www.shnenglu.com/kb/archive/2005/09/20/343.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.shnenglu.com/kb/comments/commentRss/343.html</wfw:commentRss><trackback:ping>http://www.shnenglu.com/kb/services/trackbacks/343.html</trackback:ping><description><![CDATA[<p class="box"><img src="http://www.shnenglu.com/images/cppblog_com/kb/58/r_charcode.gif"> </p><img src ="http://www.shnenglu.com/kb/aggbug/343.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.shnenglu.com/kb/" target="_blank">可冰</a> 2005-09-20 20:39 <a href="http://www.shnenglu.com/kb/archive/2005/09/20/343.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Summary of the UTF-8 encoding format</title>http://www.shnenglu.com/kb/archive/2005/09/19/320.html可冰可冰Mon, 19 Sep 2005 12:03:00 GMThttp://www.shnenglu.com/kb/archive/2005/09/19/320.htmlhttp://www.shnenglu.com/kb/comments/320.htmlhttp://www.shnenglu.com/kb/archive/2005/09/19/320.html#Feedback3http://www.shnenglu.com/kb/comments/commentRss/320.htmlhttp://www.shnenglu.com/kb/services/trackbacks/320.html[The following is only my personal summary; if anything is wrong, please point it out. Thank you!]
The following byte sequences are used to represent a character. Which sequence is used depends on the character's code number in Unicode.
U+00000000 - U+0000007F: 0 xxxxxxx (hex: 0x - 7x)
U+00000080 - U+000007FF: 110 xxxxx 10 xxxxxx (hex: Cx 8x - Dx Bx)
U+00000800 - U+0000FFFF: 1110 xxxx 10 xxxxxx 10 xxxxxx (hex: Ex 8x 8x - Ex Bx Bx)
U+00010000 - U+001FFFFF: 11110 xxx 10 xxxxxx 10 xxxxxx 10 xxxxxx (hex: F0 8x 8x 8x - F7 Bx Bx Bx; rarely used)
U+00200000 - U+03FFFFFF: 111110 xx 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx (hex: F8 8x 8x 8x 8x - FB Bx Bx Bx Bx)
U+04000000 - U+7FFFFFFF: 1111110 x 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx (hex: FC 8x 8x 8x 8x 8x - FD Bx Bx Bx Bx Bx)


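* The bytes FE and FF never appear in the encoding.
* Apart from the first byte, every byte of a sequence lies in the range 0x80 to 0xBF. The start of each multi-byte character can be recognized from its lead byte, whose high four bits are 0xC or 0xD (two bytes), 0xE (three bytes), or 0xF (four or more bytes); checking the first four bits, or all eight, is enough. Any byte below 0x80 is a single-byte character. Any byte in the range 0x80 to 0xBF is a continuation byte and must be skipped when counting characters.
* Unicode is a coded character set: it only assigns each character a number (Unicode actually does somewhat more than that, for example providing algorithms for comparison, display, and so on);
UTF-8, by contrast, is an encoding form: a format that defines how a given code number is represented and stored.
* Converting UTF-8 to a Unicode code number: remove all marker bits; if the remaining bits are too few, pad the high end with zeros to fill 32 bits.
* Converting a Unicode code number to UTF-8: starting from the low end, take 6 bits at a time and prefix each group with the two bits 10; the bits left over at the top (not counting leading zeros) go into the first byte, prefixed with the marker that matches the sequence length, such as 110 or 1110.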
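The UTF-8 to Unicode rule above (drop the marker bits, then concatenate the payload bits) can be sketched roughly as follows. This is a minimal illustration, not the code from the author's code.rar: `decode_utf8` is a hypothetical helper name, it only handles the 1- to 4-byte forms from the table, and it does not reject overlong sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at s[pos].
// On success, stores the code number in `out`, advances `pos` past the
// sequence, and returns true. Only the 1- to 4-byte forms are handled.
bool decode_utf8(const std::string& s, std::size_t& pos, std::uint32_t& out) {
    if (pos >= s.size()) return false;
    const unsigned char lead = static_cast<unsigned char>(s[pos]);

    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (lead < 0x80)                { len = 1; cp = lead; }         // 0xxxxxxx
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }  // 11110xxx
    else return false;  // continuation byte or a lead byte not handled here

    if (pos + len > s.size()) return false;  // truncated sequence

    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80) return false;  // must look like 10xxxxxx
        cp = (cp << 6) | (c & 0x3F);           // strip "10", append 6 bits
    }
    out = cp;
    pos += len;
    return true;
}
```

Calling it in a loop until it returns false walks a string one character at a time; the zero padding to 32 bits happens implicitly because the result is held in a `std::uint32_t`.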



可冰 2005-09-19 20:03 Post a comment
]]>
UTF typeshttp://www.shnenglu.com/kb/archive/2005/09/19/312.html可冰可冰Mon, 19 Sep 2005 07:38:00 GMThttp://www.shnenglu.com/kb/archive/2005/09/19/312.htmlhttp://www.shnenglu.com/kb/comments/312.htmlhttp://www.shnenglu.com/kb/archive/2005/09/19/312.html#Feedback0http://www.shnenglu.com/kb/comments/commentRss/312.htmlhttp://www.shnenglu.com/kb/services/trackbacks/312.html
UTF formats: estimated average storage required per page (3000 characters)

UTF-8: 3 KB (1999), 5 KB (2003). On average, English takes slightly over one unit per code point. Most Latin-script languages take about 1.1 bytes. Greek, Russian, Arabic and Hebrew take about 1.7 bytes, and most others (including Japanese, Chinese, Korean and Hindi) take about 3 bytes. Characters in surrogate space take 4 bytes, but as a proportion of all world text they will always be very rare.

UTF-16: 6 KB. All of the most common characters in use for all modern writing systems are already represented with 2 bytes. Characters in surrogate space take 4 bytes, but as a proportion of all world text they will always be very rare.

UTF-32: 12 KB. All characters take 4 bytes.

[Source: http://icu.sourceforge.net/docs/papers/forms_of_unicode/]
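The per-character sizes quoted above follow directly from the code number of each character. As a rough sketch (the function names are made up for illustration, and only code numbers up to U+10FFFF are considered):

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed for one code number in UTF-8 (1- to 4-byte forms only).
std::size_t utf8_bytes(std::uint32_t cp) {
    if (cp <= 0x7F)   return 1;  // ASCII
    if (cp <= 0x7FF)  return 2;  // e.g. Greek, Cyrillic, Arabic, Hebrew
    if (cp <= 0xFFFF) return 3;  // rest of the BMP, e.g. most CJK
    return 4;                    // beyond the BMP
}

// Bytes needed in UTF-16: one 16-bit unit inside the BMP,
// a surrogate pair (two units) outside it.
std::size_t utf16_bytes(std::uint32_t cp) {
    return cp <= 0xFFFF ? 2 : 4;
}

// UTF-32 always uses one 32-bit unit.
std::size_t utf32_bytes(std::uint32_t /*cp*/) {
    return 4;
}
```

For a 3000-character page this gives 3000 x 2 = 6000 bytes in UTF-16 when every character is in the BMP, and 3000 x 4 = 12000 bytes in UTF-32, which matches the 6 KB and 12 KB figures; the UTF-8 figure depends on the mix of scripts.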


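UTF-8 (ISO 10646-1) has the following properties:

  • UCS characters U+0000 to U+007F (ASCII) are encoded as the bytes 0x00 to 0x7F (ASCII compatibility). This means that a file containing only 7-bit ASCII characters is identical under the ASCII and UTF-8 encodings.
  • All UCS characters > U+007F are encoded as a sequence of several bytes, each of which has its most significant bit set. Therefore, an ASCII byte (0x00-0x7F) can never appear as part of any other character.
  • The first byte of a multi-byte sequence that represents a non-ASCII character is always in the range 0xC0 to 0xFD, and it indicates how many bytes the character occupies. All remaining bytes of a multi-byte sequence are in the range 0x80 to 0xBF. This makes resynchronization easy and makes the encoding stateless and robust against missing bytes.
  • All 2^31 possible UCS codes can be encoded.
  • UTF-8 encoded characters can in theory be up to 6 bytes long, but 16-bit BMP characters use at most 3 bytes.
  • The sorting order of big-endian UCS-4 byte strings is preserved.
  • The bytes 0xFE and 0xFF are never used in the UTF-8 encoding.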
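Because continuation bytes always fall in 0x80 to 0xBF and lead bytes never do, counting characters and resynchronizing after a bad position both reduce to a test on the top two bits. A small sketch under that assumption (the names `is_continuation`, `count_characters`, and `next_boundary` are illustrative only, and the input is assumed to be well-formed UTF-8):

```cpp
#include <cstddef>
#include <string>

// True for bytes of the form 10xxxxxx, i.e. 0x80..0xBF.
bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Count characters by counting every byte that is NOT a continuation byte.
std::size_t count_characters(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if (!is_continuation(b)) ++n;
    }
    return n;
}

// Resynchronize: starting at `pos`, skip forward to the next byte that
// begins a character (or to the end of the string).
std::size_t next_boundary(const std::string& s, std::size_t pos) {
    while (pos < s.size() && is_continuation(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    return pos;
}
```

No decoder state is needed for either operation, which is what the "stateless" property in the list refers to.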

The following byte sequences are used to represent a character. Which sequence is used depends on the character's code number in Unicode.

U-00000000 - U-0000007F: 0xxxxxxx
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

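The xxx bit positions are filled with the bits of the character's code number in binary representation. The rightmost x is the least significant bit. Only the shortest multi-byte sequence that can represent a character's code number may be used. Note that in a multi-byte sequence, the number of leading 1 bits in the first byte equals the number of bytes in the whole sequence.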
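The last point, that the count of leading 1 bits in the first byte gives the sequence length, can be written down directly. The sketch below follows the six-row table above (so lengths up to 6 are accepted); `sequence_length` is a hypothetical name, and returning 0 for bytes that cannot start a sequence is a convention chosen here for illustration.

```cpp
#include <cstddef>

// Returns the total length of the sequence introduced by `lead`,
// or 0 if `lead` cannot start a sequence.
std::size_t sequence_length(unsigned char lead) {
    // Count the leading 1 bits, scanning from the most significant bit.
    std::size_t ones = 0;
    for (unsigned char mask = 0x80; mask != 0 && (lead & mask); mask >>= 1) {
        ++ones;
    }
    if (ones == 0) return 1;              // 0xxxxxxx: single byte
    if (ones == 1 || ones > 6) return 0;  // continuation byte, or 0xFE/0xFF
    return ones;                          // 110xxxxx -> 2, ..., 1111110x -> 6
}
```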

For example, the Unicode character U+00A9 = 1010 1001 (copyright sign) is encoded in UTF-8 as:

11000010 10101001 = 0xC2 0xA9

and the character U+2260 = 0010 0010 0110 0000 (not equal to) is encoded as:

11100010 10001001 10100000 = 0xE2 0x89 0xA0
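Both worked examples can be reproduced with a short encoder. This is only a sketch: `encode_utf8` is a name chosen here, it covers the 1- to 4-byte rows of the table (up to U+001FFFFF) and returns an empty string for anything larger. Note that the current UTF-8 definition, RFC 3629, stops at U+10FFFF and excludes surrogates, which this sketch does not check.

```cpp
#include <cstdint>
#include <string>

// Encode one code number as UTF-8 (1- to 4-byte forms only).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        out += static_cast<char>(cp);                         // 0xxxxxxx
    } else if (cp <= 0x7FF) {
        out += static_cast<char>(0xC0 | (cp >> 6));           // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         // 10xxxxxx
    } else if (cp <= 0xFFFF) {
        out += static_cast<char>(0xE0 | (cp >> 12));          // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x1FFFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));          // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;  // empty if cp is outside the range handled here
}
```

Because each branch is chosen by the smallest range that contains the code number, the result is always the shortest form, as the text requires.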

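The official name and spelling of this encoding is UTF-8, where UTF stands for UCS Transformation Format. Please do not use any other name (such as utf8 or UTF_8) for UTF-8 in any documentation, unless of course you are referring to a variable name rather than the encoding itself.

What programming languages support Unicode?

Most modern programming languages developed after about 1993 have a special data type for Unicode/ISO 10646-1 characters. In Ada95 it is called Wide_Character, in Java it is called char.

ISO C also specifies mechanisms for handling multi-byte encodings and wide characters, and more were added when Amendment 1 to ISO C was published in September 1994. These mechanisms were designed mainly with the various East Asian encodings in mind, so they are considerably more elaborate than what handling UCS requires. UTF-8 is an example of what the ISO C standard calls a multi-byte string encoding, and the wchar_t type can be used to hold Unicode characters.
[Source: http://www.linuxforum.net/books/UTF-8-Unicode.html]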



可冰 2005-09-19 15:38 Post a comment
]]>
UTF serializationshttp://www.shnenglu.com/kb/archive/2005/09/19/310.html可冰可冰Mon, 19 Sep 2005 07:23:00 GMThttp://www.shnenglu.com/kb/archive/2005/09/19/310.htmlhttp://www.shnenglu.com/kb/comments/310.htmlhttp://www.shnenglu.com/kb/archive/2005/09/19/310.html#Feedback0http://www.shnenglu.com/kb/comments/commentRss/310.htmlhttp://www.shnenglu.com/kb/services/trackbacks/310.html
UTF-8
  • Initial EF BB BF is a signature, indicating that the rest of the file is UTF-8.
  • Any EF BF BE is an error.
  • A real ZWNBSP at the start of a file requires a signature first.
UTF-8N
  • All of the text is normal UTF-8; there is no signature.
  • Initial EF BB BF is a ZWNBSP.
  • Any EF BF BE is an error.
UTF-16
  • Initial FE FF is a signature indicating the rest of the text is big endian UTF-16.
  • Initial FF FE is a signature indicating the rest of the text is little endian UTF-16.
  • If neither of these are present, all of the text is big endian.
  • A real ZWNBSP at the start of a file requires a signature first.
UTF-16BE
  • All of the text is big endian: there is no signature.
  • Initial FE FF is a ZWNBSP.
  • Any FF FE is an error.
UTF-16LE
  • All of the text is little endian: there is no signature.
  • Initial FF FE is a ZWNBSP.
  • Any FE FF is an error.
UTF-32
  • Initial 00 00 FE FF is a signature indicating the rest of the text is big endian UTF-32.
  • Initial FF FE 00 00 is a signature indicating the rest of the text is little endian UTF-32.
  • If neither of these are present, all of the text is big endian.
  • A real ZWNBSP at the start of a file requires a signature first.
UTF-32BE
  • All of the text is big endian: there is no signature.
  • Initial 00 00 FE FF is a ZWNBSP.
  • Any FF FE 00 00 is an error.
UTF-32LE
  • All of the text is little endian: there is no signature.
  • Initial FF FE 00 00 is a ZWNBSP.
  • Initial 00 00 FE FF is an error.
Note: The italicized names are not yet registered, but are useful for reference.
[from: http://icu.sourceforge.net/docs/papers/forms_of_unicode/]
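For the unmarked UTF-8, UTF-16 and UTF-32 serializations, the signature bytes listed above can be checked at the start of a buffer. A minimal sketch, assuming the whole prefix is already in a `std::string`; the `Signature` enum and `detect_signature` are names invented for this example:

```cpp
#include <cstddef>
#include <string>

enum class Signature { None, Utf8, Utf16BE, Utf16LE, Utf32BE, Utf32LE };

// Inspect the first bytes of `data` for one of the signatures listed above.
Signature detect_signature(const std::string& data) {
    auto starts_with = [&data](const char* sig, std::size_t n) {
        return data.size() >= n && data.compare(0, n, sig, n) == 0;
    };
    // The UTF-32 checks come first: FF FE 00 00 also begins with the
    // UTF-16 little endian signature FF FE.
    if (starts_with("\x00\x00\xFE\xFF", 4)) return Signature::Utf32BE;
    if (starts_with("\xFF\xFE\x00\x00", 4)) return Signature::Utf32LE;
    if (starts_with("\xEF\xBB\xBF", 3))     return Signature::Utf8;
    if (starts_with("\xFE\xFF", 2))         return Signature::Utf16BE;
    if (starts_with("\xFF\xFE", 2))         return Signature::Utf16LE;
    return Signature::None;
}
```

One caveat with this ordering: a little endian UTF-16 text whose first character is U+0000 also starts with FF FE 00 00 and would be reported as UTF-32. With no signature, the list above says UTF-16 and UTF-32 text is to be read as big endian.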


可冰 2005-09-19 15:23 Post a comment
]]>