<?xml version="1.0" encoding="utf-8" standalone="yes"?>http://www.shnenglu.com/kb/category/55.html ? is sleeping water...... zh-cn Tue, 20 May 2008 05:11:46 GMT Tue, 20 May 2008 05:11:46 GMT 60 Please review my UTF-8/Unicode conversion code</title><link>http://www.shnenglu.com/kb/archive/2005/09/29/491.html</link><dc:creator>可冰</dc:creator><author>可冰</author><pubDate>Thu, 29 Sep 2005 12:34:00 GMT</pubDate><guid>http://www.shnenglu.com/kb/archive/2005/09/29/491.html</guid><wfw:comment>http://www.shnenglu.com/kb/comments/491.html</wfw:comment><comments>http://www.shnenglu.com/kb/archive/2005/09/29/491.html#Feedback</comments><slash:comments>8</slash:comments><wfw:commentRss>http://www.shnenglu.com/kb/comments/commentRss/491.html</wfw:commentRss><trackback:ping>http://www.shnenglu.com/kb/services/trackbacks/491.html</trackback:ping><description><![CDATA[<font color="#000000" face="Verdana" size="2">Last week I put a lot of effort into writing a template-based facility for converting between UTF-8 and Unicode (see the file </font><a ><font color="#000080" face="Verdana" size="2">code.rar</font></a><font color="#000000" face="Verdana" size="2">). At first it seemed fine, but over the past few days I have started to wonder: why not simply provide two functions? Would that not be simpler and more convenient? What extra benefit does my design actually bring? My original goal was code that is convenient to use and easy to extend and maintain, but I now feel it offers little beyond what plain C-style functions would. Perhaps a feature this simple does not need code this complex at all. It is just as Eric Raymond said of C++: it "inclines programmers to write complex code".<br>I would like everyone to look at my code and give me some comments and suggestions.</font><img src ="http://www.shnenglu.com/kb/aggbug/491.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.shnenglu.com/kb/" target="_blank">可冰</a> 2005-09-29 20:34 <a href="http://www.shnenglu.com/kb/archive/2005/09/29/491.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Designing a UTF-8 decoding module http://www.shnenglu.com/kb/archive/2005/09/22/399.html可冰可冰Thu, 22 Sep 2005 15:24:00 
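GMThttp://www.shnenglu.com/kb/archive/2005/09/22/399.htmlhttp://www.shnenglu.com/kb/comments/399.htmlhttp://www.shnenglu.com/kb/archive/2005/09/22/399.html#Feedback1http://www.shnenglu.com/kb/comments/commentRss/399.htmlhttp://www.shnenglu.com/kb/services/trackbacks/399.html I want to write an "engine" that decodes UTF-8 files into Unicode code points, and it has to be convenient and comfortable to use.
But after thinking about it for several days, I still have not found a suitable way to implement it.
Sigh......
Today I tried writing a little of it first to get a feel for the problem; I will keep thinking after that...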



可冰 2005-09-22 23:24 Post a comment
]]>
How does std::wfstream support wide characters?
http://www.shnenglu.com/kb/archive/2005/09/22/396.html可冰可冰Thu, 22 Sep 2005 14:47:00 GMThttp://www.shnenglu.com/kb/archive/2005/09/22/396.htmlhttp://www.shnenglu.com/kb/comments/396.htmlhttp://www.shnenglu.com/kb/archive/2005/09/22/396.html#Feedback4http://www.shnenglu.com/kb/comments/commentRss/396.htmlhttp://www.shnenglu.com/kb/services/trackbacks/396.html
std::wfstream is defined as:
typedef basic_fstream<wchar_t, char_traits<wchar_t> > wfstream;
When reading a character:
wfstream wfile( "wcharfile.txt" );
wchar_t wch = wfile.get();
Semantically, this ought to read two bytes of content. But checking the output shows that it reads only one byte. So how is this any different from fstream?
How exactly should a wide-character stream be used when processing a Unicode-encoded file?
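One possible answer, offered as a sketch rather than the definitive one: a wide file stream converts bytes to wchar_t through the codecvt facet of the locale imbued on it, and the default "C" locale simply widens each byte to one wchar_t. The example below assumes the file is UTF-8 and installs std::codecvt_utf8 (a real standard facet from <codecvt>, though deprecated since C++17). The helper name read_utf8_file is invented for illustration.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: reads a UTF-8 encoded file into a wide string.
// Assumption: the file content is valid UTF-8 without a signature.
std::wstring read_utf8_file(const std::string& path) {
    std::wifstream in;
    // Imbue before opening so the file buffer converts with the UTF-8 facet
    // instead of the default byte-by-byte widening.
    in.imbue(std::locale(std::locale::classic(),
                         new std::codecvt_utf8<wchar_t>));
    in.open(path.c_str(), std::ios::binary);

    std::wstring text;
    wchar_t ch;
    while (in.get(ch)) {
        text += ch;  // one wchar_t per decoded character
    }
    return text;
}
```

With this facet in place, a three-byte UTF-8 sequence in the file arrives as a single wchar_t; without it, the same file yields three separate wchar_t values, which matches the behaviour described in the post.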


可冰 2005-09-22 22:47 Post a comment
]]>
Several encodings of the sentence "这是一个UTF-8格式的文件." ("This is a UTF-8 format file.")</title><link>http://www.shnenglu.com/kb/archive/2005/09/20/343.html</link><dc:creator>可冰</dc:creator><author>可冰</author><pubDate>Tue, 20 Sep 2005 12:39:00 GMT</pubDate><guid>http://www.shnenglu.com/kb/archive/2005/09/20/343.html</guid><wfw:comment>http://www.shnenglu.com/kb/comments/343.html</wfw:comment><comments>http://www.shnenglu.com/kb/archive/2005/09/20/343.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.shnenglu.com/kb/comments/commentRss/343.html</wfw:commentRss><trackback:ping>http://www.shnenglu.com/kb/services/trackbacks/343.html</trackback:ping><description><![CDATA[<p class="box"><img src="http://www.shnenglu.com/images/cppblog_com/kb/58/r_charcode.gif"> </p><img src ="http://www.shnenglu.com/kb/aggbug/343.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.shnenglu.com/kb/" target="_blank">可冰</a> 2005-09-20 20:39 <a href="http://www.shnenglu.com/kb/archive/2005/09/20/343.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Summary of the UTF-8 encoding format http://www.shnenglu.com/kb/archive/2005/09/19/320.html可冰可冰Mon, 19 Sep 2005 12:03:00 GMThttp://www.shnenglu.com/kb/archive/2005/09/19/320.htmlhttp://www.shnenglu.com/kb/comments/320.htmlhttp://www.shnenglu.com/kb/archive/2005/09/19/320.html#Feedback3http://www.shnenglu.com/kb/comments/commentRss/320.htmlhttp://www.shnenglu.com/kb/services/trackbacks/320.html[The following is only my personal summary. If anything is wrong, please point it out. Thank you!]
The following byte sequences are used to represent a character. Which sequence is used depends on the character's number in Unicode. Each row gives the code point range, the bit pattern, and the resulting byte values in hexadecimal.
U+00000000 - U+0000007F: 0 xxxxxxx (0x - 7x)
U+00000080 - U+000007FF: 110 xxxxx 10 xxxxxx (Cx 8x - Dx Bx)
U+00000800 - U+0000FFFF: 1110 xxxx 10 xxxxxx 10 xxxxxx (Ex 8x 8x - Ex Bx Bx)
U+00010000 - U+001FFFFF: 11110 xxx 10 xxxxxx 10 xxxxxx 10 xxxxxx (F0 8x 8x 8x - F7 Bx Bx Bx), rarely used
U+00200000 - U+03FFFFFF: 111110 xx 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx (F8 8x 8x 8x 8x - FB Bx Bx Bx Bx)
U+04000000 - U+7FFFFFFF: 1111110 x 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx (FC 8x 8x 8x 8x 8x - FD Bx Bx Bx Bx Bx)
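The ranges in the table map directly onto a sequence length. A minimal sketch, assuming the original six-byte scheme listed above (the function name utf8_length is made up for illustration; note that UTF-8 as currently standardized in RFC 3629 stops at four bytes and U+10FFFF):

```cpp
#include <cstdint>

// Number of bytes the table assigns to a code point under the
// original 1-to-6-byte scheme. Returns 0 for values above 0x7FFFFFFF,
// which the scheme cannot represent.
int utf8_length(std::uint32_t cp) {
    if (cp <= 0x7F)       return 1;
    if (cp <= 0x7FF)      return 2;
    if (cp <= 0xFFFF)     return 3;
    if (cp <= 0x1FFFFF)   return 4;
    if (cp <= 0x3FFFFFF)  return 5;
    if (cp <= 0x7FFFFFFF) return 6;
    return 0;
}
```

For instance, utf8_length(0xA9) is 2 and utf8_length(0x2260) is 3, in line with the second and third rows of the table.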


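* The bytes FE and FF never appear in the encoding.
* Except for the first byte, every byte of a sequence lies in the range 0x80 to 0xBF. The start of each multibyte character can be recognized by its lead byte, which lies in 0xC0-0xDF, 0xE0-0xEF, 0xF0-0xF7 and so on (test the high bits of the byte); a byte below 0x80 is a single-byte character. Any byte in the range 0x80 to 0xBF is a continuation byte and must be skipped when counting characters.
* Unicode is a coded character set: it only assigns each character a number (Unicode actually does somewhat more than that, for example it provides algorithms for comparison, display and so on);
UTF-8, by contrast, is an encoding form: a format that defines how those numbers are represented and stored.
* Converting UTF-8 to a Unicode code point: strip all marker bits and concatenate the remaining bits; if fewer than 32 bits remain, pad with zeros at the high end to make up 32 bits.
* Converting a Unicode code point to UTF-8: starting from the low end, take 6 bits at a time and prefix each group with the two bits 10. Once the remaining significant bits (leading zeros do not count) fit into the payload of the lead byte for that sequence length (5 bits after 110, 4 bits after 1110, 3 bits after 11110, and so on), prefix them with that lead-byte marker.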
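The decoding rule in the list above (strip the marker bits, then concatenate the payload bits) can be sketched as follows. This is an illustration under stated assumptions, not a validating decoder: the names decode_utf8 and kInvalidCodePoint are invented here, the sentinel value is an arbitrary choice, sequences of up to six bytes are accepted as in the table, and overlong forms are not rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Sentinel returned for malformed input (a hypothetical choice for this sketch).
const std::uint32_t kInvalidCodePoint = 0xFFFFFFFFu;

// Decodes the sequence starting at s[pos] and advances pos past it.
// On malformed or truncated input, advances pos by one byte and
// returns kInvalidCodePoint. Precondition: pos < s.size().
std::uint32_t decode_utf8(const std::string& s, std::size_t& pos) {
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    if (lead < 0x80) {  // 0xxxxxxx: single byte, nothing to strip
        ++pos;
        return lead;
    }
    // Count the leading 1 bits of the lead byte: that is the sequence length.
    int ones = 0;
    for (unsigned char mask = 0x80; mask & lead; mask >>= 1) ++ones;
    if (ones < 2 || ones > 6 || pos + ones > s.size()) {
        ++pos;  // continuation byte, FE/FF, or truncated sequence
        return kInvalidCodePoint;
    }
    // Strip the lead-byte marker, keeping only its payload bits.
    std::uint32_t cp = lead & (0x7F >> ones);
    for (int i = 1; i < ones; ++i) {
        const unsigned char b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0) != 0x80) {  // every further byte must be 10xxxxxx
            ++pos;
            return kInvalidCodePoint;
        }
        cp = (cp << 6) | (b & 0x3F);  // strip "10", append 6 payload bits
    }
    pos += ones;
    return cp;
}
```

Returning a 32-bit value gives the zero padding at the high end for free, since the six-byte form carries at most 31 payload bits.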



可冰 2005-09-19 20:03 Post a comment
]]>
UTF types
http://www.shnenglu.com/kb/archive/2005/09/19/312.html可冰可冰Mon, 19 Sep 2005 07:38:00 GMThttp://www.shnenglu.com/kb/archive/2005/09/19/312.htmlhttp://www.shnenglu.com/kb/comments/312.htmlhttp://www.shnenglu.com/kb/archive/2005/09/19/312.html#Feedback0http://www.shnenglu.com/kb/comments/commentRss/312.htmlhttp://www.shnenglu.com/kb/services/trackbacks/312.html

UTF Formats: estimated average storage required per page (3000 characters)

UTF-8: 3 KB (1999), 5 KB (2003). On average, English takes slightly over one unit per code point. Most Latin-script languages take about 1.1 bytes. Greek, Russian, Arabic and Hebrew take about 1.7 bytes, and most others (including Japanese, Chinese, Korean and Hindi) take about 3 bytes. Characters in surrogate space take 4 bytes, but as a proportion of all world text they will always be very rare.

UTF-16: 6 KB. All of the most common characters in use for all modern writing systems are already represented with 2 bytes. Characters in surrogate space take 4 bytes, but as a proportion of all world text they will always be very rare.

UTF-32: 12 KB. All take 4 bytes.

[Source: http://icu.sourceforge.net/docs/papers/forms_of_unicode/]


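UTF-8 (ISO 10646-1) has the following properties:

  • UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes 0x00 to 0x7F (ASCII compatibility). This means that files containing only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8.
  • All UCS characters > U+007F are encoded as a sequence of several bytes, each of which has the most significant bit set. Therefore, no ASCII byte (0x00-0x7F) can appear as part of any other character.
  • The first byte of a multibyte sequence that represents a non-ASCII character is always in the range 0xC0 to 0xFD, and it indicates how many bytes the character contains. All further bytes in a multibyte sequence are in the range 0x80 to 0xBF. This allows easy resynchronization and makes the encoding stateless and robust against missing bytes.
  • All possible 2^31 UCS codes can be encoded.
  • UTF-8 encoded characters may theoretically be up to 6 bytes long; however, 16-bit BMP characters are at most 3 bytes long.
  • The sorting order of big-endian UCS-4 byte strings is preserved.
  • The bytes 0xFE and 0xFF are never used in the UTF-8 encoding.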
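The resynchronization property in the list above follows from the fact that continuation bytes (0x80 to 0xBF) can never be mistaken for the start of a character. A small sketch of how that can be used; the function names are invented for illustration, and the input is assumed to be well-formed UTF-8 when counting:

```cpp
#include <cstddef>
#include <string>

// True for bytes of the form 10xxxxxx, i.e. 0x80 to 0xBF.
bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Counts characters by counting every byte that is not a continuation byte.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if (!is_continuation(b)) ++n;
    }
    return n;
}

// Resynchronizes after landing at an arbitrary offset: skips forward
// to the next byte that can start a character (or to the end).
std::size_t next_boundary(const std::string& s, std::size_t pos) {
    while (pos < s.size() &&
           is_continuation(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    return pos;
}
```

No decoder state is needed in either function, which is what "stateless" means in practice here.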

The following byte sequences are used to represent a character. Which sequence is used depends on the character's number in Unicode.

U-00000000 - U-0000007F: 0xxxxxxx
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

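The xxx bit positions are filled with the bits of the character code number in binary representation. The rightmost x bit is the least significant bit. Only the shortest possible multibyte sequence that can represent the code number of the character may be used. Note that in multibyte sequences, the number of leading 1 bits in the first byte is identical to the number of bytes in the entire sequence.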
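Both rules in this paragraph (the leading 1 bits give the sequence length, and only the shortest form is allowed) are easy to check mechanically. A sketch under the six-byte scheme shown above; sequence_length and is_shortest_form are hypothetical names, and is_shortest_form assumes the code point was already decoded from a sequence of the given length:

```cpp
#include <cstdint>

// Sequence length implied by a lead byte: 1 for 0xxxxxxx, otherwise the
// number of leading 1 bits. Returns 0 for bytes that cannot start a
// sequence (continuation bytes 0x80-0xBF, and 0xFE / 0xFF).
int sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    int ones = 0;
    for (unsigned char mask = 0x80; mask & lead; mask >>= 1) ++ones;
    return (ones >= 2 && ones <= 6) ? ones : 0;
}

// True if a code point decoded from a len-byte sequence really needed
// that many bytes, i.e. the sequence was not an overlong form.
bool is_shortest_form(std::uint32_t cp, int len) {
    // Smallest code point that requires len bytes, indexed by len.
    static const std::uint32_t kMin[] = {
        0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    if (len < 1 || len > 6) return false;
    return cp >= kMin[len];
}
```

The two-byte sequence C0 AF, for example, decodes to U+002F, which fits in one byte, so is_shortest_form(0x2F, 2) is false.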

For example, the Unicode character U+00A9 = 1010 1001 (copyright sign) is encoded in UTF-8 as:

11000010 10101001 = 0xC2 0xA9

and the character U+2260 = 0010 0010 0110 0000 (not equal to) is encoded as:

11100010 10001001 10100000 = 0xE2 0x89 0xA0
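The two worked examples can be reproduced with a short encoder. This is a sketch that follows the six-byte table above rather than a production routine: the name encode_utf8 is invented for illustration, surrogates are not rejected, and RFC 3629 would additionally cap the input at U+10FFFF.

```cpp
#include <cstdint>
#include <string>

// Encodes one code point using the original 1-to-6-byte scheme.
// Returns an empty string for values above 0x7FFFFFFF.
std::string encode_utf8(std::uint32_t cp) {
    if (cp > 0x7FFFFFFFu) return std::string();
    if (cp <= 0x7F) return std::string(1, static_cast<char>(cp));

    const int len = cp <= 0x7FF     ? 2
                  : cp <= 0xFFFF    ? 3
                  : cp <= 0x1FFFFF  ? 4
                  : cp <= 0x3FFFFFF ? 5
                                    : 6;
    std::string out(len, '\0');
    // Fill continuation bytes from the low end: 6 payload bits each,
    // prefixed with the marker bits 10.
    for (int i = len - 1; i > 0; --i) {
        out[i] = static_cast<char>(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    // Lead byte: len one-bits, a zero bit, then the remaining payload.
    out[0] = static_cast<char>(((0xFF00 >> len) & 0xFF) | cp);
    return out;
}
```

Here encode_utf8(0xA9) yields the bytes C2 A9 and encode_utf8(0x2260) yields E2 89 A0, the same values as in the two examples.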

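The official name of this encoding is spelled UTF-8, where UTF stands for UCS Transformation Format. Please do not write UTF-8 in any documentation text in other ways (such as utf8 or UTF_8), unless of course you refer to a variable name and not to the encoding itself.

Which programming languages support Unicode?

Most modern programming languages developed after around 1993 have a special data type for Unicode/ISO 10646-1 characters. It is called Wide_Character in Ada95 and char in Java.

ISO C also specifies mechanisms for handling multibyte encodings and wide characters, and even more were added when Amendment 1 to ISO C was published in September 1994. These facilities were designed primarily for the various East Asian encodings, and they are far more elaborate than what is needed to handle UCS. UTF-8 is an example of what the ISO C standard calls a multibyte encoding, and the type wchar_t can be used to hold Unicode characters.
[Source: http://www.linuxforum.net/books/UTF-8-Unicode.html]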



可冰 2005-09-19 15:38 Post a comment
]]>
UTF serializations
http://www.shnenglu.com/kb/archive/2005/09/19/310.html可冰可冰Mon, 19 Sep 2005 07:23:00 GMThttp://www.shnenglu.com/kb/archive/2005/09/19/310.htmlhttp://www.shnenglu.com/kb/comments/310.htmlhttp://www.shnenglu.com/kb/archive/2005/09/19/310.html#Feedback0http://www.shnenglu.com/kb/comments/commentRss/310.htmlhttp://www.shnenglu.com/kb/services/trackbacks/310.html
UTF-8
  • Initial EF BB BF is a signature, indicating that the rest of the file is UTF-8.
  • Any EF BF BE is an error.
  • A real ZWNBSP at the start of a file requires a signature first.
UTF-8N
  • All of the text is normal UTF-8; there is no signature.
  • Initial EF BB BF is a ZWNBSP.
  • Any EF BF BE is an error.
UTF-16
  • Initial FE FF is a signature indicating the rest of the text is big endian UTF-16.
  • Initial FF FE is a signature indicating the rest of the text is little endian UTF-16.
  • If neither of these are present, all of the text is big endian.
  • A real ZWNBSP at the start of a file requires a signature first.
UTF-16BE
  • All of the text is big endian: there is no signature.
  • Initial FE FF is a ZWNBSP.
  • Any FF FE is an error.
UTF-16LE
  • All of the text is little endian: there is no signature.
  • Initial FF FE is a ZWNBSP.
  • Any FE FF is an error.
UTF-32
  • Initial 00 00 FE FF is a signature indicating the rest of the text is big endian UTF-32.
  • Initial FF FE 00 00 is a signature indicating the rest of the text is little endian UTF-32.
  • If neither of these are present, all of the text is big endian.
  • A real ZWNBSP at the start of a file requires a signature first.
UTF-32BE
  • All of the text is big endian: there is no signature.
  • Initial 00 00 FE FF is a ZWNBSP.
  • Any FF FE 00 00 is an error.
UTF-32LE
  • All of the text is little endian: there is no signature.
  • Initial FF FE 00 00 is a ZWNBSP.
  • Initial 00 00 FE FF is an error.
Note: The italicized names are not yet registered, but are useful for reference.
[from: http://icu.sourceforge.net/docs/papers/forms_of_unicode/]
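The signatures listed above can be recognized by inspecting the first few bytes of a file. A minimal sketch, assuming the unmarked UTF-8, UTF-16 and UTF-32 schemes in which an initial signature is meaningful; the enum and function names are invented for illustration:

```cpp
#include <cstddef>
#include <string>

enum class Signature { None, Utf8, Utf16BE, Utf16LE, Utf32BE, Utf32LE };

// Inspects the first bytes of a buffer for one of the signatures above.
Signature detect_signature(const std::string& bytes) {
    auto starts_with = [&bytes](const char* sig, std::size_t n) {
        return bytes.size() >= n && bytes.compare(0, n, sig, n) == 0;
    };
    // The UTF-32 checks come first: FF FE 00 00 also begins with the
    // UTF-16 little endian signature FF FE.
    if (starts_with("\x00\x00\xFE\xFF", 4)) return Signature::Utf32BE;
    if (starts_with("\xFF\xFE\x00\x00", 4)) return Signature::Utf32LE;
    if (starts_with("\xEF\xBB\xBF", 3))     return Signature::Utf8;
    if (starts_with("\xFE\xFF", 2))         return Signature::Utf16BE;
    if (starts_with("\xFF\xFE", 2))         return Signature::Utf16LE;
    return Signature::None;
}
```

One caveat about the ordering: FF FE 00 00 is also what little endian UTF-16 text looks like when its first character after the signature is U+0000, so this check is a heuristic and not a proof of the encoding.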


可冰 2005-09-19 15:23 Post a comment
]]>