
Boost: UTF-8 Codecvt Facet (converting between Unicode and UTF-8)

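I noticed that a senior member had written some code for converting between UTF-8 and Unicode, so I thought I would mention this in passing; I hope it is of some help to everyone.
Below are the bit widths used by some encoding forms.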

Examples of fixed-width encoding forms:

Type	Each character encoded as	Notes
7-bit	a single 7-bit quantity	example: ISO 646
8-bit G0/G1	a single 8-bit quantity	with constraints on use of C0 and C1 spaces
8-bit	a single 8-bit quantity	with no constraints on use of C1 space
8-bit EBCDIC	a single 8-bit quantity	with the EBCDIC conventions rather than ASCII conventions
16-bit (UCS-2)	a single 16-bit quantity	within a code space of 0..FFFF
32-bit (UCS-4)	a single 32-bit quantity	within a code space 0..7FFFFFFF
32-bit (UTF-32)	a single 32-bit quantity	within a code space of 0..10FFFF
16-bit DBCS process code	a single 16-bit quantity	example: UNIX widechar implementations of Asian CCS's
32-bit DBCS process code	a single 32-bit quantity	example: UNIX widechar implementations of Asian CCS's
DBCS Host	two 8-bit quantities	following IBM host conventions
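
To make the code-space column concrete, here is a minimal sketch that checks whether a 32-bit value falls inside the ranges the table lists for UCS-2, UCS-4 and UTF-32. The helper names are hypothetical and do not come from Boost or the standard library; the checks cover only the numeric range, nothing else about validity.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of any library.
// Each returns true when `value` lies inside the code space
// given for that encoding form in the table above.
constexpr bool fits_ucs2(std::uint32_t value)  { return value <= 0xFFFFu; }      // 0..FFFF
constexpr bool fits_ucs4(std::uint32_t value)  { return value <= 0x7FFFFFFFu; }  // 0..7FFFFFFF
constexpr bool fits_utf32(std::uint32_t value) { return value <= 0x10FFFFu; }    // 0..10FFFF
```

For instance, U+1F600 is inside the UTF-32 and UCS-4 code spaces but outside the UCS-2 one, so a single 16-bit quantity cannot hold it.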

Examples of variable-width encoding forms:

Name	Characters are encoded as	Notes
UTF-8	a mix of one to four 8-bit code units in Unicode and one to six code units in 10646	used only with Unicode/10646
UTF-16	a mix of one to two 16-bit code units	used only with Unicode/10646
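
As a rough illustration of "one to four" and "one to two" code units, the sketch below computes how many code units a single Unicode code point needs in each form. The function names are hypothetical; the sketch covers only the Unicode range (up to 10FFFF), not the longer 10646 forms, and returns 0 for surrogate values and anything outside that range.

```cpp
#include <cstdint>

// Hypothetical helper: number of 8-bit code units UTF-8 needs for `cp`.
// Returns 0 for surrogates (D800..DFFF) and for values above 10FFFF.
constexpr int utf8_code_units(std::uint32_t cp) {
    return (cp >= 0xD800u && cp <= 0xDFFFu) ? 0
         : cp <= 0x7Fu     ? 1
         : cp <= 0x7FFu    ? 2
         : cp <= 0xFFFFu   ? 3
         : cp <= 0x10FFFFu ? 4
         : 0;
}

// Hypothetical helper: number of 16-bit code units UTF-16 needs for `cp`.
// Code points above FFFF are written as a surrogate pair (two units).
constexpr int utf16_code_units(std::uint32_t cp) {
    return (cp >= 0xD800u && cp <= 0xDFFFu) ? 0
         : cp <= 0xFFFFu   ? 1
         : cp <= 0x10FFFFu ? 2
         : 0;
}
```

The character U+4E2D, for example, takes three code units in UTF-8 but only one in UTF-16, while U+1F600 takes four and two respectively.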

Boost provides a UTF-8 Codecvt Facet that converts between UTF-8 and UCS-4 (Unicode-32).
It is used as follows:

//...
// My encoding type
// (holds a full UCS-4 value only where wchar_t is 32 bits wide; on Windows it is 16 bits)
typedef wchar_t ucs4_t;

std::locale old_locale;
std::locale utf8_locale(old_locale,new utf8_codecvt_facet<ucs4_t>);

// Set a New global locale
std::locale::global(utf8_locale);

// Write the UCS-4 data out as UTF-8
{
    std::wofstream ofs("data.ucd");
    ofs.imbue(utf8_locale);
    std::copy(ucs4_data.begin(),ucs4_data.end(),
          std::ostream_iterator<ucs4_t,ucs4_t>(ofs));
}

// Read the UTF-8 back in, converting it to UCS-4
std::vector<ucs4_t> from_file;
{
    std::wifstream ifs("data.ucd");
    ifs.imbue(utf8_locale);
    ucs4_t item = 0;
    while (ifs >> item) from_file.push_back(item);
}
//...
For details on the UTF-8 Codecvt Facet, see
http://www.boost.org/libs/serialization/doc/codecvt.html
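
The facet performs the conversion inside the stream, so the decoding itself stays out of sight. As a rough picture of what reading UTF-8 into UCS-4 involves, here is a hand-written sketch that decodes one code point from a byte sequence. It is not Boost's implementation: `Utf8Decoded` and `decode_utf8` are hypothetical names, only the one-to-four-byte Unicode forms are handled, and C++14 or later is assumed for the relaxed constexpr rules.

```cpp
#include <cstddef>
#include <cstdint>

// Result of decoding one code point; length == 0 signals malformed input.
struct Utf8Decoded {
    std::uint32_t code_point;
    std::size_t   length;  // number of bytes consumed
};

// Hypothetical helper: decode the UTF-8 sequence that starts at s[0],
// reading at most n bytes.
constexpr Utf8Decoded decode_utf8(const char* s, std::size_t n) {
    if (n == 0) return {0, 0};
    const std::uint32_t lead = static_cast<unsigned char>(s[0]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    // The lead byte tells how many bytes the sequence has.
    if (lead < 0x80u)                 { len = 1; cp = lead; }
    else if ((lead & 0xE0u) == 0xC0u) { len = 2; cp = lead & 0x1Fu; }
    else if ((lead & 0xF0u) == 0xE0u) { len = 3; cp = lead & 0x0Fu; }
    else if ((lead & 0xF8u) == 0xF0u) { len = 4; cp = lead & 0x07u; }
    else return {0, 0};               // stray continuation byte or invalid lead
    if (len > n) return {0, 0};       // sequence is truncated
    // Each continuation byte has the form 10xxxxxx and adds six bits.
    for (std::size_t i = 1; i < len; ++i) {
        const std::uint32_t b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0u) != 0x80u) return {0, 0};
        cp = (cp << 6) | (b & 0x3Fu);
    }
    // Reject overlong forms, surrogates, and values above 10FFFF.
    const std::uint32_t min_cp =
        len == 1 ? 0u : len == 2 ? 0x80u : len == 3 ? 0x800u : 0x10000u;
    if (cp < min_cp || cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return {0, 0};
    return {cp, len};
}
```

Calling `decode_utf8` repeatedly and advancing the pointer by `length` each time yields the sequence of UCS-4 values; a `length` of 0 means the input was malformed at that position.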


Posted on 2006-02-15 17:19 by 張沈鵬

Comments

# re: Boost: UTF-8 Codecvt Facet (converting between Unicode and UTF-8)
無名高手
Posted @ 2006-02-24 14:18
Ignorant youngster, since you seem eager to learn, let me give you a pointer:

Unicode Technical Report #17: Character Encoding Model
http://www.unicode.org/unicode/reports/tr17

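As for the higher-level (simpler) trick, heh, I'm not telling you~~

# re: Boost: UTF-8 Codecvt Facet (converting between Unicode and UTF-8)
張沈鵬
Posted @ 2006-02-25 10:01
So what is the higher-level (simpler) trick? I would appreciate your guidance. Thanks.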
