eSNACC's Encoding and Decoding of ASN.1 Built-in Strings

The eSNACC runtime library directly supports the various ASN.1 string types: PrintableString, BMPString, TeletexString, NumericString, IA5String, UniversalString, UTF8String, and VisibleString. They are all handled in much the same way: each one is a typedef of eSNACC's octet string type, for example:

typedef AsnOcts PrintableString; /* [UNIVERSAL 19] IMPLICIT OCTET STRING */

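They differ only in the checks applied during encoding and decoding, or in the validation functions added for each type's particular constraints. Only UTF8String is somewhat more involved. Let's go through them one by one.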

/******************************PrintableString************************************************/

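A PrintableString is a string made up of printable characters: letters, digits, space, and a small set of punctuation marks. Because every character must come from this set, the check function returns 0 when all characters in the string are printable, and -1 otherwise.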

static int chkPrintableString(PrintableString *checkBuf)
{
    unsigned int i;
    char temp;

    if (checkBuf == NULL)
        return -1;

    for (i = 0; i < checkBuf->octetLen; i++)
    {
        temp = checkBuf->octs[i];
        /* Check A-Z */
        if ((temp < 'A') || (temp > 'Z'))
        {
            /* Check for a-z */
            if ((temp < 'a') || (temp > 'z'))
            {
                /* Check for 0-9 */
                if ((temp < '0') || (temp > '9'))
                {
                    switch (temp)
                    {
                    case ' ':  /* space */
                    case '\'': /* apostrophe */
                    case '(':  /* left parenthesis */
                    case ')':  /* right parenthesis */
                    case '+':  /* plus sign */
                    case ',':  /* comma */
                    case '-':  /* hyphen */
                    case '.':  /* full stop (period) */
                    case '/':  /* solidus */
                    case ':':  /* colon */
                    case '=':  /* equal sign */
                    case '?':  /* question mark */
                        break;
                    default:
                        return -1;
                    }
                }
            }
        }
    }
    return 0;
} /* end of chkPrintableString() */

The encoder and decoder then call this function and report an error if the requirement is not met.
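As a rough illustration, the same alphabet test can be written in standalone form with a punctuation lookup. This is only a sketch: `Octs` is a hypothetical stand-in for eSNACC's AsnOcts (just the `octetLen` and `octs` fields used above), and `is_printable_string` is an illustrative name, not an eSNACC function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for AsnOcts: a length plus a byte pointer. */
typedef struct {
    unsigned long octetLen;
    const char *octs;
} Octs;

/* Returns 0 if every octet is in the PrintableString alphabet, else -1. */
static int is_printable_string(const Octs *s)
{
    static const char punct[] = " '()+,-./:=?";
    unsigned long i;

    if (s == NULL)
        return -1;
    assert(s->octs != NULL || s->octetLen == 0);

    for (i = 0; i < s->octetLen; i++) {
        char c = s->octs[i];
        int ok = (c >= 'A' && c <= 'Z') ||
                 (c >= 'a' && c <= 'z') ||
                 (c >= '0' && c <= '9') ||
                 /* strchr would also match the terminating NUL, so exclude it */
                 (c != '\0' && strchr(punct, c) != NULL);
        if (!ok)
            return -1;
    }
    return 0;
}
```

A caller would run the check before encoding and after decoding, and treat -1 as an error, which is the pattern the article describes.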

/******************************************************************************************/

/************************************BMPString**********************************************/

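A BMPString is the UNICODE_STRING type, that is, two bytes per character. The length of its underlying octet string must therefore be a multiple of 2; otherwise encoding and decoding report an error.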

AsnLen BEncBMPStringContent(GenBuf *b, BMPString *octs)
{
    if ((octs->octetLen % 2) != 0)
    {
        BufSetWriteError (b, TRUE);
    }
    return BEncAsnOctsContent(b, octs);
} /* end of BEncBMPStringContent() */

void BDecBMPStringContent(GenBuf *b, AsnTag tagId, AsnLen len, BMPString *result, AsnLen *bytesDecoded, ENV_TYPE env)
{
    BDecAsnOctsContent(b, tagId, len, result, bytesDecoded, env);
    if ((result->octetLen % 2) != 0)
    {
        Asn1Error ("BDecBMPStringContent: ERROR - Invalid BMPString Format");
        longjmp (env, -40);
    }
} /* end of BDecBMPStringContent() */
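To see why the length is always even, here is a small sketch that packs 16-bit code units into the big-endian octet pairs a BMPString carries. The names `bmp_pack` and `bmp_len_ok` are hypothetical helpers for illustration and are not part of eSNACC.

```c
#include <assert.h>
#include <stddef.h>

/* Writes n 16-bit code units as big-endian octet pairs.
   out must hold at least 2 * n octets. Returns the octet count. */
static size_t bmp_pack(const unsigned short *units, size_t n, unsigned char *out)
{
    size_t i;

    assert(units != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)((units[i] >> 8) & 0xFF); /* high octet first */
        out[2 * i + 1] = (unsigned char)(units[i] & 0xFF);
    }
    return 2 * n;
}

/* Mirrors the parity test used by the encoder and decoder above. */
static int bmp_len_ok(size_t octetLen)
{
    return (octetLen % 2) == 0;
}
```

Any octet string produced this way passes the parity test; a buffer with an odd length cannot have come from whole 16-bit units, which is what the eSNACC check rejects.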

/*****************************************************************************************/

/******************************TeletexString*************************************************/

TeletexString is simply an 8-bit octet string, so it is handled the same way as AsnOcts. Only the tag differs, which shows up in how the encoder and decoder handle the tag.

/******************************************************************************************/

/******************************NumericString*************************************************/

NumericString must consist of digits, although spaces are also allowed. The check function is:

static int chkNumericString(NumericString *checkBuf)
{
    unsigned int i;

    if (checkBuf == NULL)
        return -1;

    for (i = 0; i < checkBuf->octetLen; i++)
    {
        if ((checkBuf->octs[i] != ' ') &&
            ((checkBuf->octs[i] < '0') || (checkBuf->octs[i] > '9')))
            return -1;
    }
    return 0;
} /* end of chkNumericString() */

/******************************************************************************************/

/******************************IA5String****************************************************/

Microsoft's website explains IA5 as follows: The International Alphabet number 5 (IA5) is generally equivalent to the ASCII alphabet, but different versions can include accents or other characters specific to a regional language. The check function eSNACC provides is:

static int checkIA5String(IA5String *octs)
{
    unsigned int i;

    if (octs == NULL)
        return -1;

    for (i = 0; i < octs->octetLen; i++)
    {
        if ((unsigned char)octs->octs[i] > 0x7F)
            return -1;
    }
    return 0;
}

/******************************************************************************************/

/******************************UniversalString************************************************/

UniversalString represents each character with 4 bytes, so the octet string length must be a multiple of 4 when encoding and decoding:

AsnLen BEncUniversalStringContent(GenBuf *b, UniversalString *octs)
{
    if ((octs->octetLen % 4) != 0)
    {
        Asn1Error ("BEncUniversalStringContent: ERROR - Invalid UniversalString Format");
        GenBufSetWriteError (b, TRUE);
    }
    return BEncAsnOctsContent(b, octs);
} /* end of BEncUniversalStringContent() */

void BDecUniversalStringContent(GenBuf *b, AsnTag tagId, AsnLen len, UniversalString *result, AsnLen *bytesDecoded, ENV_TYPE env)
{
    BDecAsnOctsContent (b, tagId, len, result, bytesDecoded, env);
    if ((result->octetLen % 4) != 0)
    {
        Asn1Error ("BDecUniversalStringContent: ERROR - Invalid UniversalString Format");
        longjmp (env, -40);
    }
} /* end of BDecUniversalStringContent() */
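The multiple-of-4 rule becomes concrete when the octets are read back as code points. The following is a sketch only; `ucs4_unpack` is a hypothetical helper name, and it assumes the big-endian octet order that BER uses for UniversalString.

```c
#include <assert.h>
#include <stddef.h>

/* Reads big-endian 4-octet groups into code points.
   Returns the number of code points read, or -1 if octetLen
   is not a multiple of 4. out must hold octetLen / 4 entries. */
static long ucs4_unpack(const unsigned char *octs, size_t octetLen,
                        unsigned long *out)
{
    size_t i;

    assert(octs != NULL && out != NULL);
    if (octetLen % 4 != 0)
        return -1;

    for (i = 0; i < octetLen / 4; i++) {
        const unsigned char *p = octs + 4 * i;
        out[i] = ((unsigned long)p[0] << 24) |
                 ((unsigned long)p[1] << 16) |
                 ((unsigned long)p[2] << 8)  |
                  (unsigned long)p[3];
    }
    return (long)(octetLen / 4);
}
```

A 7-octet buffer, for example, leaves three octets that cannot form a character, so the sketch returns -1 in the same situation where the eSNACC decoder raises its error.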

/******************************************************************************************/

/******************************VisibleString**************************************************/

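VisibleString (also known as ISO646String) is limited to the printing characters of ISO 646 plus space. Here is eSNACC's check function for the string contents: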

static int chkVisibleString(VisibleString *checkBuf)
{
    unsigned int i;
    char temp;

    if (checkBuf == NULL)
        return -1;

    for (i = 0; i < checkBuf->octetLen; i++)
    {
        temp = checkBuf->octs[i];
        /* Reject octets with the high bit set (as written, this relies on char being signed) */
        if ((unsigned int)temp > 128)
        {
            return -1;
        }
    }
    return 0;
} /* end of chkVisibleString() */
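For comparison, a stricter check can be sketched by testing each octet against the range 0x20 to 0x7E, which is the space character plus the ASCII graphic characters. This is an assumption about how one might tighten the test, not what eSNACC does; `is_visible_string` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 if every octet is space or an ASCII graphic character
   (0x20..0x7E), else -1. Control characters and DEL are rejected. */
static int is_visible_string(const unsigned char *octs, size_t len)
{
    size_t i;

    assert(octs != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (octs[i] < 0x20 || octs[i] > 0x7E)
            return -1;
    }
    return 0;
}
```

Working on `unsigned char` avoids any dependence on whether plain `char` is signed on the target platform.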

/******************************************************************************************/

/******************************UTF8String***************************************************/

The best is saved for last, so let's finish with the most substantial one!

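UTF-8 is a variable-length character encoding of Unicode, created by Ken Thompson in 1992 and now standardized as RFC 3629. In its original form UTF-8 encodes a Unicode character in 1 to 6 bytes (RFC 3629 limits this to 4). eSNACC represents a UTF8String as an octet string, but it provides a function that checks whether that octet string is a valid UTF8String, and it also defines functions for converting between UTF8String and wchar. Perhaps from these functions we can learn how eSNACC handles UTF-8.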

Let's first look at some definitions and the UTF-8 validation function:

typedef struct
{
    unsigned char mask;
    unsigned char value;
    unsigned long maxCharValue;
} MaskValue;

/* Global Values */
#define MAX_UTF8_OCTS_PER_CHAR 6

const MaskValue gUTF8Masks[MAX_UTF8_OCTS_PER_CHAR] = {
    { 0x80, 0x00, 0x0000007F }, /* one-byte encoding, lead octet 0XXX XXXX */
    { 0xE0, 0xC0, 0x000007FF }, /* two-byte encoding, lead octet 110X XXXX */
    { 0xF0, 0xE0, 0x0000FFFF }, /* three-byte encoding, lead octet 1110 XXXX */
    { 0xF8, 0xF0, 0x0001FFFF }, /* four-byte encoding, lead octet 1111 0XXX */
    { 0xFC, 0xF8, 0x03FFFFFF }, /* five-byte encoding, lead octet 1111 10XX */
    { 0xFE, 0xFC, 0x07FFFFFF }  /* six-byte encoding, lead octet 1111 110X */
};
/* Note: the 21 and 31 payload bits of the four- and six-byte forms can hold up to
   0x001FFFFF and 0x7FFFFFFF; the values above are reproduced as listed. */

static bool IsValidUTF8String(UTF8String* octs)
{
    unsigned long i;
    unsigned int j;

    if (octs == NULL)
        return false;

    i = 0;
    while (i < octs->octetLen)
    {
        /* Determine the number of UTF-8 octets that follow the first */
        for (j = 0; (j < MAX_UTF8_OCTS_PER_CHAR) &&
            ((gUTF8Masks[j].mask & octs->octs[i]) != gUTF8Masks[j].value); j++)
            ;

        /* Return false if the first octet was invalid or if the number of
           subsequent octets exceeds the UTF8String length */
        if ((j == MAX_UTF8_OCTS_PER_CHAR) || ((i + j) >= octs->octetLen))
            return false;

        /* Skip over first octet */
        i++;

        /* Check that each subsequent octet is properly formatted */
        for (; j > 0; j--)
        {
            if ((octs->octs[i++] & 0xC0) != 0x80)
                return false;
        }
    }
    return true;
}

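First, the lead byte is tested against the masks to determine how many bytes encode the current character. The possible answers are 1 to 6; anything else is an error. If the answer is one, that byte is the character itself: any character no greater than 0x7F needs only one byte, which corresponds to maxCharValue in MaskValue. The other cases are analogous. The for loop at the end of the function shows the UTF-8 format: if a character takes more than one byte, then apart from the first byte, which indicates the length, every following value-carrying byte must have the form 10XX XXXX.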
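The mask test is easier to follow when isolated. The sketch below copies only the mask and value columns into a hypothetical `kLead` table and exposes two illustrative helpers, `utf8_trailing_count` and `utf8_char_length`; none of these names come from eSNACC.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char mask;
    unsigned char value;
} LeadPattern;

/* Lead-octet patterns for 1- to 6-octet encodings. */
static const LeadPattern kLead[6] = {
    { 0x80, 0x00 }, { 0xE0, 0xC0 }, { 0xF0, 0xE0 },
    { 0xF8, 0xF0 }, { 0xFC, 0xF8 }, { 0xFE, 0xFC }
};

/* Returns how many continuation octets follow this lead octet (0..5),
   or -1 if the octet cannot start a character. */
static int utf8_trailing_count(unsigned char lead)
{
    int j;

    for (j = 0; j < 6; j++) {
        if ((lead & kLead[j].mask) == kLead[j].value)
            return j;
    }
    return -1;
}

/* Returns the total octet length of the character starting at p, or 0 if
   the lead octet is invalid, the character is truncated, or a continuation
   octet is not of the form 10XX XXXX. */
static size_t utf8_char_length(const unsigned char *p, size_t remaining)
{
    int j, k;

    assert(p != NULL);
    if (remaining == 0)
        return 0;
    j = utf8_trailing_count(p[0]);
    if (j < 0 || (size_t)j >= remaining)
        return 0;
    for (k = 1; k <= j; k++) {
        if ((p[k] & 0xC0) != 0x80)
            return 0;
    }
    return (size_t)j + 1;
}
```

For instance, the lead octet 0xE4 matches the third row (0xE4 & 0xF0 == 0xE0), so two continuation octets are expected, while 0x80 matches no row because it is itself a continuation octet.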

So how is a wchar character represented in UTF-8? Let's look at the wchar -> utf8 function:

int CvtWchar2UTF8(wchar_t *inStr, char **utf8Str)
{
    size_t wLen;
    unsigned int i, j, x, y;
    size_t wchar_size = sizeof(wchar_t);
    wchar_t temp_wchar;

    /* Check parameters */
    if ((inStr == NULL) || (utf8Str == NULL))
        return -1;

    wLen = wcslen(inStr);

    /* Allocate and clear memory for a worst case UTF-8 string */
    *utf8Str = (char*)calloc(wLen * (wchar_size / 2 * 3) + 1, sizeof(char));
    if (*utf8Str == NULL)
        return -2;

    /* Convert each wide character into a UTF-8 char sequence */
    for (i = 0, x = 0; i < wLen; i++)
    {
        temp_wchar = inStr[i];

        /* Return an error if the wide character is invalid */
        if (temp_wchar < 0)
        {
            free(*utf8Str);
            *utf8Str = NULL;
            return -3;
        }

        /* Determine the number of characters required to encode this wide
           character */
        for (j = 0; (j < MAX_UTF8_OCTS_PER_CHAR) &&
            (temp_wchar > gUTF8Masks[j].maxCharValue); j++)
            ;

        /* Return an error if the wide character is invalid */
        if (j == MAX_UTF8_OCTS_PER_CHAR)
        {
            free(*utf8Str);
            *utf8Str = NULL;
            return -3;
        }

        /* Skip over the first UTF-8 octet and encode the remaining octets
           (if any) from right-to-left. Fill in the least significant six bits
           of each octet with the low-order bits from the wide character value */
        for (y = j; y > 0; y--)
        {
            (*utf8Str)[x + y] = (char)(0x80 | (temp_wchar & 0x3F));
            temp_wchar >>= 6;
        }

        /* Encode the first UTF-8 octet */
        (*utf8Str)[x] = gUTF8Masks[j].value;
        (*utf8Str)[x++] |= ~gUTF8Masks[j].mask & temp_wchar;

        /* Update the UTF-8 string index (skipping over the subsequent octets
           already encoded) */
        x += j;
    }
    return 0;
} /* end of CvtWchar2UTF8() */

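The function first allocates enough memory to hold the UTF-8 string: "if a Unicode character is represented by 2 bytes, encoding it as UTF-8 may well need 3 bytes, and if a Unicode character is represented by 4 bytes, encoding it as UTF-8 may need 6 bytes." Note, however, that the allocation always reserves the full 3 or 6 bytes per character.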

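It then iterates over each wchar character, determines the number of bytes needed to store it, and fills in the bytes in reverse order. Finally it sets the lead byte that marks the length. In the single-byte case this is simply the original value.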
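The right-to-left fill is perhaps clearest on a single code point. The sketch below applies the same steps to one value; `utf8_encode`, `kLeadValue`, and `kMaxValue` are hypothetical names, and the limit table is an assumption that uses the full 21-bit and 31-bit ranges (0x1FFFFF and 0x7FFFFFFF) for the four- and six-octet forms.

```c
#include <assert.h>
#include <stddef.h>

/* Lead-octet marker bits and largest value for 1- to 6-octet encodings. */
static const unsigned char kLeadValue[6] = {
    0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
};
static const unsigned long kMaxValue[6] = {
    0x7FUL, 0x7FFUL, 0xFFFFUL, 0x1FFFFFUL, 0x3FFFFFFUL, 0x7FFFFFFFUL
};

/* Encodes cp into out (at least 6 octets).
   Returns the number of octets written, or 0 if cp is too large. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    size_t j, y;

    assert(out != NULL);

    /* j = number of continuation octets needed */
    for (j = 0; j < 6 && cp > kMaxValue[j]; j++)
        ;
    if (j == 6)
        return 0;

    /* Fill continuation octets from the last one backwards, six bits each */
    for (y = j; y > 0; y--) {
        out[y] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }

    /* Whatever bits remain go into the lead octet under its marker */
    out[0] = (unsigned char)(kLeadValue[j] | cp);
    return j + 1;
}
```

Tracing U+4E2D (binary 0100 1110 0010 1101): two continuation octets are needed; the low six bits 101101 give 0xAD, the next six bits 111000 give 0xB8, and the remaining 0100 is combined with the marker 1110 to give 0xE4.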

The corresponding utf8 -> wchar function:

int CvtUTF8towchar(char *utf8Str, wchar_t **outStr)
{
    unsigned int len, i, j, x;
    size_t wchar_size = sizeof(wchar_t);

    if ((utf8Str == NULL) || (outStr == NULL))
        return -1;

    len = strlen(utf8Str);

    /* Allocate and clear the memory for a worst case result wchar_t string */
    *outStr = (wchar_t*)calloc(len + 1, sizeof(wchar_t));
    if (*outStr == NULL)
        return -2;

    /* Convert the UTF-8 string to a wchar_t string */
    i = 0;
    x = 0;
    while (i < len)
    {
        /* Determine the number of UTF-8 octets that follow the first */
        for (j = 0; (j < MAX_UTF8_OCTS_PER_CHAR) &&
            ((gUTF8Masks[j].mask & utf8Str[i]) != gUTF8Masks[j].value); j++)
            ;

        /* Return an error if the first octet was invalid or if the number of
           subsequent octets exceeds the UTF-8 string length */
        if ((j == MAX_UTF8_OCTS_PER_CHAR) || ((i + j) >= len))
        {
            free(*outStr);
            *outStr = NULL;
            return -3;
        }

        /* Return an error if the size of the wchar_t doesn't support the
           size of this UTF-8 character */
        if ((j > 2) && (wchar_size < 4))
        {
            free(*outStr);
            *outStr = NULL;
            return -4;
        }

        /* Copy the bits from the first octet into the wide character */
        (*outStr)[x] = (char)(~gUTF8Masks[j].mask & utf8Str[i++]);

        /* Add in the bits from each subsequent octet */
        for (; j > 0; j--)
        {
            /* Return an error if a subsequent octet isn't properly formatted */
            if ((utf8Str[i] & 0xC0) != 0x80)
            {
                free(*outStr);
                *outStr = NULL;
                return -3;
            }
            (*outStr)[x] <<= 6;
            (*outStr)[x] |= utf8Str[i++] & 0x3F;
        }
        x++;
    }

    /* Reallocate the wchar string memory to its correct size */
    if (x < len)
    {
        *outStr = (wchar_t*)realloc(*outStr, (x + 1) * sizeof(wchar_t));
        if (*outStr == NULL)
            return -2;
    }
    return 0;
}

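Both of these functions use the mask values to perform bitwise operations. I kept looking for a way to explain what they mean, but unfortunately I couldn't find a way to put it clearly in words. It is probably best to work it out from the code itself.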
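One way to make the bit operations concrete is to decode a single character step by step. The sketch below is a simplified, single-character version of the decoding direction; `utf8_decode` is a hypothetical name, and the mask and value arrays simply repeat the lead-octet patterns from the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes one character starting at p (len octets available) into *cp.
   Returns the number of octets consumed, or 0 on malformed input. */
static size_t utf8_decode(const unsigned char *p, size_t len, unsigned long *cp)
{
    static const unsigned char mask[6]  = { 0x80, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE };
    static const unsigned char value[6] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    size_t j, k;
    unsigned long v;

    assert(p != NULL && cp != NULL);
    if (len == 0)
        return 0;

    /* Find which lead-octet pattern matches; j = continuation octet count */
    for (j = 0; j < 6 && (p[0] & mask[j]) != value[j]; j++)
        ;
    if (j == 6 || j >= len)
        return 0;

    /* Keep only the payload bits of the lead octet (clear the marker bits) */
    v = (unsigned long)(p[0] & ~mask[j] & 0xFF);

    /* Shift in six payload bits from each continuation octet */
    for (k = 1; k <= j; k++) {
        if ((p[k] & 0xC0) != 0x80)
            return 0;
        v = (v << 6) | (unsigned long)(p[k] & 0x3F);
    }

    *cp = v;
    return j + 1;
}
```

Tracing the octets E4 B8 AD: the lead octet matches the three-octet pattern, and clearing its marker bits leaves 0x04. Shifting left by six and adding the low six bits of B8 (0x38) gives 0x138; shifting again and adding the low six bits of AD (0x2D) gives 0x4E2D.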

/******************************************************************************************/

The end.

posted on 2012-04-24 11:41 by Tim. Category: eSNACC study
