Encodings

Shijun He hax.sfo at gmail.com
Tue Oct 7 02:01:12 PDT 2008


There are many encodings for Chinese.

GB2312, GBK and GB18030 are national standards of mainland China; GB2312
and its superset GBK correspond to Codepage 936 in MS Windows.
Big5 is the industry standard in Taiwan and Hong Kong.
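To make the relationship between these encodings concrete, here is a small sketch using only Python's built-in codecs. It is an illustration, not something from the original thread. The sample strings are arbitrary choices, and the fact that "cp936" resolves to the GBK codec reflects Python's own codec registry.

```python
import codecs

SAMPLE = "中文"  # "Chinese"; representable in GB2312 and in Big5

def encode_or_none(text, codec):
    """Encode text with the given codec, or return None if it cannot be represented."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

# GB2312, GBK and GB18030 are backward compatible: text limited to the
# GB2312 repertoire encodes to identical bytes in all three.
gb_bytes = {name: encode_or_none(SAMPLE, name) for name in ("gb2312", "gbk", "gb18030")}

# Big5 assigns different byte values to the same characters.
big5_bytes = encode_or_none(SAMPLE, "big5")

# In Python's codec registry, "cp936" is an alias of the GBK codec.
cp936_name = codecs.lookup("cp936").name

# GB18030 can encode any Unicode code point (here an emoji outside the BMP);
# GBK cannot.
EMOJI = "\U0001F600"
gbk_emoji = encode_or_none(EMOJI, "gbk")
gb18030_emoji = encode_or_none(EMOJI, "gb18030")

if __name__ == "__main__":
    for name, data in gb_bytes.items():
        print(name, data.hex())
    print("big5", big5_bytes.hex())
    print("cp936 ->", cp936_name)
    print("gbk can encode emoji:", gbk_emoji is not None)
    print("gb18030 emoji bytes:", gb18030_emoji.hex())
```

The compatibility check shows why GB18030 is a safe choice for decoding older GB2312/GBK content, while Big5 data has to be identified and decoded separately.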

You can read http://en.wikipedia.org/wiki/Character_set#Popular_character_encodings
for more info.

On Sun, Sep 28, 2008 at 8:35 PM, Sam Ruby <rubys at intertwingly.net> wrote:
> liorean wrote:
>> Hello!
>>
>> Just wondering if anybody has any real world data lying around
>> covering what character encodings are necessary to support real world
>> script content. UTF-8, UTF-16 and ISO-8859-1 are a given guess. What
>> else?
>
> My data relates to feeds, so it may not apply here, but in general
> UTF-16, while used internally in many places, is not widely supported as
> an interchange format.  Here are the encodings that the feed validator
> does *not* mark as obscure:
>
> 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'EUC-JP', 'ISO-8859-2',
> 'ISO-8859-15', 'ISO-8859-7', 'KOI8-R', 'SHIFT_JIS', 'WINDOWS-1250',
> 'WINDOWS-1251', 'WINDOWS-1252', 'WINDOWS-1254', 'WINDOWS-1255',
> 'WINDOWS-1256'
>
> One other deserves special mention: 'GB18030'.  Doesn't seem to be
> popular, but is the Chinese government's mandatory standard.
>
> - Sam Ruby
>
>
> _______________________________________________
> Es-discuss mailing list
> Es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>

