Encodings

Sam Ruby rubys at intertwingly.net
Sun Sep 28 05:35:37 PDT 2008


liorean wrote:
> Hello!
> 
> Just wondering if anybody has any real world data lying around
> covering what character encodings are necessary to support real world
> script content. UTF-8, UTF-16 and ISO-8859-1 are a given guess. What
> else?

My data relates to feeds, so it may not apply here.  In general, though, 
UTF-16, while used internally in many places, is not widely supported as 
an interchange format.  Here are the encodings that the feed validator 
does *not* mark as obscure:

'US-ASCII', 'ISO-8859-1', 'UTF-8', 'EUC-JP', 'ISO-8859-2', 
'ISO-8859-15', 'ISO-8859-7', 'KOI8-R', 'SHIFT_JIS', 'WINDOWS-1250', 
'WINDOWS-1251', 'WINDOWS-1252', 'WINDOWS-1254', 'WINDOWS-1255', 
'WINDOWS-1256'
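
As a rough illustration of how a list like this might be applied, here is a minimal Python sketch that checks a declared encoding label against it. The names COMMON_ENCODINGS and is_obscure are hypothetical and are not taken from the feed validator's code; the case-insensitive comparison is also an assumption.

```python
# Hypothetical sketch: the set below is copied from the list in this message;
# the names and the matching rule are illustrative assumptions only.
COMMON_ENCODINGS = frozenset([
    'US-ASCII', 'ISO-8859-1', 'UTF-8', 'EUC-JP', 'ISO-8859-2',
    'ISO-8859-15', 'ISO-8859-7', 'KOI8-R', 'SHIFT_JIS', 'WINDOWS-1250',
    'WINDOWS-1251', 'WINDOWS-1252', 'WINDOWS-1254', 'WINDOWS-1255',
    'WINDOWS-1256',
])


def is_obscure(label: str) -> bool:
    """Return True if the label is not in the list above.

    Labels are compared case-insensitively after trimming whitespace.
    """
    return label.strip().upper() not in COMMON_ENCODINGS


if __name__ == "__main__":
    for label in ("utf-8", "Shift_JIS", "UTF-16"):
        print(label, "obscure" if is_obscure(label) else "common")
```

With this sketch, 'utf-8' and 'Shift_JIS' match the list regardless of case, while 'UTF-16' does not appear in it and so is reported as obscure.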

One other deserves special mention: 'GB18030'.  It doesn't seem to be 
popular, but it is the Chinese government's mandatory standard.

- Sam Ruby




More information about the Es-discuss mailing list