ASC parsing bug?

Jason Orendorff jorendorff at mozilla.com
Thu Jun 19 08:54:44 PDT 2008


On Jun 17, 2008, at 8:17 AM, Thomas Reilly wrote:
> I don't know, I kinda like the Java behavior of using the system
> encoding.  The OSes typically know their locale; it seems silly not to
> use it.  That's what will be the most convenient for folks and
> that's probably how most other tools work, no?

I used to think this but changed my mind.  Those OS-level settings are  
pretty far removed from the actual encoding of any given file on  
disk.  What determines the encoding is where the file came from, and  
that could be anywhere.  The OS setting is unenforced and very often  
wrong.

On Windows, tools often don't respect the system encoding.  Even  
Microsoft's own libraries don't.  The simplest .NET API for writing a
text file defaults to UTF-8 (without a BOM):
   http://msdn.microsoft.com/en-us/library/fysy0a4b.aspx

UTF-8 is a good default encoding.  It's ASCII-compatible.  It's the  
same everywhere, so it doesn't raise barriers to sharing code over the  
net.  And if the input isn't UTF-8, you'll often actually detect the  
mistake, instead of silently getting the wrong characters.
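
To make that last point concrete, here is a rough sketch in Python (not
ASC code; the helper name decode_source and the sample string are made up
for illustration).  It assumes the tool decodes strictly and treats any
invalid byte sequence as a hard error:

```python
def decode_source(data: bytes) -> str:
    """Decode source bytes as strict UTF-8 (hypothetical helper).

    Invalid input raises ValueError instead of being silently mangled.
    """
    try:
        return data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise ValueError("input is not valid UTF-8") from exc


def is_rejected(data: bytes) -> bool:
    """Return True if decode_source refuses the given bytes."""
    try:
        decode_source(data)
    except ValueError:
        return True
    return False


text = "caf\u00e9"                    # "café"
utf8_bytes = text.encode("utf-8")     # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# Real UTF-8 input round-trips cleanly.
assert decode_source(utf8_bytes) == text

# Latin-1 input read as UTF-8: the lone 0xE9 byte is an incomplete
# sequence, so the mistake is detected.
assert is_rejected(latin1_bytes)

# The opposite mistake goes unnoticed: every byte is valid Latin-1, so
# UTF-8 input read as Latin-1 decodes "successfully" to the wrong text.
mojibake = utf8_bytes.decode("latin-1")
assert mojibake != text
```

The asymmetry is the point: a single-byte legacy encoding accepts any
byte string, so a wrong guess there never fails, while a wrong guess of
UTF-8 usually does as soon as the file contains a non-ASCII character.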

-j



More information about the Tamarin-devel mailing list