ASC parsing bug?

Lars Hansen lhansen at adobe.com
Mon Jun 16 06:06:31 PDT 2008


It appears that the ASC parser, if presented with an input file that does not start with a BOM, will assume the file is UTF-8.  If the file is not actually UTF-8 encoded but rather ASCII with some extended-ASCII characters (as I assume your test case is), then the parser (probably the Java input buffer layer, actually) will interpret the bytes of those extended characters according to its own notions; since such bytes are generally not valid UTF-8 sequences, that would explain the REPLACEMENT_CHAR you are seeing.  I'm not sure whether that should be considered a compiler error or a user error.  After all, there are no data available that allow the parser to figure out the encoding of the file.
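
To illustrate the effect, here is a minimal sketch in plain Java. It is not ASC's actual input code, which I have not checked; the class and method names are made up for the example. It assumes the source file was saved as ISO-8859-1 (Latin-1) and is then decoded as UTF-8, using the standard String(byte[], Charset) constructor, which substitutes U+FFFD for malformed input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of ASC.
final class EncodingDemo {
    // Decode raw bytes as UTF-8, as a reader that defaults to UTF-8 would.
    // Malformed byte sequences are replaced with U+FFFD.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "S\u00f6ren Lehmenk\u00fchler";

        // Assumption: the file on disk is Latin-1, so the o-umlaut and
        // u-umlaut are the single bytes 0xF6 and 0xFC.
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        // The same text saved as UTF-8 uses two bytes for each of them.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        String misread = decodeAsUtf8(latin1);
        String intact = decodeAsUtf8(utf8);

        System.out.println("Latin-1 bytes read as UTF-8 contain U+FFFD: "
                + (misread.indexOf('\uFFFD') >= 0));
        System.out.println("UTF-8 bytes read as UTF-8 round-trip: "
                + intact.equals(name));
    }
}
```

If this is indeed what happens, saving the source file as UTF-8 (with or without a BOM) should avoid the problem.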

--lars

> -----Original Message-----
> From: tamarin-devel-bounces at mozilla.org 
> [mailto:tamarin-devel-bounces at mozilla.org] On Behalf Of 
> Michael Daumling
> Sent: 16. juni 2008 08:41
> To: tamarin-devel at mozilla.org
> Subject: ASC parsing bug?
> 
> Hi,
> 
> While testing my String implementation, I found that ASC 
> seems to parse the string "Sören Lehmenkühler" (which seems 
> to be a fine German name) badly. Instead of the "ö" and "ü" 
> characters, the UTF-8 string in the ABC image contains 
> REPLACEMENT_CHAR (0xFFFD).
> 
> Michael

