ASC parsing bug?

Lars Hansen lhansen at adobe.com
Mon Jun 16 07:08:07 PDT 2008


The problem is how we can know that we should /not/ be using UTF8 (so that we can choose the default encoding instead).  ASC already allows an encoding to be specified explicitly, and UTF8 is the fallback when none is given.  (Not clear to me yet which of the clients of the compiler actually pass an encoding and where they obtain it from.)

The only viable strategy I can think of is this: if we encounter garbage in a file we thought was UTF8, we back up to the beginning and retry with the default encoding (if different from UTF8).  Probably works.  May not be worth the bother.
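
A minimal sketch of that retry strategy, for illustration only.  It is not taken from the ASC sources: the class FallbackDecoder and the method decode are hypothetical names, and it assumes the whole file has already been read into a byte array.  It uses the standard java.nio.charset decoder with CodingErrorAction.REPORT, so that malformed UTF8 raises an exception instead of being silently replaced with U+FFFD.  (StandardCharsets needs Java 7 or later; on older JVMs Charset.forName("UTF-8") would do the same job.)

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ASC.
class FallbackDecoder {
    // Try strict UTF-8 first; if the bytes are not valid UTF-8,
    // decode the whole buffer again with the fallback charset.
    static String decode(byte[] bytes, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // "Back up to the beginning": redecode from byte 0.
            return new String(bytes, fallback);
        }
    }

    public static void main(String[] args) {
        // Simulate a Latin-1 source file containing Michael's test string.
        byte[] latin1 = "S\u00f6ren Lehmenk\u00fchler"
                .getBytes(StandardCharsets.ISO_8859_1);
        // A real caller would probably pass Charset.defaultCharset() here.
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

One caveat with this approach: a non-UTF8 file whose bytes happen to form valid UTF8 sequences will still be decoded as UTF8, so the retry catches only input that is actually malformed.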

--lars 

> -----Original Message-----
> From: Edwin Smith 
> Sent: 16. juni 2008 15:46
> To: Lars Hansen; Michael Daumling; tamarin-devel at mozilla.org
> Subject: RE: ASC parsing bug?
> 
> Maybe the best guess for asc is java's default system 
> encoding in that case?
> 
> > -----Original Message-----
> > From: tamarin-devel-bounces at mozilla.org [mailto:tamarin-devel- 
> > bounces at mozilla.org] On Behalf Of Lars Hansen
> > Sent: Monday, June 16, 2008 9:07 AM
> > To: Michael Daumling; tamarin-devel at mozilla.org
> > Subject: RE: ASC parsing bug?
> > 
> > It appears to be the case that the ASC parser, if presented with an 
> > input file that does not start with a BOM, will assume the file is 
> > UTF8.  If the file is not actually UTF8 encoded but rather ascii with 
> > some extended-ascii characters (as I assume your test case is) then 
> > the parser (probably the Java input buffer layer actually) will 
> > interpret those extended characters according to its own notions.  I'm 
> > not sure whether that should be considered a compiler error or user 
> > error.  After all, there are no data available that allow 
> > the parser to figure out the encoding of the file.
> > 
> > --lars
> > 
> > > -----Original Message-----
> > > From: tamarin-devel-bounces at mozilla.org 
> > > [mailto:tamarin-devel-bounces at mozilla.org] On Behalf Of Michael 
> > > Daumling
> > > Sent: 16. juni 2008 08:41
> > > To: tamarin-devel at mozilla.org
> > > Subject: ASC parsing bug?
> > >
> > > Hi,
> > >
> > > While testing my String implementation, I found that ASC seems to 
> > > parse the string "Sören Lehmenkühler", which seems to be a fine 
> > > German name, badly. Instead of the "ö" and "ü"
> > > characters, the UTF-8 string in the ABC image contains 
> > > REPLACEMENT_CHAR (0xFFFD).
> > >
> > > Michael
> > >
> > > _______________________________________________
> > > Tamarin-devel mailing list
> > > Tamarin-devel at mozilla.org
> > > https://mail.mozilla.org/listinfo/tamarin-devel
> > >
> 

