ASC parsing bug?

Steven Johnson stejohns at adobe.com
Mon Jun 16 10:18:05 PDT 2008


Or, if we encounter a non-ASCII sequence, and there's no BOM and no explicit
encoding specified, simply fail compilation with an explicit error.
Draconian but effective.
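
For illustration, the strict policy above could be sketched roughly as follows. This is a minimal sketch and not ASC's actual code: the class name StrictSourceCheck and the methods detectBom and chooseCharset are hypothetical, and it assumes the whole source file is available as a byte array, with a null `explicit` argument meaning that no encoding was specified.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ASC: picks a charset for a source file,
// or refuses when the encoding cannot be determined safely.
final class StrictSourceCheck {

    /** Returns the charset implied by a leading BOM, or null if there is none. */
    static Charset detectBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    /**
     * Explicit encoding wins, then a BOM. Otherwise the file must be pure
     * ASCII; any byte with the high bit set is reported as an error.
     */
    static Charset chooseCharset(byte[] source, Charset explicit) {
        if (explicit != null) {
            return explicit;
        }
        Charset fromBom = detectBom(source);
        if (fromBom != null) {
            return fromBom;
        }
        for (int i = 0; i < source.length; i++) {
            if ((source[i] & 0x80) != 0) {
                throw new IllegalArgumentException(
                    "non-ASCII byte at offset " + i
                    + ", but no BOM and no explicit encoding was given");
            }
        }
        return StandardCharsets.US_ASCII;
    }

    public static void main(String[] args) {
        // Latin-1 bytes for a string containing an o-umlaut (0xF6).
        byte[] latin1 = "S\u00f6ren".getBytes(StandardCharsets.ISO_8859_1);
        try {
            chooseCharset(latin1, null);
        } catch (IllegalArgumentException e) {
            System.out.println("compilation would fail: " + e.getMessage());
        }
    }
}
```

With this policy a pure-ASCII file still compiles without any annotation, and a file like the one in the original report is rejected up front instead of silently picking up replacement characters.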


On 6/16/08 7:08 AM, "Lars Hansen" <lhansen at adobe.com> wrote:

> The problem is how we can know that we should /not/ be using UTF8 (so that we
> can choose the default encoding).  ASC already allows an encoding to be
> specified explicitly, and UTF8 is the fallback when none is given.  (It is
> not clear to me yet which clients of the compiler actually pass an encoding
> and where they obtain it from.)
> 
> The only viable strategy I can think of is this: if we encounter garbage in
> a file we thought was UTF8, back up to the beginning and retry with the
> default encoding (if different from UTF8).  Probably works.  May not be worth
> the bother.
> 
> --lars 
> 
>> -----Original Message-----
>> From: Edwin Smith
>> Sent: 16 June 2008 15:46
>> To: Lars Hansen; Michael Daumling; tamarin-devel at mozilla.org
>> Subject: RE: ASC parsing bug?
>> 
>> Maybe the best guess for ASC is Java's default system
>> encoding in that case?
>> 
>>> -----Original Message-----
>>> From: tamarin-devel-bounces at mozilla.org
>>> [mailto:tamarin-devel-bounces at mozilla.org] On Behalf Of Lars Hansen
>>> Sent: Monday, June 16, 2008 9:07 AM
>>> To: Michael Daumling; tamarin-devel at mozilla.org
>>> Subject: RE: ASC parsing bug?
>>> 
>>> It appears to be the case that the ASC parser, if presented with an
>>> input file that does not start with a BOM, will assume the file is
>>> UTF8.  If the file is not actually UTF8 encoded but rather ASCII with
>>> some extended-ASCII characters (as I assume your test case is), then
>>> the parser (probably the Java input buffer layer, actually) will
>>> interpret those extended characters according to its own notions.  I'm
>>> not sure whether that should be considered a compiler error or a user
>>> error.  After all, there is no data available that would allow the
>>> parser to figure out the encoding of the file.
>>> 
>>> --lars
>>> 
>>>> -----Original Message-----
>>>> From: tamarin-devel-bounces at mozilla.org
>>>> [mailto:tamarin-devel-bounces at mozilla.org] On Behalf Of Michael
>>>> Daumling
>>>> Sent: 16 June 2008 08:41
>>>> To: tamarin-devel at mozilla.org
>>>> Subject: ASC parsing bug?
>>>> 
>>>> Hi,
>>>> 
>>>> While testing my String implementation, I found that ASC seems to
>>>> mis-parse the string "Sören Lehmenkühler", which looks like a perfectly
>>>> fine German name. Instead of the "ö" and "ü" characters, the UTF-8
>>>> string in the ABC image contains REPLACEMENT_CHAR (0xFFFD).
>>>> 
>>>> Michael
>>>> 
>>>> _______________________________________________
>>>> Tamarin-devel mailing list
>>>> Tamarin-devel at mozilla.org
>>>> https://mail.mozilla.org/listinfo/tamarin-devel
>>>> 
>> 

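The retry strategy Lars describes in the quoted message (decode as UTF8, and on garbage start over with the default encoding) might look roughly like the sketch below. It is an illustration under stated assumptions, not ASC's implementation: the class FallbackDecoder and its methods are hypothetical names, and the input is assumed to be the complete file as a byte array. It relies on the fact that a CharsetDecoder configured with CodingErrorAction.REPORT throws on malformed input, whereas `new String(bytes, charset)` silently substitutes U+FFFD, which is consistent with the symptom in the original report.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ASC.
final class FallbackDecoder {

    /** Decodes strictly: malformed or unmappable input raises an exception. */
    static String decodeStrict(byte[] bytes, Charset cs)
            throws CharacterCodingException {
        CharsetDecoder decoder = cs.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** Tries UTF-8 first, then retries from the start with the fallback. */
    static String decode(byte[] bytes, Charset fallback) {
        try {
            return decodeStrict(bytes, StandardCharsets.UTF_8);
        } catch (CharacterCodingException notUtf8) {
            // Lenient decode: never throws, substitutes for undecodable bytes.
            return new String(bytes, fallback);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "S\u00f6ren Lehmenk\u00fchler"
            .getBytes(StandardCharsets.ISO_8859_1);
        // In ASC's case the fallback would presumably be
        // Charset.defaultCharset(); Latin-1 is used here only to keep the
        // example independent of the platform.
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

One caveat with this approach: a byte sequence in a legacy encoding can occasionally also be valid UTF-8, in which case the first attempt succeeds and the text is decoded incorrectly without any error being raised.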

