ASC parsing bug?
stejohns at adobe.com
Mon Jun 16 10:18:05 PDT 2008
Or, if we encounter a non-ASCII sequence, and there's no BOM and no explicit
encoding specified, simply fail compilation with an explicit error.
Draconian but effective.
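A minimal sketch of what such a strict check could look like in Java. It assumes the source file has already been read into a byte array and that no BOM or explicit encoding was found. The class and method names (StrictSourceDecoder, decodeStrictUtf8, decodeWithFallback) are hypothetical and are not taken from the ASC sources; StandardCharsets needs Java 7 or later. The decoder is configured to report malformed input rather than silently substituting U+FFFD, so the caller can either fail compilation with an explicit error or retry with another encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ASC: illustrates strict UTF-8 decoding.
final class StrictSourceDecoder {

    // Decodes the bytes as UTF-8 and throws CharacterCodingException on any
    // malformed sequence instead of inserting U+FFFD replacement characters.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Retry variant: try strict UTF-8 first, then decode the whole input
    // again from the beginning using the supplied fallback charset.
    static String decodeWithFallback(byte[] bytes, Charset fallback) {
        try {
            return decodeStrictUtf8(bytes);
        } catch (CharacterCodingException e) {
            return new String(bytes, fallback);
        }
    }
}
```

For the "fail compilation" behaviour, a caller would catch CharacterCodingException from decodeStrictUtf8 and turn it into a compiler error. For the retry behaviour, it could call decodeWithFallback with Charset.defaultCharset() as the fallback.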
On 6/16/08 7:08 AM, "Lars Hansen" <lhansen at adobe.com> wrote:
> The problem is how we can know that we should /not/ be using UTF8 (so that we
> can choose the default encoding). Already ASC allows an encoding to be
> specified explicitly, and UTF8 is the fallback from that case. (Not clear to
> me yet which of the clients of the compiler actually pass an encoding and
> where they obtain it from.)
> The only viable strategy I can think of is this: if we encounter garbage in a
> file we thought was UTF8, back up to the beginning and retry with the default
> encoding (if different from UTF8). Probably works. May not be worth the
> bother.
>> -----Original Message-----
>> From: Edwin Smith
>> Sent: 16. juni 2008 15:46
>> To: Lars Hansen; Michael Daumling; tamarin-devel at mozilla.org
>> Subject: RE: ASC parsing bug?
>> Maybe the best guess for asc is java's default system
>> encoding in that case?
>>> -----Original Message-----
>>> From: tamarin-devel-bounces at mozilla.org [mailto:tamarin-devel-
>>> bounces at mozilla.org] On Behalf Of Lars Hansen
>>> Sent: Monday, June 16, 2008 9:07 AM
>>> To: Michael Daumling; tamarin-devel at mozilla.org
>>> Subject: RE: ASC parsing bug?
>>> It appears to be the case that the ASC parser, if presented with an
>>> input file that does not start with a BOM, will assume the file is
>>> UTF8. If the file is not actually UTF8 encoded but rather ascii with
>>> some extended-ascii characters (as I assume your test case is) then
>>> the parser (probably the Java input buffer layer actually) will
>>> interpret those extended characters according to its own notions. I'm
>>> not sure whether that should be considered a compiler error or user
>>> error. After all, there are no data available that allow
>>> the parser to figure out the encoding of the file.
>>>> -----Original Message-----
>>>> From: tamarin-devel-bounces at mozilla.org
>>>> [mailto:tamarin-devel-bounces at mozilla.org] On Behalf Of Michael
>>>> Sent: 16. juni 2008 08:41
>>>> To: tamarin-devel at mozilla.org
>>>> Subject: ASC parsing bug?
>>>> While testing my String implementation, I found that ASC seems to
>>>> parse the string "Sören Lehmenkühler", which seems to be a fine
>>>> German name, badly. Instead of the "ö" and "ü"
>>>> characters, the UTF-8 string in the ABC image contains
>>>> REPLACEMENT_CHAR (0xFFFD).
>>>> Tamarin-devel mailing list
>>>> Tamarin-devel at mozilla.org