ASC parsing bug?

Thomas Reilly treilly at adobe.com
Tue Jun 17 06:17:05 PDT 2008


I don't know, I kinda like the Java behavior of using the system encoding.  The OS typically knows its locale, so it seems silly not to use it.  That's what will be most convenient for folks, and it's probably how most other tools work, no?

Instead of making our own rules I think we should look at what our peers do:

javac (system encoding overridable with -encoding)
python (assumes ASCII, overridable with # -*- coding: utf-8 -*- at top of script)
perl (assumes ASCII/system encoding, I think, overridable with the use utf8 pragma)

Okay, so if these 3 don't assume UTF-8, it's probably a bad idea for ASC to, IMHO.

What we lack is a way to express in the code what the encoding is.  Does ES4 have any encoding pragmas?  Perl's "use utf8;" seems nice.
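
To make the pragma idea concrete, here is a rough Python sketch of how a tool might look for an in-source encoding declaration, modelled on Python's "# -*- coding: ... -*-" convention.  The "//" comment syntax and the name declared_encoding are assumptions made up for illustration; nothing like this exists in ES4 or ASC as far as this thread establishes.

```python
import re

# Hypothetical pragma syntax for illustration only: a "//" comment on the
# first or second line, modelled on Python's "# -*- coding: utf-8 -*-".
# This is not an ES4 or ASC feature.
_CODING_RE = re.compile(rb"^[ \t\f]*//.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(data):
    """Return the encoding named in the first two lines of data, or None."""
    for line in data.splitlines()[:2]:
        match = _CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None
```

The declaration itself has to be readable as ASCII before the real encoding is known, which is why the sketch works on raw bytes and only inspects the top of the file.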

-----Original Message-----
From: tamarin-devel-bounces at mozilla.org on behalf of Mike Shaver
Sent: Mon 6/16/2008 3:40 PM
To: Steven Johnson
Cc: tamarin-devel at mozilla.org
Subject: Re: ASC parsing bug?
 
On Mon, Jun 16, 2008 at 6:32 PM, Steven Johnson <stejohns at adobe.com> wrote:
> Having a tool like ASC try to guess the proper encoding sounds like a recipe
> for long-term pain to me. (Hey, browser guys, how much fun is it to guess
> the encoding of poorly-marked HTML? :-)

I'm going to be nice and pretend you didn't ask.

> IMHO, if the encoding isn't either (1) explicitly specified, or (2)
> absolutely clear from a BOM, ASC should fail.

I think that is too harsh on the most common case: ASCII without BOM
or other adornments.  A default of UTF-8 seems pretty reasonable, and
I don't believe that UTF-8 requires a BOM since bytes are considered
individually?

If you want anything other than UTF-8, you should say so with an
explicit argument.

Mike
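
The policy proposed in this message (an explicit argument wins, otherwise a BOM decides, otherwise assume UTF-8) could be sketched in Python roughly as follows.  The function name decode_source and the "explicit" parameter are hypothetical, and the sketch only recognises UTF-8 and UTF-16 BOMs; it is not how ASC actually behaves.

```python
import codecs

# Only UTF-8 and UTF-16 BOMs are recognised in this sketch; UTF-32 is
# deliberately left out because its little-endian BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_source(data, explicit=None):
    """Decode source bytes: explicit encoding, then BOM, then strict UTF-8."""
    if explicit is not None:
        # Hypothetical equivalent of a command-line encoding argument.
        return data.decode(explicit)
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name)
    # Plain ASCII is valid UTF-8, so the common unadorned case succeeds;
    # bytes that are not valid UTF-8 raise UnicodeDecodeError instead of
    # being guessed at.
    return data.decode("utf-8")
```

Because the final decode is strict, a Latin-1 file with no BOM and no explicit argument fails loudly rather than being silently misread.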
_______________________________________________
Tamarin-devel mailing list
Tamarin-devel at mozilla.org
https://mail.mozilla.org/listinfo/tamarin-devel


