ASC parsing bug?

Lars Hansen lhansen at adobe.com
Tue Jun 17 06:29:32 PDT 2008


ES4 does not provide any mechanism for expressing the encoding of the
script; it's considered an environment issue.  There is also the
question of normalizing the content after decoding.  ES3 requires
(assumes, really) content to be in Unicode Normalization Form C; ES4 will
presumably do the same.  From a glance at the ASC source, it appears not
to perform any normalization of the input.
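
To make the two steps concrete (decode first, then normalize), here is a
minimal sketch in Python.  It is not how ASC works.  The function names
decode_source and load_script are hypothetical, and the policy (explicit
encoding wins, then a BOM, then a UTF-8 default) is only an assumption
taken from the suggestions quoted below.

```python
import codecs
import unicodedata

# BOMs are checked longest-first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_source(raw, encoding=None):
    """Decode script bytes (hypothetical policy, not ASC's)."""
    # 1. An explicitly supplied encoding always wins.
    if encoding is not None:
        return raw.decode(encoding)
    # 2. Otherwise trust a BOM if one is present, and strip it.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name)
    # 3. Otherwise assume UTF-8, which also covers plain ASCII.
    #    Invalid input raises UnicodeDecodeError instead of guessing.
    return raw.decode("utf-8")


def load_script(raw, encoding=None):
    """Decode, then normalize to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", decode_source(raw, encoding))
```

With this sketch, "e" followed by U+0301 (combining acute accent) and the
precomposed U+00E9 both come out as U+00E9, so the two spellings of an
identifier compare equal after loading.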

--lars

> -----Original Message-----
> From: tamarin-devel-bounces at mozilla.org 
> [mailto:tamarin-devel-bounces at mozilla.org] On Behalf Of Thomas Reilly
> Sent: 17. juni 2008 15:17
> To: Mike Shaver; Steven Johnson
> Cc: tamarin-devel at mozilla.org
> Subject: RE: ASC parsing bug?
> 
> 
> I don't know, I kinda like the Java behavior of using the 
> system encoding.  The OSes typically know their locale; it seems 
> silly not to use it.   That's what will be the most 
> convenient for folks and that's probably how most other tools 
> work, no?
> 
> Instead of making our own rules I think we should look at 
> what our peers do:
> 
> javac (system encoding overridable with -encoding)
> python (assumes ASCII, overridable with # -*- coding: utf-8 -*-
> at top of script)
> perl (assumes ASCII/system I think, overridable with the use utf8 pragma)
> 
> Okay, so if these 3 don't assume UTF-8, it's probably a bad idea 
> for ASC to, IMHO.
> 
> What we lack is a way to express in the code what the 
> encoding is.   Does ES4 have any encoding pragmas?   Perl's 
> "use utf8;" seems nice.
> 
> -----Original Message-----
> From: tamarin-devel-bounces at mozilla.org on behalf of Mike Shaver
> Sent: Mon 6/16/2008 3:40 PM
> To: Steven Johnson
> Cc: tamarin-devel at mozilla.org
> Subject: Re: ASC parsing bug?
>  
> On Mon, Jun 16, 2008 at 6:32 PM, Steven Johnson 
> <stejohns at adobe.com> wrote:
> > Having a tool like ASC try to guess the proper encoding 
> sounds like a recipe
> > for long-term pain to me. (Hey, browser guys, how much fun 
> is it to guess
> > the encoding of poorly-marked HTML? :-)
> 
> I'm going to be nice and pretend you didn't ask.
> 
> > IMHO, if the encoding isn't either (1) explicitly specified, or (2)
> > absolutely clear from a BOM, ASC should fail.
> 
> I think that is too harsh on the most common case: ASCII without BOM
> or other adornments.  A default of UTF-8 seems pretty reasonable, and
> I don't believe that UTF-8 requires a BOM since bytes are considered
> individually?
> 
> If you want anything other than UTF-8, you should say so with an
> explicit argument.
> 
> Mike
> _______________________________________________
> Tamarin-devel mailing list
> Tamarin-devel at mozilla.org
> https://mail.mozilla.org/listinfo/tamarin-devel
> 

