Full Unicode based on UTF-16 proposal

Wes Garland wes at page.ca
Sat Mar 24 13:11:32 PDT 2012

On 24 March 2012 15:25, David Herman <dherman at mozilla.com> wrote:

> > Presumably the JS source, as a sequence of UTF-16 code units, represents
> > the tetragram code points as surrogate pairs.
>
> Clarification: the JS source *of the regexp literal*.

We certainly can, although this means that certain Unicode Strings cannot
be matched by a regexp with this flag. These strings would be the ones
containing reserved code points.

That said, why is the JS source suddenly a sequence of UTF-16 code units? I
believe JS source code should be a sequence of Unicode code points (and I
think ES5 says something to this effect).
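
To make the code unit vs. code point distinction concrete, here is a minimal
sketch. It assumes an ES2015-or-later engine (string iteration by code point
and codePointAt were not available when this thread was written), and the
variable names are only illustrative. It takes one tetragram, U+1D306, written
as a surrogate pair, and counts it both ways.

```javascript
// U+1D306 TETRAGRAM FOR CENTRE, written as an explicit surrogate pair.
const tetragram = "\uD834\uDF06";

// String.prototype.length counts UTF-16 code units.
const codeUnitCount = tetragram.length;

// Array.from uses the string iterator, which walks by code point (ES2015+).
const codePoints = Array.from(tetragram, (ch) => ch.codePointAt(0));

console.log(codeUnitCount);              // 2
console.log(codePoints.length);          // 1
console.log(codePoints[0].toString(16)); // "1d306"
```

A lexer defined over code points would see one character here; a lexer
defined over UTF-16 code units would see two.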

The underlying transport format should not be a concern for the JS lexer.
The lexer should receive a series of code points from the network
transport, allowing web sites to transmit JS in whatever encoding they see
fit, provided the browser and server can both agree on it.  I think UTF-8
would make a fine transport format for JS source code.  IMHO the transport
format between the browser and the JS lexer [i.e. the input program
encoding] should be allowed to be implementation-defined and not specified
by TC-39.
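
As a rough illustration of that separation, the sketch below decodes the same
two-character source text from two different transport encodings and shows
that the lexer-facing result is the same sequence of code points. It assumes
an environment that provides the WHATWG TextDecoder (modern browsers and
Node.js); decodeToCodePoints is a hypothetical helper name, not part of any
specification.

```javascript
// Hypothetical helper: turn transport bytes into the code points a lexer
// would consume. The encoding label is whatever browser and server agreed on.
function decodeToCodePoints(bytes, encoding) {
  const text = new TextDecoder(encoding).decode(bytes);
  return Array.from(text, (ch) => ch.codePointAt(0));
}

// "a" followed by U+1D306, encoded as UTF-8 and as UTF-16LE.
const utf8Bytes = new Uint8Array([0x61, 0xF0, 0x9D, 0x8C, 0x86]);
const utf16leBytes = new Uint8Array([0x61, 0x00, 0x34, 0xD8, 0x06, 0xDF]);

const fromUtf8 = decodeToCodePoints(utf8Bytes, "utf-8");
const fromUtf16le = decodeToCodePoints(utf16leBytes, "utf-16le");

console.log(fromUtf8);    // [ 97, 119558 ]
console.log(fromUtf16le); // [ 97, 119558 ]
```

Under this model the choice of transport encoding is invisible past the
decoding step, which is the property argued for above.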


Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102