Full Unicode based on UTF-16 proposal
wes at page.ca
Sat Mar 24 13:11:32 PDT 2012
On 24 March 2012 15:25, David Herman <dherman at mozilla.com> wrote:
> > Presumably the JS source, as a sequence of UTF-16 code units, represents
> > the tetragram code points as surrogate pairs.
> Clarification: the JS source *of the regexp literal*.
We certainly can, although this means that certain Unicode strings cannot
be matched by a regexp with this flag. These strings would be the ones
containing reserved code points.
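To make the unmatched-strings case concrete, here is a minimal Python sketch. It assumes that "reserved code points" means the surrogate range U+D800 to U+DFFF, which UTF-16 reserves for surrogate pairs. The function name `code_points` is hypothetical. It pairs surrogates into supplementary code points and reports any surrogate code unit left without a partner; such a unit encodes no character, so a regexp that works on code points has no valid character to match there.

```python
# Assumption: "reserved code points" = the surrogate range U+D800..U+DFFF.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def code_points(units):
    """Interpret a list of UTF-16 code units as code points.

    Returns (points, unpaired): `points` is the decoded sequence and
    `unpaired` lists the indices of surrogate units with no partner.
    """
    points = []
    unpaired = []
    i = 0
    while i < len(units):
        u = units[i]
        if (HIGH_START <= u <= HIGH_END and i + 1 < len(units)
                and LOW_START <= units[i + 1] <= LOW_END):
            # Valid pair: combine into a single code point above U+FFFF.
            low = units[i + 1]
            points.append(0x10000 + ((u - HIGH_START) << 10) + (low - LOW_START))
            i += 2
            continue
        if HIGH_START <= u <= LOW_END:
            # Lone surrogate: record it; its value stays a reserved code point.
            unpaired.append(i)
        points.append(u)
        i += 1
    return points, unpaired
```

For example, `code_points([0xD834, 0xDF06])` yields one code point (U+1D306) and no unpaired units, while `code_points([0x61, 0xD834])` reports index 1 as an unpaired high surrogate.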
That said, why is the JS source suddenly a sequence of UTF-16 code units? I
believe JS source code should be a sequence of Unicode code points (and I
think ES5 says something to this effect).
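The difference between the two models shows up with any supplementary-plane character. A minimal Python sketch, using U+1D306 TETRAGRAM FOR CENTRE as an assumed example of the tetragram code points mentioned in the quoted text: under the code-point model the lexer sees one element, while the UTF-16 view holds a surrogate pair.

```python
# U+1D306 TETRAGRAM FOR CENTRE, a character outside the BMP.
ch = "\U0001D306"

# Code-point view: one element.
points = [ord(c) for c in ch]

# UTF-16 code-unit view (big-endian, no BOM): two 16-bit units.
data = ch.encode("utf-16-be")
units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

print([hex(p) for p in points])  # ['0x1d306']
print([hex(u) for u in units])   # ['0xd834', '0xdf06']
```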
The underlying transport format should not be a concern for the JS lexer.
The lexer should receive a series of code points from the network
transport, allowing web sites to transmit JS in whatever encoding they see
fit, provided the browser and server can both agree on it. I think UTF-8
would make a fine transport format for JS source code. IMHO the transport
format between the browser and the JS lexer [i.e. the input program
encoding] should be allowed to be implementation-defined and not specified.
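A minimal Python sketch of this separation, assuming a hypothetical front end (`source_code_points` is an illustrative name, not part of any engine) that decodes the transport bytes with whatever encoding label browser and server agreed on and hands the lexer only code points. The same program sent as UTF-8 or as UTF-16LE produces an identical code-point sequence, so the lexer never needs to know which was used.

```python
def source_code_points(data: bytes, encoding: str):
    """Hypothetical front end: decode transport bytes into code points."""
    return [ord(c) for c in data.decode(encoding)]


source = "var t = '\U0001D306';"

utf8_bytes = source.encode("utf-8")
utf16_bytes = source.encode("utf-16-le")

from_utf8 = source_code_points(utf8_bytes, "utf-8")
from_utf16 = source_code_points(utf16_bytes, "utf-16-le")

# Different byte counts on the wire, same code points for the lexer.
print(len(utf8_bytes), len(utf16_bytes), len(from_utf8))  # 15 26 12
print(from_utf8 == from_utf16)  # True
```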
Wesley W. Garland
Director, Product Development
+1 613 542 2787 x 102
More information about the es-discuss