Full Unicode based on UTF-16 proposal
erik.corry at gmail.com
Fri Mar 16 01:55:46 PDT 2012
This is very useful, and was surely a lot of work. I like the general
thrust of it a lot. It has a high level of backwards compatibility,
does not rely on the VM having two different string implementations in
it, and it seems to fix the issues people are encountering.
However I think we probably do want the /u modifier on regexps to
control the new backward-incompatible behaviour. There may be some
way to relax this for regexp literals in opted in Harmony code, but
for new RegExp(...) and for other string literals I think there are
rather too many inconsistencies with the old behaviour.
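
A minimal sketch of the kind of incompatibility an opt-in flag guards against. It assumes an engine that implements a "u" flag with code-point matching for ".", as ECMAScript later standardized; the semantics in this proposal may differ in detail. The helper name dotMatchesWhole is hypothetical.

```javascript
// Hypothetical helper: does "." match the entire string under the given flags?
function dotMatchesWhole(str, flags) {
  return new RegExp("^.$", flags).test(str);
}

// U+1F600 is stored as two UTF-16 code units: a lead and a trail surrogate.
const supplementary = "\uD83D\uDE00";

// Old behaviour: "." consumes one code unit, so it cannot span the pair.
const legacyResult = dotMatchesWhole(supplementary, "");

// Assumed opt-in behaviour: "." consumes one whole code point.
const unicodeResult = dotMatchesWhole(supplementary, "u");

console.log(legacyResult, unicodeResult);
```

Existing code that builds patterns with new RegExp(...) and expects the first result is why the behaviour probably needs an explicit opt-in.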
The algorithm given for codePointAt never returns NaN. It should
probably do that for indices that hit a trail surrogate that has a
lead surrogate preceding it.
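
The suggested behaviour could look roughly like the sketch below. The function name codePointAtStrict is hypothetical, and returning undefined for out-of-range indices is an assumption of this sketch, not something taken from the proposal.

```javascript
// Hypothetical variant of codePointAt, not the proposal's algorithm.
function codePointAtStrict(str, index) {
  const isLead = (c) => c >= 0xD800 && c <= 0xDBFF;
  const isTrail = (c) => c >= 0xDC00 && c <= 0xDFFF;

  // Assumption: out-of-range indices yield undefined.
  if (index < 0 || index >= str.length) return undefined;

  const cu = str.charCodeAt(index);

  // The index lands in the middle of a well-formed surrogate pair.
  if (isTrail(cu) && index > 0 && isLead(str.charCodeAt(index - 1))) {
    return NaN;
  }

  // The index lands on the start of a well-formed surrogate pair.
  if (isLead(cu) && index + 1 < str.length) {
    const next = str.charCodeAt(index + 1);
    if (isTrail(next)) {
      return (cu - 0xD800) * 0x400 + (next - 0xDC00) + 0x10000;
    }
  }

  // BMP character or unpaired surrogate: return the code unit itself.
  return cu;
}

const sample = "a\uD83D\uDE00b";
console.log(codePointAtStrict(sample, 1).toString(16)); // 1f600
console.log(codePointAtStrict(sample, 2));              // NaN
```

With this shape, a loop that steps through a string one index at a time can detect and skip the second half of a pair instead of processing the trail surrogate as a character.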
Perhaps it is outside the scope of this proposal, but it would also
make a lot of sense to add some named character classes to RegExp.
If we are making a /u modifier for RegExp it would also be nice to get
rid of the incorrect case-independent matching rules. This is the
section that says: "If ch's code unit value is greater than or equal
to decimal 128 and cu's code unit value is less than decimal 128,
then return ch."
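
A rough sketch of what that rule does, modelled on the Canonicalize steps in ES5 section 15.10.2.8. The function names are hypothetical, and the second function only illustrates dropping the quoted step; it is not a proposed replacement algorithm.

```javascript
// Sketch of ES5 Canonicalize for a case-insensitive match (single code unit in).
function canonicalizeES5(ch) {
  const u = ch.toUpperCase();
  // Upper-casing produced more than one character (e.g. "ß" -> "SS").
  if (u.length !== 1) return ch;
  // The quoted rule: a non-ASCII character never canonicalizes to ASCII.
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) return ch;
  return u;
}

// Same steps with the quoted rule removed, for comparison only.
function canonicalizeWithoutAsciiRule(ch) {
  const u = ch.toUpperCase();
  return u.length === 1 ? u : ch;
}

const longS = "\u017F"; // LATIN SMALL LETTER LONG S, upper-cases to "S"

// With the rule, long s and "s" canonicalize differently, so /s/i misses it.
console.log(canonicalizeES5(longS) === canonicalizeES5("s")); // false

// Without the rule, both canonicalize to "S" and would match.
console.log(
  canonicalizeWithoutAsciiRule(longS) === canonicalizeWithoutAsciiRule("s")
); // true
```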
2012/3/16 Norbert Lindenberg <ecmascript at norbertlindenberg.com>:
> Based on my prioritization of goals for support for full Unicode in ECMAScript, I've put together a proposal for supporting the full Unicode character set based on the existing representation of text in ECMAScript using UTF-16 code unit sequences:
> The detailed proposed spec changes serve to get a good idea of the scope of the changes, but will need some polishing.
>  https://mail.mozilla.org/pipermail/es-discuss/2012-February/020721.html
> es-discuss mailing list
> es-discuss at mozilla.org