Full Unicode based on UTF-16 proposal
ecmascript at norbertlindenberg.com
Sun Mar 25 23:11:47 PDT 2012
Perfectly valid concerns.
My thinking here is that normally applications want to deal with code points, but we force them to deal with UTF-16 and additional flags because we need them for compatibility. Within modules, where we know that compatibility is not an issue, I'd rather give applications by default what they need.
On Mar 24, 2012, at 23:56 , David Herman wrote:
> On Mar 24, 2012, at 4:32 PM, Norbert Lindenberg wrote:
>> One concern: I think code point based matching should be the default for regex literals within modules (where we know the code is written for Harmony).
> This idea makes me nervous. Partly because I think we should keep the set of semantic changes between non-module code and module code reasonably small, and partly because the idea of your proposal is to continue to treat strings as sequences of 16-bit code units, not Unicode code points -- which means that quietly switching regexps to operate closer to the level of code points seems to create a kind of impedance mismatch. It feels more appropriate to me to require programmers to declare explicitly that they're dealing with a string at the level of code points, using the (quite concise) /u flag. That way they're saying "yes, I know this string is just a sequence of 16-bit code units, but it may contain non-BMP data, and I would like to match its contents with a regexp that deals with code points."
> (Again, I'm still new to the finer points of Unicode, so I'm prepared to be shown I'm thinking about it wrong.)
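To make the difference between code unit and code point matching concrete, here is a minimal sketch. It assumes an engine that supports the /u flag discussed above; the sample string and variable names are illustrative only and are not part of the proposal.

```javascript
// Hypothetical sample: "a", then U+1F600 (a non-BMP character, written as
// its surrogate pair), then "b".
const sample = "a\uD83D\uDE00b";

// Strings are sequences of 16-bit code units, so the non-BMP character
// contributes two units to the length.
const codeUnitLength = sample.length;

// Without /u, "." matches a single code unit, so it cannot span both
// halves of the surrogate pair.
const matchesByCodeUnit = /^a.b$/.test(sample);

// With /u, "." matches one whole code point, surrogate pair included.
const matchesByCodePoint = /^a.b$/u.test(sample);

console.log(codeUnitLength, matchesByCodeUnit, matchesByCodePoint);
// prints: 4 false true
```

The string itself is the same in both cases; only the regexp's view of it changes, which is the explicit opt-in the /u flag is meant to express.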