Full Unicode based on UTF-16 proposal

Steven Levithan steves_list at hotmail.com
Mon Mar 26 23:32:40 PDT 2012


The idea for /u and the following aspects of it already seem to have some 
consensus:

- Switch from code unit to code point matching.
- Make \d\w\b Unicode-aware.
- Make /i use proper Unicode casefolding.
- Enable \u{x..} (break from web reality).
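
For reference, a small sketch of the current (non-/u) behavior that the first
two items would change. It relies only on existing semantics; the sample
characters and variable names are my own picks for illustration.

```javascript
// Current semantics, no /u flag: matching works on UTF-16 code units.
var astral = "\uD835\uDC9C"; // U+1D49C: one code point, two code units

var dotMatchesWhole = /^.$/.test(astral);  // false: "." consumes one code unit
var twoDotsMatch = /^..$/.test(astral);    // true: one "." per surrogate half

// \d and \w are ASCII-only today.
var wordMatchesAccent = /^\w$/.test("\u00E9");  // false: U+00E9, e with acute
var digitMatchesArabic = /^\d$/.test("\u0663"); // false: U+0663, Arabic-Indic digit three

console.log(dotMatchesWhole, twoDotsMatch, wordMatchesAccent, digitMatchesArabic);
```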

Since /u may be a one-time opportunity to broadly change RegExp semantics, 
how about adding another change to the pile?

- Break from web reality for escaped A-Z and a-z. Throw a SyntaxError when 
any letter not assigned a special meaning is escaped, instead of matching 
the literal character.

I.e., /\i/u etc. must throw a SyntaxError.
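
As a rough sketch of the rule, the check below flags escaped letters that
have no assigned meaning. The helper name and the letter list are assumptions
of mine (the list is the ES5 set: b B c d D f n r s S t u v w W x), not
proposed spec text.

```javascript
// Letters that already have a special meaning after a backslash in ES5.
var ASSIGNED = "bBcdDfnrsStuvwWx";

// Hypothetical helper: returns the escaped letters in `source` that would
// throw a SyntaxError under the proposed /u rule.
function unassignedLetterEscapes(source) {
  var found = [];
  for (var i = 0; i < source.length; i++) {
    if (source.charAt(i) !== "\\") continue;
    var next = source.charAt(i + 1);
    if (/[A-Za-z]/.test(next) && ASSIGNED.indexOf(next) === -1) {
      found.push("\\" + next);
    }
    i++; // skip the escaped character so an escaped backslash is not misread
  }
  return found;
}

console.log(/\i/.test("i"));                          // true today: \i is a literal "i"
console.log(unassignedLetterEscapes("\\i\\d\\p{L}")); // flags \i and \p, not \d
```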

This is relevant to future Unicode support, because without breaking web 
reality we might never be able to add \p{..} and \P{..} for Unicode 
properties, \X for graphemes, \N{..} for named characters, etc.
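
To make the conflict concrete: in engines that follow web reality, these
escapes already parse today as plain literals, so existing patterns would
change meaning if the escapes were given new semantics without a flag. A
sketch, assuming the usual web-compatible handling of unassigned escapes and
of braces that do not form a quantifier:

```javascript
// Without a flag, an unassigned escaped letter matches itself, and "{...}"
// that is not a quantifier is taken literally.
var pIsLiteral = /^\p{L}$/.test("p{L}");         // true: matches the text "p{L}"
var pMatchesLetter = /^\p{L}$/.test("a");        // false: not a Unicode property test
var xIsLiteral = /^\X$/.test("X");               // true: \X is just "X"
var nIsLiteral = /^\N{SPACE}$/.test("N{SPACE}"); // true: not a named character

console.log(pIsLiteral, pMatchesLetter, xIsLiteral, nIsLiteral);
```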

Of course, this change would also make it easier to add any of the many 
special escapes found in other regex libraries (such as \k<..> for named 
backreferences) or new ES inventions. It's really ugly that such features 
might not be addable by default everywhere, but them's the breaks, 
I suppose (I hope I'm wrong).

We could go crazy and start fixing all of ES's RegExp warts when /u is 
applied, even though such changes would not be related to Unicode support. 
I'd be happy to pursue that, but I suspect many here would see it as a 
bridge too far.

Thoughts?

-- Steven Levithan


