Full Unicode based on UTF-16 proposal

Steven Levithan steves_list at hotmail.com
Mon Mar 26 22:49:10 PDT 2012


Norbert Lindenberg wrote:
>The ugly world of web reality...
>
>Actually, in V8, Firefox, Safari, and IE, /[\u{10000}]/ seems to be the
>same as /[\\u01{}]/ - it matches "\\u01{}u01". In Opera, it doesn't seem to
>match anything, but doesn't throw the specified SyntaxError either.

How did you test this? I get consistent results that agree with Erik in IE 
9, Firefox 11, Chrome 17, and Safari 5.1:

"\\u01{}".match(/[\u{10000}]/g); // ['u','0','1','{','}']
/\u{2}/g.test("uu"); // true

Opera, as you said, returns null and false (tested v11.6 and v10.0).
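
One way to read the results above, as a sketch rather than anything stated in the thread: in the non-Opera engines, a \u that is not followed by four hex digits behaves like a plain "u", so the braces and digits keep their ordinary meaning. The variable names below are only illustrative.

```javascript
// Sketch, assuming legacy (non-ES5-strict) regex parsing as in the engines tested above.
var input = "\\u01{}"; // six characters: \ u 0 1 { }

// \u with no four hex digits after it is treated as a literal "u",
// so this class contains u, {, 1, 0 and }.
var viaEscape = input.match(/[\u{10000}]/g);

// The same class written without the backslash.
var viaPlainClass = input.match(/[u{10}]/g);

// Both forms pick out the same characters, in the same order.
var sameClass = viaEscape.join("") === viaPlainClass.join("");

// Outside a class, {2} is an ordinary quantifier applied to "u".
var quantified = /^\u{2}$/.test("uu") && /^u{2}$/.test("uu");
```

If this reading is right, the backslash itself is never matched, which is why the first result above has five elements rather than six.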

>Do we know of any applications actually relying on these bugs, seeing that
>browsers don't agree on them?

Minus Opera, browsers do agree on them. Admirably so. And they aren't 
bugs--they're intentional breaks from ES for backcompat with earlier 
implementations that were themselves designed for backcompat with older 
non-ES regex behavior. The RegExp Match Web Reality proposal at 
<http://wiki.ecmascript.org/doku.php?id=harmony:regexp_match_web_reality> 
says to add them to the spec, and Allen has said the web reality proposal 
should be the top RegExp priority for ES6.

I'd easily believe it's safe enough to change /[\u{n..}]/ because of the 
four-part sequence involved in \u + { + n.. + } that is fairly unlikely to 
appear in that specific order in a character class. But I'd have a harder 
time believing /\u{n..}/ is safe to change. It would of course be great to 
have some real data on the risks/damage.
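
To make the risk concrete, here is a hedged sketch of what existing code would see if /\u{n..}/ changed meaning. The reinterpreted pattern is only simulated with the existing \u0002 escape; the names legacy, proposed, and results are hypothetical and not part of any proposal text.

```javascript
// Today's web-reality reading: "u" repeated exactly twice.
var legacy = /^\u{2}$/;

// Stand-in for the proposed reading of \u{2} as the code point U+0002,
// simulated here with the four-digit escape that already exists.
var proposed = /^\u0002$/;

var results = {
  // A script relying on the legacy reading matches "uu" today...
  legacyMatchesUU: legacy.test("uu"),
  // ...and would silently stop matching it under the new reading,
  proposedMatchesUU: proposed.test("uu"),
  // matching the control character U+0002 instead.
  proposedMatchesControl: proposed.test("\u0002")
};
```

No error is thrown in either case, so a change like this would show up only as different match results.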

>For string literals, I see that most implementations correctly throw a
>SyntaxError when given "\u{10}". The exception here is V8.

I'm sure it would be safer to allow \u{n..} for string literals even if this 
fortunate SyntaxError weren't thrown. Users haven't been trained to think of 
escaped nonmetacharacters as safe for string literals to the extent that 
they have for regexes, and you can't programmatically generate such escapes 
so easily as when passing to the RegExp constructor.

-- Steven Levithan


