Fwd: Questions regarding ES6 Unicode regular expressions

Till Schneidereit till at tillschneidereit.net
Mon Aug 25 02:44:55 PDT 2014


(Forwarding to Norbert as I don't know how closely he follows es-discuss.)

---------- Forwarded message ----------
From: Mathias Bynens <mathias at qiwi.be>
Date: Mon, Aug 25, 2014 at 10:59 AM
Subject: Questions regarding ES6 Unicode regular expressions
To: es-discuss <es-discuss at mozilla.org>


Norbert’s original proposal for the `u` flag (
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/#RegExp)
mentioned the following:

> Possibly the definition of the character classes `\d\D\w\W\b\B` is
> extended to their Unicode extensions, such as all characters in the Unicode
> category “Number, decimal” for `\d`, as proposed by Steven Levithan.
> Whether this can be done under the same flag or requires a different one
> still needs discussion.

Has this been discussed any further? (I couldn’t find any mention of it in
the meeting notes repository.) Should I file a bug?

Norbert also suggested replacing ‘characters’ with ‘code points’ in
sections like
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-characterclassescape
and
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-charactersetmatcher-abstract-operation
when the `u` flag is set. It seems the intent was to make e.g. `/\d/u`
equivalent to `/[0-9]/u`, and to make `/\D/u` match all Unicode code points
except `[0-9]`. This is different from `/\D/`, which only matches BMP code
points.
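
A minimal sketch of the difference in question, assuming an engine that
implements the `u` flag the way ES2015 eventually specified it (the draft
discussed here may differ). The variable names are illustrative only:

```javascript
// U+1D7D8 MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO lies outside the BMP, so a
// JavaScript string stores it as two UTF-16 code units (a surrogate pair).
const astralDigit = '\u{1D7D8}';

// Without `u`, matching proceeds per code unit: each surrogate half is a
// separate match for \D.
const countWithoutU = astralDigit.match(/\D/g).length;

// With `u`, matching proceeds per code point: the whole pair is one \D match.
const countWithU = astralDigit.match(/\D/gu).length;

// Under `u`, \d is still ASCII-only: U+0663 ARABIC-INDIC DIGIT THREE is a
// "Number, decimal" character but is not matched.
const arabicDigitMatches = /\d/u.test('\u0663');

console.log(countWithoutU, countWithU, arabicDigitMatches);
```

In such an engine this logs `2 1 false`, i.e. `\D` sees one code point
under `u` while `\d` keeps its `[0-9]` meaning.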

It seems like this change has not propagated to the spec draft, though. Is
this correct, and if so, what’s the reason for that?

The same goes for `/[^a]/u` – should this match all Unicode code points
except `a` or should it only match BMP code points?
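
To make the `/[^a]/u` question concrete, here is a small sketch, again
assuming an engine that follows the semantics ES2015 ended up with. The
test string and variable names are hypothetical choices for illustration:

```javascript
// U+1F4A9 is a supplementary code point, encoded as the surrogate pair
// U+D83D U+DCA9 in a JavaScript string.
const supplementary = '\u{1F4A9}';

// Without `u`, [^a] consumes a single code unit, so an anchored match of
// the two-unit string fails.
const anchoredWithoutU = /^[^a]$/.test(supplementary);

// With `u`, [^a] consumes a whole code point, so the anchored match succeeds.
const anchoredWithU = /^[^a]$/u.test(supplementary);

// The unanchored match shows what was consumed in each mode.
const firstWithoutU = supplementary.match(/[^a]/)[0];  // lone high surrogate
const firstWithU = supplementary.match(/[^a]/u)[0];    // full code point

console.log(anchoredWithoutU, anchoredWithU, firstWithoutU.length, firstWithU.length);
```

Under those assumptions the negated class covers every code point except
`a` when `u` is set, and only single code units otherwise.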