Questions regarding ES6 Unicode regular expressions
allen at wirfs-brock.com
Tue Aug 26 14:18:53 PDT 2014
On Aug 26, 2014, at 1:45 PM, Norbert Lindenberg wrote:
> On Aug 26, 2014, at 11:15 , Mathias Bynens <mathias at qiwi.be> wrote:
>> On 26 Aug 2014, at 19:01, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
>>> I see one remaining issue:
>>> In ES5 (and ES6): `/[a-z]/i` does not match U+017F (ſ) or U+212A (K) because the ES canonicalization algorithm excludes mapping code points > 127 that toUpperCase to code points < 128.
>>> However, as currently spec'ed, the ES6 canonicalization algorithm for /u RegExps does not include that >127/<128 exclusion. It maps U+017F to "S", which then matches.
>>> This is probably a minor variation from the ES5 behavior, but we should make sure it is a desirable and tolerable change, since we presumably could also apply the >127/<128 filter to /u canonicalization.
>> This is a useful feature, and the explicit opt-in makes the small back-compat break acceptable IMHO.
> I’d say the explicit opt-in means that there is no backwards compatibility issue.
Except that, as discussed WRT \d, if a JS programmer updates an existing regexp to use /u simply because they want "full Unicode" support, they may not realize that they are also changing its matching semantics in other ways. As Claude said, this will cause bugs.
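A minimal JavaScript sketch of the kind of silent change described above, assuming an engine that implements the final ES2015 /u semantics (the emoji input and the Arabic-Indic digit are illustrative choices, not from this thread). Adding /u changes what `.` matches, while `\d` ended up as plain [0-9] with or without the flag.

```javascript
// Hypothetical scenario: a pattern gains the /u flag only so that it
// handles "full Unicode" input, but other matching rules change too.
const emoji = '\u{1F600}'; // one code point, two UTF-16 code units

// Without /u, "." matches a single code unit, so a lone "." cannot match
// the surrogate pair; with /u, "." matches the whole code point.
console.log(/^.$/.test(emoji));  // false
console.log(/^.$/u.test(emoji)); // true

// In the final ES2015 spec, \d stays ASCII-only ([0-9]) even with /u,
// so U+0663 ARABIC-INDIC DIGIT THREE matches neither form.
console.log(/\d/.test('\u0663'), /\d/u.test('\u0663')); // false false
```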
> I removed the exclusion based on input from Erik Corry on es-discuss:
> At the March 2012 TC39 meeting shortly after, Waldemar explained the motivation for the exclusion, but Unicode case folding was approved with the “u” flag:
I'm actually not very worried that the canonicalization of U+017F and friends is going to break anything. But, if we are reexamining decisions in this space, it should be on the table.
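To make the U+017F case concrete, here is a minimal JavaScript sketch. `canonicalizeES5` is a hypothetical helper that paraphrases the ES5 Canonicalize step described earlier in the thread (it is not a built-in); the native regexp results assume an engine with final ES2015 /u semantics, where case-insensitive matching uses Unicode simple case folding.

```javascript
// Sketch of the ES5 (non-/u) Canonicalize step for case-insensitive
// matching: uppercase the character, but keep it unchanged when the
// uppercase form is not a single code unit, or when a non-ASCII
// character (code unit >= 128) would map into ASCII (< 128).
function canonicalizeES5(ch) {
  const cu = ch.toUpperCase();
  if (cu.length !== 1) return ch;
  if (ch.charCodeAt(0) >= 128 && cu.charCodeAt(0) < 128) return ch;
  return cu;
}

const longS = '\u017F';  // LATIN SMALL LETTER LONG S
const kelvin = '\u212A'; // KELVIN SIGN

console.log(canonicalizeES5(longS) === longS);   // true: the exclusion keeps U+017F away from "S"
console.log(canonicalizeES5(kelvin) === kelvin); // true: toUpperCase leaves U+212A unchanged

// Native engines: without /u the ES5 rules apply; with /u, simple case
// folding maps U+017F to "s" and U+212A to "k", so both match [a-z].
console.log(/[a-z]/i.test(longS), /[a-z]/iu.test(longS));   // false true
console.log(/[a-z]/i.test(kelvin), /[a-z]/iu.test(kelvin)); // false true
```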