Q: Lonely surrogates and unicode regexps
André Bargull
andre.bargull at udo.edu
Wed Jan 28 08:37:39 PST 2015
> Cool, thanks for clarifications!
>
> To make sure, as per the "intended semantics", we never allow splitting a
> valid surrogate pair (= matching only one of the surrogates but not the
> other), and thus we'll differ from the Java implementation here:
>
> /foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"); we say "false", Java says
> "true".
Correct. The captures List entry is [\uD834], so when 21.2.2.9 AtomEscape is evaluated for \1,
the lone \uD834 is compared against the single code point \uD834\uDC00 in step 8, which results
in a failure state.
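
A minimal sketch of that first case, assuming an engine that implements the "u" flag with the
semantics discussed in this thread. The variable names are only illustrative. It lists the code
points a unicode regexp sees and contrasts the result with the non-unicode (code unit) match:

```javascript
// Input from the example: a lone lead surrogate before "bar",
// and a valid surrogate pair (U+1D000) after it.
const input = "foo\uD834bar\uD834\uDC00";

// String iteration is code point based: the lone \uD834 is its own element,
// while \uD834\uDC00 is a single element.
const codePoints = Array.from(input, (ch) => ch.codePointAt(0).toString(16));

// With /u, the backreference \1 (= \uD834) is compared against the code point
// U+1D000 and fails. Without /u, matching is per code unit and \uD834 matches
// the first half of the pair.
const unicodeResult = /foo(.+)bar\1/u.test(input);
const legacyResult = /foo(.+)bar\1/.test(input);

console.log(codePoints.join(" "), unicodeResult, legacyResult);
```

The non-unicode result corresponds to the behaviour attributed to Java in the quoted question.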
>
> (In addition, /^(.+)\1$/u.test("\uDC00foobar\uD834\uDC00foobar\uD834") ==
> false.)
Yes, this expression also returns false.
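
The second case can be sketched the same way, again assuming an engine with the "u" semantics
described above and using illustrative names. Counted in code units the string splits into two
equal halves, but counted in code points the middle \uD834\uDC00 is one element, so no capture
can be repeated exactly:

```javascript
// Input from the second example: lone trail surrogate, "foobar", a valid pair,
// "foobar", lone lead surrogate.
const doubled = "\uDC00foobar\uD834\uDC00foobar\uD834";

// Code units: two identical halves of 8 units each.
const unitCount = doubled.length;
const halvesEqualByUnits = doubled.slice(0, 8) === doubled.slice(8);

// Code points: the pair in the middle counts once, giving an odd total.
const codePointCount = Array.from(doubled).length;

const unicodeDoubled = /^(.+)\1$/u.test(doubled);
const legacyDoubled = /^(.+)\1$/.test(doubled);

console.log(unitCount, codePointCount, unicodeDoubled, legacyDoubled);
```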