Q: Lonely surrogates and unicode regexps

André Bargull andre.bargull at udo.edu
Wed Jan 28 08:37:39 PST 2015

> Cool, thanks for clarifications!
> To make sure, as per the "intended semantics", we never allow splitting a
> valid surrogate pair (= matching only one of the surrogates but not the
> other), and thus we'll differ from the Java implementation here:
> /foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"); we say "false", Java says
> "true".

Correct. The captures List entry is [\uD834], so when AtomEscape evaluates the
backreference \1, the lone \uD834 is compared against the single code point
\uD834\uDC00 in step 8, which results in a failure state.
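
For illustration, here is a small JavaScript sketch of that first example. It assumes an engine that implements the /u semantics discussed in this thread; the variable names are made up for this sketch and do not come from the spec.

```javascript
// Input from the first example: a lone \uD834, then "bar", then the
// valid surrogate pair \uD834\uDC00 (code point U+1D000).
const input = "foo\uD834bar\uD834\uDC00";

// Without /u the matcher works on UTF-16 code units, so the backreference
// \1 (= \uD834) can match the first half of the pair.
const byCodeUnit = /foo(.+)bar\1/.test(input);

// With /u the matcher works on code points. The pair is one code point,
// which is not equal to the captured lone \uD834.
const byCodePoint = /foo(.+)bar\1/u.test(input);

// Code-point view of the input: string iteration never splits a valid pair.
const codePoints = Array.from(input, (ch) => ch.codePointAt(0).toString(16));

console.log(byCodeUnit, byCodePoint, codePoints.join(" "));
```

The code-point list ends in a single entry for the pair, which is the value the captured \uD834 is compared against under /u.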

> (In addition, /^(.+)\1$/u.test("\uDC00foobar\uD834\uDC00foobar\uD834") ==
> false.)

Yes, this expression also returns false.
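
One way to see why, sketched in JavaScript under the same assumption of an engine with conforming /u semantics (the names below are illustrative only): the pattern needs the input to consist of two identical halves. Counted in code units the string has an even length and the halves are equal, but counted in code points the length is odd, because the \uD834\uDC00 in the middle is a single code point.

```javascript
// Input from the second example; the only valid pair is the
// \uD834\uDC00 in the middle of the string.
const text = "\uDC00foobar\uD834\uDC00foobar\uD834";

const codeUnitCount = text.length;              // UTF-16 code units
const codePointCount = Array.from(text).length; // the pair counts once

// ^(.+)\1$ requires the whole input to be two identical halves.
const splitsByCodeUnit = /^(.+)\1$/.test(text);
const splitsByCodePoint = /^(.+)\1$/u.test(text);

console.log({ codeUnitCount, codePointCount, splitsByCodeUnit, splitsByCodePoint });
```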

