Q: Lonely surrogates and unicode regexps
andre.bargull at udo.edu
Wed Jan 28 08:37:39 PST 2015
> Cool, thanks for clarifications!
> To make sure, as per the "intended semantics", we never allow splitting a
> valid surrogate pair (= matching only one of the surrogates but not the
> other), and thus we'll differ from the Java implementation here:
> /foo(.+)bar\1/u.test("foo\uD834bar\uD834\uDC00"); we say "false", Java says
Correct, the captures List entry is [\uD834], so when performing 21.2.2.9 AtomEscape, \uD834 is
matched against \uD834\uDC00 in step 8, which results in a failure state.
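
A small sketch of the case above, for anyone who wants to try it in an engine that implements the u flag. The variable names are only illustrative and are not taken from the spec. The non-unicode variant is included for contrast, on the assumption that without the u flag matching works on UTF-16 code units.

```javascript
// Lone high surrogate after "foo", full surrogate pair after "bar".
const input = "foo\uD834bar\uD834\uDC00";

// Unicode mode: the input is processed as code points, so the capture
// [\uD834] is compared against the single code point U+1D000 and fails.
const unicodeResult = /foo(.+)bar\1/u.test(input);

// Without the u flag, matching is per code unit, so \1 can match the
// leading \uD834 of the pair and the surrogate pair gets split.
const legacyResult = /foo(.+)bar\1/.test(input);

console.log("with u flag:", unicodeResult);
console.log("without u flag:", legacyResult);
```

Under the intended semantics the first result should be false; the second being true is the pair-splitting behaviour the u flag is meant to rule out.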
> (In addition, /^(.+)\1$/u.test("\uDC00foobar\uD834\uDC00foobar\uD834") ==
Yes, this expression also returns false.
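
The second expression can be checked the same way. This is only a sketch with illustrative names; counting code points via Array.from is an extra step added here to show why no match is possible, not something from the thread.

```javascript
// Lone low surrogate at the start, lone high surrogate at the end,
// and a valid surrogate pair (\uD834\uDC00) in the middle.
const subject = "\uDC00foobar\uD834\uDC00foobar\uD834";

// Array.from iterates by code point: a valid pair counts once,
// each lone surrogate counts once.
const codePointCount = Array.from(subject).length;

// Unicode mode: the capture would have to end between \uD834 and \uDC00,
// which would split the pair, so the match fails.
const unicodeMatch = /^(.+)\1$/u.test(subject);

// Code-unit mode: both halves are the same 8 code units, so this matches.
const codeUnitMatch = /^(.+)\1$/.test(subject);

console.log(codePointCount, unicodeMatch, codeUnitMatch);
```

Since the subject has an odd number of code points, it cannot consist of a capture followed by an identical copy of itself, which is consistent with the false result.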