Look-behind proposal in trouble
Nozomu Katō
noz.ka at akenotsuki.com
Wed Oct 7 10:59:17 UTC 2015
What Claude mentioned is already part of the specification: "Input is a
List consisting of all of the characters" and "Each character is either
a code unit or a code point, depending upon the kind of pattern
involved" (21.2.2.1).
Even so, two days ago I added a Note section to my proposal page to
clarify this, because I had been asked a similar question.
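
To illustrate the code unit versus code point distinction in the quoted
specification text, here is a minimal sketch. It is not taken from the
proposal page: the sample string and variable names are made up for
illustration, and the last two lines assume an engine that implements
the proposed look-behind syntax.

```javascript
// "a", then U+1F600 (one code point stored as two code units), then "b".
const input = "a\u{1F600}b";

// Without /u each character is a code unit, so "." cannot span the pair.
const dotUnits = /^a.b$/.test(input);    // false
// With /u each character is a code point, so "." covers the whole pair.
const dotPoints = /^a.b$/u.test(input);  // true

// Look-behind (assumed syntax): "b" must be preceded by "a" plus one character.
const behindUnits = /(?<=a.)b/.test(input);    // false: "." sees only the low surrogate
const behindPoints = /(?<=a.)b/u.test(input);  // true: "." steps back over one code point
```

In both pairs the only difference is the /u flag, which changes what
counts as one character in the Input list.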
Incidentally, the initial version of the proposal used the term
"code point", but I later changed it to "character" following Allen's
comment:
https://mail.mozilla.org/pipermail/es-discuss/2015-May/042922.html
Regards,
Nozomu
Erik Corry wrote on Wed, 7 Oct 2015, at 11:16:54 +0200:
> The proposal needs to be clarified to explain that you are stepping back a
> number of code points, not units. This implies that you are inspecting the
> input string as you step backwards. Also it should be explained what to do
> if there are unpaired surrogates in the input string and inside the
> lookbehind expression source.
>
> I think the proposal would benefit from a pointer to an implementation or
> two. Of course the implementations should also fully support /u.
>
> On Wed, Oct 7, 2015 at 11:10 AM, Claude Pache wrote:
>
>> This should not be a problem: With the /u flag, you work with code points,
>> not code units. In particular, `.` always matches a sequence of length 1
>> (one code point with /u, or one code unit otherwise).
>>
>> —Claude