Regex: How should backreferences contained in the capturing match they reference to work?

liorean liorean at gmail.com
Thu Sep 13 09:25:14 PDT 2007


On 13/09/2007, Lars T Hansen <lth at acm.org> wrote:
> My answer to you is obviously that you should implement the ES3
> behavior (as should JScript and JavaScriptCore).

Of course I'll implement the ES3 behaviour if ES4 makes no change to
it. I just think it would be better to change the behaviour to
something that actually matches developer expectations.

> The current behavior
> is well-defined; it's not a hardship for anyone; the incompatibilities
> among the engines are probably not a big deal (thus the incompatible
> engines can be changed so that they conform); thus there is no
> compelling reason to change the spec either.

The spec doesn't match expectations, there are behaviours that would
make sense as replacements, and there seems to be no obvious
compatibility problem with changing it (otherwise JScript and
JavaScriptCore surely would have been changed to match the ES3
behaviour). Taken together, is that not a compelling reason?

> > ES3: If I understand 15.10.2.5 Term, fourth algorithm, point 4, the
> > correct results would be ['ab','a','b'] since the \2 capture would be
> > set to undefined before the second repetition. Thus futhark and
> > SpiderMonkey fail to implement this part of ES3 Regex. (I'd be happy
> > if it turned out I'm wrong here, however.)
>
> FWIW, the ES4 RI agrees with Mozilla and Opera.
>
> Trying to trace the matching above, we have:
>   ...+
>    (?:...)
>     (\2|a) succeeds without consuming input
>     (b)? succeeds without consuming input
>    so (?:...) succeeds without consuming input
>   so ...+ starts backtracking [step 1 of the continuation for RepeatMatcher]
>     so (\2|a) succeeds, consuming "a"
>     and (b)? succeeds, consuming "b"
>    so (?:...) succeeds consuming "ab" with captures [a,b]
>  and ...+ repeats:
>    (?:...)
>     (\2|a) succeeds without consuming input
>     and (b)? succeeds, consuming "b"
>    so (?:...) succeeds consuming "b" with captures [,b]
>  and ...+ repeats:
>    (?:...)
>     (\2|a) succeeds without consuming input
>     (b)? succeeds without consuming input
>    so (?:...) succeeds without consuming input
>   so ...+ starts backtracking
>     so (\2|a) succeeds, consuming "a"
>     and (b)? succeeds without consuming input
>    so (?:...) succeeds consuming "a" with captures [a,]
>  so ...+ succeeds, matching "abba" with final capture array [a,]
>
> Sorry for the lack of rigor in the above, but I believe that's correct.

Ah, the "undefined really means a match with the empty string" thing
again. I should have known...
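
For concreteness, here is a minimal sketch of the two ES3 rules in play,
written in TypeScript. The first three patterns are illustrations of my
own (the pattern discussed earlier in the thread is not repeated in this
message, so I have not tried to reconstruct it); the last one is the
example given in the note to 15.10.2.5. The helper name `captures` is
made up for this sketch. An engine that follows ES3 on these points
should print the commented results; as this thread shows, not every
engine does.

```typescript
// Hypothetical helper: run a regex and return the match array as a plain
// array, keeping non-participating captures as undefined.
function captures(re: RegExp, input: string): (string | undefined)[] {
  const m = re.exec(input);
  if (m === null) {
    throw new Error("no match for " + re.source + " on " + input);
  }
  return m.slice();
}

// Rule 1: a backreference to a capture that is still undefined matches
// the empty string instead of failing.

// Forward reference: \1 is evaluated before group 1 has captured anything.
const forwardRef = captures(/\1(a)/, "a");
console.log(forwardRef); // [ 'a', 'a' ]

// Self reference: \1 sits inside group 1, which is not yet set while the
// group is being matched, so \1 matches empty and only one "a" is consumed.
const selfRef = captures(/(a\1)/, "aa");
console.log(selfRef); // [ 'a', 'a' ]

// Rule 2 (15.10.2.5, RepeatMatcher step 4): captures inside a quantified
// atom are reset to undefined at the start of every iteration.

// Iteration 1 sets group 1 to "a"; iteration 2 takes the "b" branch, so
// group 1 ends up undefined rather than keeping "a".
const resetPerIteration = captures(/(?:(a)|b)+/, "ab");
console.log(resetPerIteration); // [ 'ab', undefined ]

// The example from the note to 15.10.2.5: group 4 matched "bbb" in the
// second iteration but is cleared again before the third.
const specExample = captures(/(z)((a+)?(b+)?(c))*/, "zaacbbbcac");
console.log(specExample); // [ 'zaacbbbcac', 'z', 'ac', 'a', undefined, 'c' ]
```

Rule 1 is what makes "(\2|a) succeeds without consuming input" possible
in the trace above: with \2 reset by rule 2, the first alternative always
matches the empty string.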
-- 
David "liorean" Andersson


