/\1/ could be a valid RegExp through Chapter 16 Extension clause?

Mike Samuel mikesamuel at gmail.com
Thu Jul 7 12:17:17 PDT 2011

2011/7/7 Lasse Reichstein <reichsteinatwork at gmail.com>:
> On Thu, Jul 7, 2011 at 3:52 AM, Mike Samuel <mikesamuel at gmail.com> wrote:
>> Yes, by the extension, and whether a \<octal> is a backreference or an
>> octal escape sequence is determined by whether there are
>> parseInt(<octal>, 10) capturing groups to the left of it in the
>> regular expression.
>> So
>>    /\1(foo)\1/
>> matches the same language as
>>   /\u0001(foo)\1/
> I don't think that's correct.
> The \1 is a valid DecimalEscape, its value is 1, which is not greater than
> NCapturingParens in step 7 (NCapturingParens is defined globally
> for the pattern, not just to the left of the current escape). I.e., it is
> not a Syntax Error, so the \1 must be treated as a back-reference. It will
> always be to a non-participating capture, so the regexp is equivalent to
>     /(foo)\1/
> or just
>     /(foo)foo/
> but never to
>     /\u0001(foo)foo/
> Regards
> /Lasse

I was wrong.  You're right about the spec language of course and empirically,
is true, but what I should have been testing is
which is false on all the interpreters I have installed.
I think the first test spuriously matches because group 1 is
initialized empty at the point that it matches the first \1.
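
A minimal sketch of the distinction Lasse describes, assuming an engine that implements the web-compatibility regular expression grammar and patterns written without the u flag. The anchors, the input strings, and the names withGroup, withoutGroup, and results are illustrative and do not come from the thread:

```javascript
// Illustrative patterns: anchored so that test() cannot succeed by
// skipping over a leading character.
const withGroup = /^\1(foo)\1$/;
const withoutGroup = /^\1$/;

const results = {
  // The leading \1 refers to group 1 before it has captured anything,
  // so it matches the empty string; the trailing \1 matches "foo".
  forwardRefMatchesEmpty: withGroup.test("foofoo"),
  // If the leading \1 were read as an octal escape, this input would match.
  treatedAsOctal: withGroup.test("\u0001foofoo"),
  // With no capturing group anywhere in the pattern, \1 falls back to
  // an octal escape for U+0001.
  octalWhenNoGroups: withoutGroup.test("\u0001"),
};

console.log(results);
```

On such an engine the first and third checks should be true and the second false, which is consistent with the number of capturing groups being counted over the whole pattern rather than only to the left of the escape.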

One way to tell whether the group initialized to empty works on an
interpreter is to test
which is true in most interpreters, but false in Rhino1.7 and Chrome12.

Interestingly, in other Perl 5 style interpreters, the perl

    perl -e '$s = "xyxyxyx";  $m = scalar($s =~ /^(?:\1x(y)x){2}$/) ?
        "true" : "false"; print "$m\n"'

yields false, as does the java

    public class Foo {
      public static void main(String... argv) {

The python
    import re
    re.match(r"^(?:\1x(y)x){2}$", "xyxyxyx")
fails with
    sre_constants.error: bogus escape: '\\1'
but not if the \1 is after the capturing group.
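
For comparison, here is a sketch of the same pattern as the Perl example, run in JavaScript. It assumes an engine that follows the specification's rule that captures inside a quantified group are reset at the start of each iteration. The second input string and the names pattern and checks are illustrative, not taken from the thread:

```javascript
// Same pattern as the Perl example: a reference to group 1 that
// appears before the group, inside a group repeated twice.
const pattern = /^(?:\1x(y)x){2}$/;

const checks = {
  // Matches only if the "y" captured in iteration 1 is still visible
  // to \1 at the start of iteration 2.
  captureCarriedOver: pattern.test("xyxyxyx"),
  // Matches only if \1 is empty in both iterations, i.e. the capture
  // is cleared before each repetition.
  captureResetEachIteration: pattern.test("xyxxyx"),
};

console.log(checks);
```

An engine that resets captures per iteration should report false for the first check and true for the second. An engine that carries the capture over would report the opposite.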
