/\1/ could be a valid RegExp through Chapter 16 Extension clause?

Mike Samuel mikesamuel at gmail.com
Thu Jul 7 12:17:17 PDT 2011


2011/7/7 Lasse Reichstein <reichsteinatwork at gmail.com>:
>
>
> On Thu, Jul 7, 2011 at 3:52 AM, Mike Samuel <mikesamuel at gmail.com> wrote:
>>
>> Yes, by the extension, and whether a \<octal> is a backreference or an
>> octal escape sequence is determined by whether there are
>> parseInt(<octal>, 10) capturing groups to the left of it in the
>> regular expression.
>> So
>>    /\1(foo)\1/
>> matches the same language as
>>   /\u0001(foo)\1/
>
>
> I don't think that's correct.
> The \1 is a valid DecimalEscape, its value is 1, which is not greater than
> NCapturingParens in 15.10.2.9 step 7 (NCapturingParens is defined globally
> for the pattern, not just to the left of the current escape). I.e., it is
> not a Syntax Error, so the \1 must be treated as a back-reference. It will
> always be to a non-participating capture, so the regexp is equivalent to
>     /(foo)\1/
> or just
>     /(foo)foo/
> but never to
>     /\u0001(foo)foo/
> Regards
> /Lasse

I was wrong.  You're right about the spec language, of course.  Empirically,
    /\1(foo)\1/.test("\u0001foofoo")
is true, but what I should have been testing is
    /^\1(foo)\1$/.test("\u0001foofoo")
which is false on all the interpreters I have installed.
I think the first test spuriously matches because group 1 is still
empty when the first \1 is evaluated, so the unanchored match can
simply start after the \u0001.
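
To make the difference between the two tests visible, here is a minimal
sketch that reports where each pattern matches.  describeMatch is a
hypothetical helper written for this illustration, not part of any API,
and the last line assumes an engine that implements the legacy octal
escape extension this thread is about.

```javascript
// Hypothetical helper: summarize a match, or return null when there is none.
function describeMatch(re, input) {
  const m = re.exec(input);
  return m === null ? null : { index: m.index, text: m[0], group1: m[1] };
}

const input = "\u0001foofoo";

// Unanchored: the leading \1 refers to a group that has not participated
// yet, so it matches the empty string and the match can begin at index 1.
const unanchored = describeMatch(/\1(foo)\1/, input);

// Anchored: the match must begin at index 0, where (foo) cannot match \u0001.
const anchored = describeMatch(/^\1(foo)\1$/, input);

// Dropping the leading \1 should give the same result as the unanchored case.
const noLeadingRef = describeMatch(/(foo)\1/, input);

// With no capturing group at all, engines that implement the extension
// treat \1 as an octal escape for \u0001 (assumption: non-strict, no /u flag).
const octalEscape = /\1/.test("\u0001");

console.log(unanchored, anchored, noLeadingRef, octalEscape);
```

If the unanchored pattern reports index 1 rather than 0, the \u0001 was
skipped by the search and not consumed by the \1.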

One way to check how an interpreter handles a group that starts out
empty is to test
    /^(?:\1x(y)x){2}$/.test("xyxyxyx")
which is true in most interpreters, but false in Rhino 1.7 and Chrome 12.
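
One plausible reading of the split, offered as a sketch and not as a
definitive diagnosis: the test can only be true if the second iteration's
\1 still sees the "y" captured by the first iteration.  ES5 15.10.2.5
(RepeatMatcher) describes clearing the captures inside a quantified atom
before each iteration, in which case the second \1 is empty again and
the test is false.  probeCaptureReset below is a hypothetical name, and
no particular result is assumed for any given engine.

```javascript
// Hypothetical probe: does this engine keep or clear group 1 between
// iterations of a quantified group?
function probeCaptureReset() {
  // true  => the second iteration's \1 matched the "y" from the first one
  // false => \1 matched the empty string on both iterations
  const backrefSeesPreviousIteration = /^(?:\1x(y)x){2}$/.test("xyxyxyx");

  // Related probe: the first iteration captures "a", the second takes the
  // "b" branch. An engine that clears captures reports undefined for
  // group 1; one that keeps them reports "a".
  const m = /(?:(a)|b)+/.exec("ab");

  return {
    backrefSeesPreviousIteration: backrefSeesPreviousIteration,
    group1AfterSecondIteration: m[1],
  };
}

console.log(probeCaptureReset());
```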

Interestingly, other regex engines behave differently.  The perl 5 one-liner

    perl -e '$s = "xyxyxyx";
             $m = scalar($s =~ /^(?:\1x(y)x){2}$/) ? "true" : "false";
             print "$m\n"'

yields false, as does the Java

    public class Foo {
      public static void main(String... argv) {
        System.out.println(java.util.regex.Pattern.compile(
            "^(?:\\1x(y)x){2}\\z").matcher("xyxyxyx").matches());
      }
    }

The Python
    import re
    re.match(r"^(?:\1x(y)x){2}$", "xyxyxyx")
fails with
    sre_constants.error: bogus escape: '\\1'
but not if the \1 comes after the capturing group.

