/\1/ could be a valid RegExp through Chapter 16 Extension clause?

Oliver Hunt oliver at apple.com
Thu Jul 7 13:11:31 PDT 2011


CC'ing Gavin as he's been looking at RegExp compatibility in the real world vs. the spec recently.

--Oliver

On Jul 7, 2011, at 12:17 PM, Mike Samuel wrote:

> 2011/7/7 Lasse Reichstein <reichsteinatwork at gmail.com>:
>> 
>> 
>> On Thu, Jul 7, 2011 at 3:52 AM, Mike Samuel <mikesamuel at gmail.com> wrote:
>>> 
>>> Yes, by the extension, and whether a \<octal> is a backreference or an
>>> octal escape sequence is determined by whether there are
>>> parseInt(<octal>, 10) capturing groups to the left of it in the
>>> regular expression.
>>> So
>>>    /\1(foo)\1/
>>> matches the same language as
>>>   /\u0001(foo)\1/
>> 
>> 
>> I don't think that's correct.
>> The \1 is a valid DecimalEscape, its value is 1, which is not greater than
>> NCapturingParens in 15.10.2.9 step 7 (NCapturingParens is defined globally
>> for the pattern, not just to the left of the current escape). I.e., it is
>> not a Syntax Error, so the \1 must be treated as a back-reference. It will
>> always be to a non-participating capture, so the regexp is equivalent to
>>     /(foo)\1/
>> or just
>>     /(foo)foo/
>> but never to
>>     /\u0001(foo)foo/
>> Regards
>> /Lasse
> 
> I was wrong.  You're right about the spec language of course and empirically,
>    /\1(foo)\1/.test("\u0001foofoo")
> is true, but what I should have been testing is
>    /^\1(foo)\1$/.test("\u0001foofoo")
> which is false on all the interpreters I have installed.
> I think the first test spuriously matches because group 1 is
> initialized empty at the point that it matches the first \1.
> 
> One way to tell whether the group initialized to empty works on an
> interpreter is to test
>    /^(?:\1x(y)x){2}$/.test("xyxyxyx")
> which is true in most interpreters, but false in Rhino 1.7 and Chrome 12.
> 
> Interestingly, under Perl 5,
> 
>    perl -e '$s = "xyxyxyx";  $m = scalar($s =~ /^(?:\1x(y)x){2}$/) ? "true" : "false"; print "$m\n"'
> 
> yields false, as does the Java
> 
>    public class Foo {
>      public static void main(String... argv) {
>        System.out.println(java.util.regex.Pattern.compile(
>            "^(?:\\1x(y)x){2}\\z").matcher("xyxyxyx").matches());
>      }
>    }
> 
> The Python
>    import re
>    re.match(r"^(?:\1x(y)x){2}$", "xyxyxyx")
> fails with
>    sre_constants.error: bogus escape: '\\1'
> but not if the \1 is after the capturing group.
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
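
For reference, the JavaScript probes from the quoted message can be gathered into one script. This is only a minimal sketch: the function name probeBackreferenceBehaviour and the property names are hypothetical and do not come from the thread, and the extra "foofoo" probe is an assumption added to show the case where a leading \1 matches the empty string. The loop probe is reported above to give different answers on different interpreters, so no expected value is stated for it.

```javascript
// Hypothetical helper (not from the thread): runs the probes discussed above
// and returns the raw boolean results so they can be compared across engines.
function probeBackreferenceBehaviour() {
  return {
    // Unanchored: the match is free to start after the leading U+0001.
    unanchored: /\1(foo)\1/.test("\u0001foofoo"),

    // Anchored: can only succeed if the leading \1 is read as the
    // octal escape U+0001 rather than as a back-reference.
    anchoredOctal: /^\1(foo)\1$/.test("\u0001foofoo"),

    // Anchored (added for illustration): succeeds if the leading \1 is a
    // back-reference to a not-yet-participating group that matches empty.
    anchoredEmpty: /^\1(foo)\1$/.test("foofoo"),

    // Engine-dependent according to the thread: the outcome hinges on what
    // \1 matches on the second pass through the group.
    loop: /^(?:\1x(y)x){2}$/.test("xyxyxyx"),
  };
}

console.log(probeBackreferenceBehaviour());
```

Comparing the anchoredOctal and anchoredEmpty results on one engine shows which of the two readings of the leading \1 that engine implements.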


