Regexp backreferences

Gavin Barraclough barraclough at apple.com
Tue Aug 9 16:00:33 PDT 2011


Hi Paul,

I tested this a while back and hit a couple of web sites using octal escapes in regexp literals (from memory, gmail might have been one, and possibly old versions of jquery, but I'm really not sure). I came to the conclusion that this wasn't going to fly.
However, our current octal implementation seems a little overly permissive, so I've been thinking about trying to restrict these in JavaScriptCore in a couple of ways:

1) There is probably no need to support invalid numeric escapes that do not correspond to a single octal character in the range 0..255 (e.g. /\777/).  We can probably generate early errors for values outside this range.  This change would be a useful simplification, since the current implementation requires us to choose whether to split numeric-escape-based terms later in the regexp.  E.g. consider the regexp /\39()()()...()()()/.  The current behaviour is that the numeric escape may be treated as either one term (a backreference) or two terms (octal escape \3 and character 9), depending on how many subpattern captures follow it.  It would be nice to remove this complexity.

2) Since octal escapes are not permitted in string literals in strict mode, it might be a good idea to prohibit them from regexp literals too.  When parsing strict mode code we could validate that regexp literals only contain valid numeric escapes.  This could be something useful to incorporate into the spec, since we could explicitly define that this is only allowed in non-strict mode, as the spec already does for octal numeric literals & escapes in strings - and any extension to support octal escapes in regexps would be a natural fit for annex B.1.
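
To make the two points above concrete, here is a minimal sketch of the behaviour being discussed. It assumes an engine that keeps the web-compatible handling of numeric escapes in non-strict regexps; makePattern and compiles are hypothetical helpers introduced only for this illustration, and the patterns are built with the RegExp constructor so the number of capture groups can be varied.

```javascript
// Hypothetical helper: builds /^\39()()...()$/ with `groups` empty
// capture groups following the numeric escape.
function makePattern(groups) {
  return new RegExp("^\\39" + "()".repeat(groups) + "$");
}

// Only 3 groups: 39 exceeds the group count, so \39 is read as two terms,
// the octal escape \3 (U+0003) followed by the literal character "9".
const asOctal = makePattern(3);
console.log(asOctal.test("\x039")); // true
console.log(asOctal.test(""));      // false

// 39 groups: \39 is a single term, a backreference to group 39. That group
// has not captured anything yet, so the backreference matches the empty string.
const asBackref = makePattern(39);
console.log(asBackref.test(""));      // true
console.log(asBackref.test("\x039")); // false

// \777 lies outside 0..255, so it is read as \77 ("?") followed by "7".
const outOfRange = new RegExp("^\\777$");
console.log(outOfRange.test("?7")); // true

// Hypothetical helper: reports whether `source` parses as a function body.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// The existing precedent for point 2: octal escapes in string literals
// are accepted in non-strict code and rejected in strict code.
console.log(compiles('return "\\47";'));               // true
console.log(compiles('"use strict"; return "\\47";')); // false
```

The same source text \39 changes meaning purely as a function of what follows it, which is the complexity point 1 would like to remove.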


On the topic of octal literals, there is something odd in the spec's current definition for octal escapes in string literals.
Section B.1.2 currently defines:

OctalEscapeSequence ::
	OctalDigit [lookahead is not DecimalDigit]
	ZeroToThree OctalDigit [lookahead is not DecimalDigit]
	FourToSeven OctalDigit
	ZeroToThree OctalDigit OctalDigit

If I'm reading the spec correctly, the following string literals are valid:
	"\4779"	(equivalent to "\x27" + "79")
	"\3779"	(equivalent to "\xFF" + "9")
	"\479"	(equivalent to "\x27" + "9")
But these are not:
	"\379"
	"\49"
	"\39"
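
The reading above can be checked mechanically. The following is a small sketch that transcribes the four B.1.2 productions directly; octalEscapeLength and decode are hypothetical names for this illustration, and the input is the text immediately after the backslash.

```javascript
const inRange = (c, lo, hi) => c !== undefined && c >= lo && c <= hi;
const isOctal = (c) => inRange(c, "0", "7");
const isDecimal = (c) => inRange(c, "0", "9");
const isZeroToThree = (c) => inRange(c, "0", "3");
const isFourToSeven = (c) => inRange(c, "4", "7");

// Returns how many digits an OctalEscapeSequence consumes at the start of
// `digits`, or 0 if none of the four productions applies.
function octalEscapeLength(digits) {
  const [a, b, c] = digits; // missing characters come back as undefined
  // ZeroToThree OctalDigit OctalDigit
  if (isZeroToThree(a) && isOctal(b) && isOctal(c)) return 3;
  // FourToSeven OctalDigit
  if (isFourToSeven(a) && isOctal(b)) return 2;
  // ZeroToThree OctalDigit [lookahead is not DecimalDigit]
  if (isZeroToThree(a) && isOctal(b) && !isDecimal(c)) return 2;
  // OctalDigit [lookahead is not DecimalDigit]
  if (isOctal(a) && !isDecimal(b)) return 1;
  return 0;
}

// Splits `digits` into the escaped character code and the remaining text,
// or returns null when the grammar rejects the escape.
function decode(digits) {
  const n = octalEscapeLength(digits);
  if (n === 0) return null;
  return { charCode: parseInt(digits.slice(0, n), 8), rest: digits.slice(n) };
}

for (const s of ["4779", "3779", "479", "379", "49", "39"]) {
  console.log("\\" + s, decode(s));
}
// \4779 -> { charCode: 39, rest: "79" }   (0x27)
// \3779 -> { charCode: 255, rest: "9" }   (0xFF)
// \479  -> { charCode: 39, rest: "9" }    (0x27)
// \379, \49, \39 -> null
```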

For consistency I'd suggest revising the spec to either:

OctalEscapeSequence ::
	OctalDigit [lookahead is not OctalDigit]
	ZeroToThree OctalDigit [lookahead is not OctalDigit]
	FourToSeven OctalDigit
	ZeroToThree OctalDigit OctalDigit

or:

OctalEscapeSequence ::
	OctalDigit [lookahead is not DecimalDigit]
	ZeroToThree OctalDigit [lookahead is not DecimalDigit]
	FourToSeven OctalDigit [lookahead is not DecimalDigit]
	ZeroToThree OctalDigit OctalDigit [lookahead is not DecimalDigit]

I'd prefer the latter of the two options, since I think we could fairly cleanly define octal escape syntax for non-strict regexp literals to match this.
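
For comparison, here is a rough sketch of how the two suggested revisions treat the six example literals. The table-driven makeMatcher helper and the names optionA and optionB are assumptions made for this illustration, not anything from the spec; each entry describes one production as a first-digit range, a digit count, and an optional lookahead restriction.

```javascript
const inRange = (c, lo, hi) => c !== undefined && c >= lo && c <= hi;
const isOctal = (c) => inRange(c, "0", "7");
const isDecimal = (c) => inRange(c, "0", "9");

// Builds a matcher from a list of productions. The matcher returns the number
// of digits consumed after the backslash, or 0 if no production applies.
function makeMatcher(productions) {
  return (digits) => {
    for (const { first, length, notFollowedBy } of productions) {
      const chars = [...digits.slice(0, length)];
      if (chars.length !== length) continue;
      if (!inRange(chars[0], first[0], first[1])) continue;
      if (!chars.every(isOctal)) continue;
      // digits[length] is undefined at end of input, which passes any lookahead
      if (notFollowedBy && notFollowedBy(digits[length])) continue;
      return length;
    }
    return 0;
  };
}

// First revision: the two short forms get [lookahead is not OctalDigit].
const optionA = makeMatcher([
  { first: ["0", "3"], length: 3, notFollowedBy: null },
  { first: ["4", "7"], length: 2, notFollowedBy: null },
  { first: ["0", "3"], length: 2, notFollowedBy: isOctal },
  { first: ["0", "7"], length: 1, notFollowedBy: isOctal },
]);

// Second revision: every form gets [lookahead is not DecimalDigit].
const optionB = makeMatcher([
  { first: ["0", "3"], length: 3, notFollowedBy: isDecimal },
  { first: ["4", "7"], length: 2, notFollowedBy: isDecimal },
  { first: ["0", "3"], length: 2, notFollowedBy: isDecimal },
  { first: ["0", "7"], length: 1, notFollowedBy: isDecimal },
]);

for (const s of ["4779", "3779", "479", "379", "49", "39"]) {
  console.log("\\" + s, "A:", optionA(s), "B:", optionB(s));
}
```

Under these assumptions the first revision accepts all six literals and the second rejects all six, so either one removes the inconsistency; they differ only in whether a decimal digit may directly follow an octal escape.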

cheers,
G.


On Aug 9, 2011, at 1:41 PM, Paul Biggar wrote:

> [apologies for the duplicate post on test262@, my first attempt
> bounced on es-discuss@]
> 
> Hi folks,
> 
> Currently in SpiderMonkey, overlarge regex backreferences do not cause
> syntax errors. We are considering making them cause syntax errors for
> test262 compliance, but are worried it will break the web.
> 
> As far as I can tell, we inherited our behaviour from Apple's Yarr,
> which took the behaviour from us, who originally copied IE back when
> that was important. Chrome does the same as us.
> 
> Does anyone have any idea or data as to whether this will break the
> web? If no such data exists, we can always just try it and see if it
> breaks things. Should this be removed from test262?
> 
> Thanks,
> Paul
> 
> 
> Firefox bug:
>  https://bugzilla.mozilla.org/show_bug.cgi?id=413155
> 
> 
> test262 test cases:
>    http://hg.ecmascript.org/tests/test262/file/034836894a85/test/suite/sputnik_converted/15_Native/15.10_RegExp_Objects/15.10.2_Pattern_Semantics/15.10.2.11_DecimalEscape/S15.10.2.11_A1_T2.js
> 
>    http://hg.ecmascript.org/tests/test262/file/034836894a85/test/suite/sputnik_converted/15_Native/15.10_RegExp_Objects/15.10.2_Pattern_Semantics/15.10.2.11_DecimalEscape/S15.10.2.11_A1_T3.js
> 
> 
> --
> Paul Biggar
> Compiler Geek
> pbiggar at mozilla.com
> @paulbiggar
> _______________________________________________
> test262-discuss mailing list
> test262-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/test262-discuss


