Regexp capturing groups.

Markus Jarderot marjar-4 at student.ltu.se
Thu Sep 4 07:02:19 PDT 2008


When I first noticed this in Firefox I thought it was a bug. After some 
investigation it turned out that the behavior comes from the 
specification itself: ES discards the contents of capturing groups inside 
a quantified group each time the repetition starts over. I don't know of 
any Regexp engine not based on the ECMA-262 standard that behaves like this.

(Using JavaScript as implemented in Mozilla Firefox 3.0.1)

A simple example:
/(?:(a)|(b))*/.exec("ababa") -> ["ababa", "a", ""]
It recognizes each letter in turn, but each time it starts to match the 
next one it discards the captures from the previous repetition.
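
A small sketch of the same example, for checking in an engine of your 
choice. As far as I can tell, ECMA-262 specifies undefined for a group 
that did not take part in the final repetition, so a current engine 
should report undefined where the output above shows an empty string. 
The variable names are only for illustration.

```javascript
// Each pass of the * quantifier clears the captures inside the group,
// so only the alternative that matched in the final pass keeps a value.
var m = /(?:(a)|(b))*/.exec("ababa");

var whole = m[0];   // the full match, "ababa"
var groupA = m[1];  // set by the final repetition, which matched "a"
var groupB = m[2];  // cleared: "b" matched only in earlier repetitions

console.log(whole, groupA, groupB);
```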

A little more practical example, URL query key/value matching:
var match = 
/\/thread\.php(?:[&?]key1=([^&#]*)|[&?]key2=([^&#]*)|[&?][^&#]*)*/.exec(url);
var value1 = match[1];
var value2 = match[2];
On most other Regexp engines this would store the value after key1 in 
group 1 and the value after key2 in group 2, regardless of their order 
in the input string. On ECMA-262 based engines, only the value captured 
in the last repetition is kept.
The same technique could be applied to attributes in HTML tags.
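
To make the query-string case concrete, here is a sketch that runs the 
same Regexp against two made-up URLs with the keys in opposite order. 
The URLs and variable names are assumptions for illustration only.

```javascript
var re = /\/thread\.php(?:[&?]key1=([^&#]*)|[&?]key2=([^&#]*)|[&?][^&#]*)*/;

// Hypothetical URLs: same keys and values, opposite order.
var first = re.exec("/thread.php?key1=foo&key2=bar");
var second = re.exec("/thread.php?key2=bar&key1=foo");

// On an ECMA-262 engine only the capture from the final repetition
// survives, so whichever key comes first in the URL loses its value.
console.log(first[1], first[2]);
console.log(second[1], second[2]);
```

Whichever key appears last in the URL is the only one whose group is 
still set after the match.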

To get this to work with ECMA-262 based engines, you could first pick 
out the query-string with one Regexp, and then look for each key in turn.
var query = /\/thread\.php(\?[^#]*)/.exec(url)[1];
var value1 = /[&?]key1=([^&#]*)/.exec(query)[1];
var value2 = /[&?]key2=([^&#]*)/.exec(query)[1];
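
Note that the lines above throw if the URL has no query string or a key 
is missing, because exec then returns null. A slightly more defensive 
sketch of the same two-step approach follows; getQueryValue is a 
hypothetical helper name, not something from any library.

```javascript
// Hypothetical helper: returns the value of `key` in the query string
// of `url`, or null if there is no query string or the key is absent.
function getQueryValue(url, key) {
  // Step 1: pick out the query string, as in the workaround above.
  var queryMatch = /\/thread\.php(\?[^#]*)/.exec(url);
  if (!queryMatch) return null;

  // Escape Regexp metacharacters so the key is matched literally.
  var escapedKey = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

  // Step 2: look for this one key in the query string.
  var valueMatch = new RegExp("[&?]" + escapedKey + "=([^&#]*)").exec(queryMatch[1]);
  return valueMatch ? valueMatch[1] : null;
}

var url = "/thread.php?key2=bar&key1=foo";  // made-up example URL
var value1 = getQueryValue(url, "key1");
var value2 = getQueryValue(url, "key2");
```

Because each key is matched with its own Regexp, the result no longer 
depends on the order of the keys in the URL.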

I don't know if any web application depends on this behavior, but I 
wouldn't write any code that did.

This problem, and that of back-references to non-participating groups, 
have been discussed on this list before, but nothing seems to have come 
out of it.
https://mail.mozilla.org/pipermail/es-discuss/2007-September/thread.html#4513
https://mail.mozilla.org/pipermail/es-discuss/2007-September/thread.html#4574


