Suggest adopting .NET/Perl regexp named capture syntax

Lars T Hansen lth at acm.org
Wed Oct 24 01:12:02 PDT 2007


On 10/24/07, StevenLevithan <steves_list at hotmail.com> wrote:
>
> ECMAScript 4 regular expression extension proposals indicate that the Python
> syntax will be used for named capture. Python uses (?P<name>...) for named
> capture, (?P=name) for a backreference within the regex, and \g<name> for a
> backreference within a replacement string. Personally, I feel this is a
> mistake.
>
> Although Python was the first to implement named capture, other libraries
> seem to be standardizing around .NET's alternative syntax, which uses
> (?<name>...) or (?'name'...) for capture, \k<name> or \k'name' for a
> backreference within the regex, and ${name} for a backreference within a
> replacement string. Perl 5.10 has adopted .NET's syntax (although
> backreferences within a replacement string use $+{name}) since "most people
> consider it to be nicer". Recent versions of PCRE have followed Perl's lead
> by supporting .NET's syntax for named capture as the preferred style.

I guess that counts as momentum...

> Here are the problems I see with the Python syntax:
>
> (?P<name>...)
> - What does the "P" stand for? "Python"? The character is unnecessary and
> unhelpful.

Or alternatively, it makes it possible to use other characters later
for other purposes.

> (?P=name)
> - Backreferences should not use parentheses since they are a single token
> and not a grouping.
>
> \g<name>
> - Is this a single token in ES4, or a string which in a string literal will
> have to be written as "\\g<name>"?

ES4 does not have this functionality.  That may be an oversight and is
now logged as http://bugs.ecmascript.org/ticket/255.

IMO the most natural syntax for ES4 is something like $<name>, where
name is restricted to one of the names actually captured by the
RegExp: it's not an arbitrary property name or variable name.
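
For illustration only: ES4 itself was never standardized, but ECMAScript
2018 later adopted named groups written as (?<name>...), with \k<name> for
a backreference inside the pattern and $<name> inside a replacement string.
A minimal sketch in that later syntax follows; the patterns, sample strings,
and variable names are assumptions made up for this example, not part of
any ES4 proposal.

```javascript
// Named capture in the syntax later ECMAScript editions (ES2018) adopted.
// The pattern and sample strings are illustrative only.
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// $<name> in the replacement string refers to a named capture; only names
// actually captured by the RegExp are meaningful here.
const reordered = "2007-10-24".replace(isoDate, "$<day>/$<month>/$<year>");

// \k<name> is a backreference inside the pattern itself: a single token,
// not a parenthesized group.
const doubledWord = /\b(?<word>\w+) \k<word>\b/;
const hasRepeat = doubledWord.test("this is is a test");

console.log(reordered); // 24/10/2007
console.log(hasRepeat); // true
```

Because $<name> is interpreted by replace() and not by the string literal,
a replacement string read from user input at run time works the same way
as one written in source code.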

> If it is the former, how will you be able
> to generate a replacement string using e.g. a textarea with user input, and
> if it's the latter, it seems less elegant than ${name} , which follows the
> '$ denotes a backreference' convention.
>
> One other question... can the results from named capture be used in a
> replacement closure function? I.e., will you be able to do something like
> str.replace(/(?P<name>)/,function(match){return match.name;}); ?

The captured substrings with names are available as properties on the
match result object, so that should work, yes.
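
As a hedged sketch of how this looks in the named-group support that
ECMAScript 2018 eventually shipped (not in ES4 as proposed): the captures
sit on a separate groups object, not directly on the match, and a replacement
function receives that object as its last argument. The pattern and the
sample name below are assumptions for illustration.

```javascript
// Named captures as exposed by ES2018-era engines; the pattern and input
// are illustrative only.
const pattern = /(?<first>\w+) (?<last>\w+)/;

// With exec(), named captures are properties of match.groups.
const match = pattern.exec("Lars Hansen");
console.log(match.groups.first); // Lars

// With a replacement function, the groups object is passed as the final
// argument, after the positional captures, the offset, and the input string.
const swapped = "Lars Hansen".replace(pattern, (...args) => {
  const groups = args[args.length - 1];
  return `${groups.last}, ${groups.first}`;
});
console.log(swapped); // Hansen, Lars
```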

--lars


