Suggest adopting .NET/Perl regexp named capture syntax
Lars T Hansen
lth at acm.org
Wed Oct 24 01:12:02 PDT 2007
On 10/24/07, StevenLevithan <steves_list at hotmail.com> wrote:
>
> ECMAScript 4 regular expression extension proposals indicate that the Python
> syntax will be used for named capture. Python uses (?P<name>...) for named
> capture, (?P=name) for a backreference within the regex, and \g<name> for a
> backreference within a replacement string. Personally, I feel this is a
> mistake.
>
> Although Python was the first to implement named capture, other libraries
> seem to be standardizing around .NET's alternative syntax, which uses
> (?<name>...) or (?'name'...) for capture, \k<name> or \k'name' for a
> backreference within the regex, and ${name} for a backreference within a
> replacement string. Perl 5.10 has adopted .NET's syntax (although
> backreferences within a replacement string use $+{name}) since "most people
> consider it to be nicer". Recent versions of PCRE have followed Perl's lead
> by supporting .NET's syntax for named capture as the preferred style.
I guess that counts as momentum...
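For a concrete picture of the .NET-style forms quoted above, here is a minimal sketch. It assumes an engine that implements ES2018 named groups, which postdates this thread and is not the ES4 proposal under discussion. The pattern and sample text are made up for illustration.

```javascript
// Assumption: ES2018 named capture groups, not the ES4 proposal syntax.
// (?<word>...) captures under the name "word";
// \k<word> is a backreference to that capture inside the same regex.
const doubled = /\b(?<word>\w+)\s+\k<word>\b/;

const m = doubled.exec("this is is a test");

// Named captures are exposed on the match result's `groups` object.
const repeatedWord = m.groups.word;
console.log(repeatedWord);
```

Note that the backreference is a single token with no parentheses, which is the property asked for in the (?P=name) complaint below.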
> Here are the problems I see with the Python syntax:
>
> (?P<name>...)
> - What does the "P" stand for? "Python"? The character is unnecessary and
> unhelpful.
Or alternatively, it makes it possible to use other characters later
for other purposes.
> (?P=name)
> - Backreferences should not use parentheses since they are a single token
> and not a grouping.
>
> \g<name>
> - Is this a single token in ES4, or a string which in a string literal will
> have to be written as "\\g<name>"?
ES4 does not have this functionality. That may be an oversight and is
now logged as http://bugs.ecmascript.org/ticket/255.
IMO the most natural syntax for ES4 is something like $<name>, where
name is restricted to one of the names actually captured by the
RegExp: it's not an arbitrary property name or variable name.
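A minimal sketch of how a $<name> replacement reference could look, assuming an engine with ES2018 named groups (again, not ES4 itself). The date pattern and group names are hypothetical.

```javascript
// Assumption: ES2018 semantics, where "$<name>" in a replacement string
// refers to the named capture of the same name in the RegExp.
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// The replacement is an ordinary string with no backslashes, so it reads
// the same in a string literal as it would if typed into a form field.
const template = "$<day>/$<month>/$<year>";

const reordered = "2007-10-24".replace(isoDate, template);
console.log(reordered);
```

Because the reference starts with "$" rather than "\", no double escaping is needed in a string literal.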
> If it is the former, how will you be able
> to generate a replacement string using e.g. a textarea with user input, and
> if it's the latter, it seems less elegant than ${name}, which follows the
> '$ denotes a backreference' convention.
>
> One other question... can the results from named capture be used in a
> replacement closure function? I.e., will you be able to do something like
> str.replace(/(?P<name>)/,function(match){return match.name;}); ?
The captured substrings with names are available as properties on the
match result object, so that should work, yes.
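As a sketch of the replacement-closure case, assuming ES2018 behaviour rather than the ES4 draft: there the named captures are passed to the callback as a final `groups` argument, not as properties of the first (matched string) argument, so the details differ from the description above. The names and input are hypothetical.

```javascript
// Assumption: ES2018 semantics. When the RegExp has named groups, the
// replacement callback receives an object of named captures as its
// last argument, after the match, the numbered captures, the offset
// and the input string.
const pattern = /(?<first>\w+) (?<last>\w+)/;

const swapped = "Lars Hansen".replace(pattern, (...args) => {
  const groups = args[args.length - 1]; // object holding the named captures
  return `${groups.last}, ${groups.first}`;
});

console.log(swapped);
```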
--lars