Proposal for exact matching and matching at a position in RegExp
Steve L.
steves_list at hotmail.com
Thu Feb 11 22:24:34 PST 2010
Outside of es-discuss, Brendan Eich asked for my thoughts on the merits of
\G vs. /y (intrinsically and in light of backward compatibility). I sent the
following reply, which he thought would be useful to forward to the list....
I have no preference between /y and \G. When I first saw /y proposed for
ES4, I felt it needlessly reinvented the wheel given that \G had already
been implemented pretty widely. On the other hand, the fact that \G reaches
out of the search pattern to read a property of a regex or string feels a
bit too much like magic to me, and implementing it as a flag (/y) seems less
weird. An argument in favor of \G is that it's more versatile than /y since
it can be used anywhere in a regex pattern (e.g., at the start of an
alternation option), not just as the leading element.
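To make the /y side of that comparison concrete, here is a minimal JavaScript sketch, assuming an engine that implements the sticky flag and the RegExp `flags` property. The helper name `matchAt` is hypothetical and not part of any proposal; it only illustrates that a sticky regex matches at lastIndex or not at all.

```javascript
// Hypothetical helper: try to match `pattern` exactly at `index` in `text`.
// Assumes the engine supports the sticky (y) flag and RegExp.prototype.flags.
function matchAt(pattern, text, index) {
  // Build a sticky copy so the caller's regex and its lastIndex are untouched.
  const flags = pattern.flags.replace(/[gy]/g, "") + "y";
  const sticky = new RegExp(pattern.source, flags);
  sticky.lastIndex = index;
  // With /y, exec() succeeds only if a match begins exactly at lastIndex.
  return sticky.exec(text);
}

const word = /\w+/;
const atThree = matchAt(word, "ab cd", 3); // a word starts at index 3
const atTwo = matchAt(word, "ab cd", 2);   // index 2 is a space, so no match
```

Because the flag applies to the whole pattern, there is no way in this sketch to anchor only one alternative to the position, which is the versatility argument for \G.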
Note that \G works a bit differently across implementations. In some cases
it matches the start position of the current match (PCRE, Ruby), and
elsewhere it matches the end position of the previous match (Perl, Java,
.NET). Of course, this distinction only matters after a zero-length match
(since that increments the start position of the next search).
Perl has extra functionality around \G that makes it more useful.
Specifically, the fact that the location associated with \G is an attribute
of target strings (pos()) means that multiple regexes with \G can match
against a string in turn and they'll each pick up where the others left off.
Combine this with Perl's /c modifier (which prevents failed matches from
resetting the \G location) and you can run multiple regexes with \G and /c
against a string and advance only when there's a match. Here's a crappy
example:
while ($html !~ /\G$/gc) {
    if ($html =~ /\G[^<&]+/gc) {
        ...
    } elsif ($html =~ /\G<(\w+)[^>]*>/gc) {
        ...
    } elsif ($html =~ /\G&#?\w+;/gc) {
        ...
    } else {
        # Stray "<" or "&": skip one character so the loop always advances.
        $html =~ /\G./gcs;
    }
}
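For comparison, here is a rough JavaScript counterpart to the Perl loop, again assuming an engine with the sticky flag. It is only a sketch: `tokenize` and the token type names are made up for illustration, the tag rule uses `[^>]*` so single-letter tags also match, and an "unknown" fallback is added so the loop cannot stall. Since lastIndex belongs to each regex object rather than to the string, the shared position has to be copied by hand.

```javascript
// Illustrative tokenizer; tokenize() and the token type names are hypothetical.
function tokenize(html) {
  const rules = [
    ["text", /[^<&]+/y],
    ["tag", /<(\w+)[^>]*>/y],
    ["entity", /&#?\w+;/y],
  ];
  const tokens = [];
  let pos = 0; // plays the role of Perl's pos($html)
  while (pos < html.length) {
    let matched = false;
    for (const [type, regex] of rules) {
      regex.lastIndex = pos;      // every rule resumes at the shared position
      const m = regex.exec(html); // sticky: matches only at lastIndex
      if (m) {
        tokens.push({ type, value: m[0] });
        pos = regex.lastIndex;    // advance only on success, like Perl's /c
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule applies (e.g. a stray "<" or "&"): emit one character
      // so the loop always makes progress.
      tokens.push({ type: "unknown", value: html[pos] });
      pos += 1;
    }
  }
  return tokens;
}

const tokens = tokenize("a &amp; <b class=x>c");
```

The manual `pos` bookkeeping is exactly what Perl's per-string pos() and /c do implicitly.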
Sorry for the tangent, but I thought it might be helpful to describe how \G
is used elsewhere.
Steven Levithan
http://blog.stevenlevithan.com
--------------------------------------------------
From: "Steve L." <steves_list at hotmail.com>
Sent: Wednesday, February 10, 2010 10:46 AM
To: "Andy Chu" <andy at chubot.org>; "es-discuss" <es-discuss at mozilla.org>
Subject: Re: Proposal for exact matching and matching at a position in
RegExp
>
>>> http://andychu.net/ecmascript/RegExp-Enhancements-2.html
>>>
>>> Basically the proposal is to add parameters which can override the
>>> internal state of the RegExp.
>>
>> Does anyone have any comments on this?
>>
>> Can I put it in a place where it will be considered for the next
>> ECMAScript? The overall idea seems relatively uncontroversial since
>> it was already implemented by Mozilla (for the exact same reason). I
>> have proposed a specific API enhancement too.
>
> I do not believe it was implemented for "the exact same reason." It seems
> you are merely looking for a way to match exactly at a given character
> position, and you correctly note that /y is not an elegant solution for
> this problem. However, although /y can be used to solve this problem, my
> understanding is that it was designed to work similarly to the \G regex
> token from Perl/PCRE/Java/.NET/etc. while tying in nicely with the
> lastIndex property. An important feature of /y (and \G from other regex
> flavors) is that, with global regexes (compiled with /g), each successive
> match must start where the last match ended. This is a very useful feature
> for writing some types of simple parsers, etc. And in the process of
> smartly solving this problem, you get an inelegant solution to your
> problem as a side effect, free of charge.
>
> Steven Levithan
> http://blog.stevenlevithan.com