On Thu, Feb 11, 2010 at 10:24 PM, Steve L. <steves_list at> wrote:
> Outside of es-discuss, Brendan Eich asked for my thoughts on the merits of
> \G vs. /y (intrinsically and in light of backward compatibility). I sent the
> following reply, which he thought would be useful to forward to the list....
> I have no preference between /y and \G. When I first saw /y proposed for
> ES4, I felt it needlessly reinvented the wheel given that \G had already
> been implemented pretty widely. On the other hand, the fact that \G reaches
> out of the search pattern to read a property of a regex or string feels a
> bit too much like magic to me, and implementing it as a flag (/y) seems less
> weird. An argument in favor of \G is that it's more versatile than /y since
> it can be used anywhere in a regex pattern (e.g., at the start of an
> alternation option), not just as the leading element.

Agree that \G breaks some logical barrier.  I like to have a mental
model of the implementation internals, and \G breaks that a bit.

If compatibility with Mozilla is not an issue, I actually prefer
Python's approach of .search() vs. .match().  It's not a part of the
regex; it's not a property of the regex; it's how you *apply* the
regex to a string.  Just like you can apply the same regex with
.split() or .exec() or .replace().  They're orthogonal issues in my
mind.

Though as mentioned, gracefully upgrading with ES3-5 is an issue, so I
could only think of .exec() and .execLeft() for a left-anchored match.

One thing I didn't bring up is that Python actually has an "endpos"
argument.  You do pattern.search(s, 10, 20), and it will stop at
position 20.  I couldn't think of a real use case for this.  But if
anyone can think of one, that might be a consideration and sway things
in favor of separate methods.
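
To make the .search() vs. .match() distinction concrete, here is a
minimal sketch using Python's compiled-pattern methods, which accept
optional pos and endpos arguments.  The pattern and the subject string
are made up purely for illustration.

```python
import re

# Hypothetical pattern and subject string, chosen only for illustration.
word = re.compile(r"[a-z]+")
s = "123 abc def"

# search() scans forward from pos until it finds a match somewhere.
m = word.search(s, 0)
found_by_search = (m.start(), m.group())

# match() is anchored: it succeeds only if a match starts exactly at pos.
anchored_miss = word.match(s, 0)          # no letters at position 0
anchored_hit = word.match(s, 4).group()   # letters start at position 4

# endpos makes the pattern behave as if the string ended at that index.
clipped = word.match(s, 4, 6).group()
```

The same compiled pattern is used in every call; only the way it is
applied to the string changes, which is the orthogonality argued for
above.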


> Note that \G works a bit differently across implementations. In some cases
> it matches the start position of the current match (PCRE, Ruby), and
> elsewhere it matches the end position of the previous match (Perl, Java,
> .NET). Of course, this distinction only matters after a zero-length match
> (since that increments the start position of the next search).
> Perl has extra functionality around \G that makes it more useful.
> Specifically, the fact that the location associated with \G is an attribute
> of target strings (pos()) means that multiple regexes with \G can match
> against a string in turn and they'll each pick up where the others left off.
> Combine this with Perl's /c modifier (which prevents failed matches from
> resetting the \G location) and you can run multiple regexes with \G and /c
> against a string and advance only when there's a match. Here's a crappy
> example:
> while ($html !~ /\G$/gc) {
>   if ($html =~ /\G[^<&]+/gc) {
>       ...
>   } elsif ($html =~ /\G<(\w+)[^>]+>/gc) {
>       ...
>   } elsif ($html =~ /\G&#?\w+;/gc) {
>       ...
>   }
> }
> Sorry for the tangent, but I thought it might be helpful to describe how \G
> is used elsewhere.
>>>> Basically the proposal is to add parameters which can override the
>>>> internal state of the RegExp.
>>> Does anyone have any comments on this?
>>> Can I put it in a place where it will be considered for the next
>>> ECMAScript?  The overall idea seems relatively uncontroversial since
>>> it was already implemented by Mozilla (for the exact same reason).  I
>>> have proposed a specific API enhancement too.
>> I do not believe it was implemented for "the exact same reason." It seems
>> you are merely looking for a way to match exactly at a given character
>> position, and you correctly note that /y is not an elegant solution for
>> this problem. However, although /y can be used to solve this problem, my
>> understanding is that it was designed to work similarly to the \G regex
>> token from Perl/PCRE/Java/.NET/etc. while tying in nicely with the
>> lastIndex property. An important feature of /y (and \G from other regex
>> flavors) is that, with global regexes (compiled with /g), each successive
>> match must start where the last match ended. This is a very useful feature
>> for writing some types of simple parsers, etc.  And in the process of
>> smartly solving this problem, you get an inelegant solution to your
>> problem as a side effect, free of charge.
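
The "each successive match must start where the last match ended"
behaviour can be sketched without \G or /y by passing an explicit
position to an anchored match method.  The following is only a rough
Python analogue of the Perl loop quoted earlier, not a proposal for the
ES API: the rule names and the tokenize() helper are hypothetical, and
the tag pattern uses [^>]* rather than the quoted [^>]+ so that a bare
tag such as <b> is accepted.

```python
import re

# Hypothetical token rules, loosely mirroring the quoted Perl loop.
RULES = [
    ("text", re.compile(r"[^<&]+")),
    ("tag", re.compile(r"<(\w+)[^>]*>")),
    ("entity", re.compile(r"&#?\w+;")),
]


def tokenize(html):
    """Return (kind, text) pairs; each match must start where the last ended."""
    tokens = []
    pos = 0
    while pos < len(html):
        for kind, pattern in RULES:
            # match() with a start position is anchored there, like \G or /y.
            m = pattern.match(html, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()  # the next match must begin exactly here
                break
        else:
            # No rule matched at pos, so stop instead of skipping ahead.
            raise ValueError(f"no rule matches at position {pos}")
    return tokens
```

For example, tokenize("a<b>c&amp;") yields a text token, a tag token,
another text token, and an entity token, in that order.  The position
is carried by the caller rather than by the regex or the string, which
is the "separate methods" flavour of the idea.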
