Proposal for exact matching and matching at a position in RegExp
Andy Chu
andy at chubot.org
Fri Feb 12 09:23:19 PST 2010
On Thu, Feb 11, 2010 at 10:24 PM, Steve L. <steves_list at hotmail.com> wrote:
> Outside of es-discuss, Brendan Eich asked for my thoughts on the merits of
> \G vs. /y (intrinsically and in light of backward compatibility). I sent the
> following reply, which he thought would be useful to forward to the list....
>
> I have no preference between /y and \G. When I first saw /y proposed for
> ES4, I felt it needlessly reinvented the wheel given that \G had already
> been implemented pretty widely. On the other hand, the fact that \G reaches
> out of the search pattern to read a property of a regex or string feels a
> bit too much like magic to me, and implementing it as a flag (/y) seems less
> weird. An argument in favor of \G is that it's more versatile than /y since
> it can be used anywhere in a regex pattern (e.g., at the start of an
> alternation option), not just as the leading element.
Agree that \G breaks some logical barrier. I like to have a mental
model of the implementation internals, and \G breaks that a bit.
If compatibility with Mozilla is not an issue, I actually prefer
Python's approach of .search() vs. .match(). It's not a part of the
regex; it's not a property of the regex; it's how you *apply* the
regex to a string. Just like you can apply the same regex with
.split() or .exec() or .replace(). They're orthogonal issues in my
mind.
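
For anyone who hasn't used it, here is a minimal sketch of the Python
distinction I mean. The pattern and the sample string are made up purely
for illustration; only re.compile(), .search() and .match() come from
Python itself.

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
digits = re.compile(r"\d+")
text = "abc123"

# search() scans forward for the first position where the pattern matches.
found = digits.search(text)

# match() only tries the pattern at the starting position (index 0 by default).
anchored = digits.match(text)

# The optional pos argument moves the starting position;
# match() stays anchored to that position.
anchored_at_3 = digits.match(text, 3)

print(found.group())          # 123
print(anchored)               # None
print(anchored_at_3.group())  # 123
```

The regex is the same object in all three calls; only the way it is
applied to the string changes.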
Though as mentioned, gracefully upgrading with ES3-5 is an issue, so I
could only think of .exec() and .execLeft() for a left-anchored match.
One thing I didn't bring up is that Python actually has an "endpos"
argument. You do regex.search(s, 10, 20), and it will stop at
position 20. I couldn't think of a real use case for this. But if
anyone can think of one, that might be a consideration and sway things
in favor of separate methods.
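
A small sketch of what endpos does, again with an invented pattern and
string (only the search(string, pos, endpos) signature is Python's):

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
word = re.compile(r"[a-z]+")
s = "0123456789abcdefghijklmnopqrstuvwxyz"

# With endpos=20 the search behaves as if the string ended at index 20,
# so the match cannot extend past that position.
limited = word.search(s, 10, 20)

# Without endpos the same search runs to the end of the string.
unlimited = word.search(s, 10)

print(limited.span())    # (10, 20)
print(unlimited.span())  # (10, 36)
```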
Andy
>
> Note that \G works a bit differently across implementations. In some cases
> it matches the start position of the current match (PCRE, Ruby), and
> elsewhere it matches the end position of the previous match (Perl, Java,
> .NET). Of course, this distinction only matters after a zero-length match
> (since that increments the start position of the next search).
>
> Perl has extra functionality around \G that makes it more useful.
> Specifically, the fact that the location associated with \G is an attribute
> of target strings (pos()) means that multiple regexes with \G can match
> against a string in turn and they'll each pick up where the others left off.
> Combine this with Perl's /c modifier (which prevents failed matches from
> resetting the \G location) and you can run multiple regexes with \G and /c
> against a string and advance only when there's a match. Here's a crappy
> example:
>
> while ($html !~ /\G$/gc) {
>     if ($html =~ /\G[^<&]+/gc) {
>         ...
>     } elsif ($html =~ /\G<(\w+)[^>]+>/gc) {
>         ...
>     } elsif ($html =~ /\G&#?\w+;/gc) {
>         ...
>     }
> }
>
> Sorry for the tangent, but I thought it might be helpful to describe how \G
> is used elsewhere.
>
> Steven Levithan
> http://blog.stevenlevithan.com
>
> --------------------------------------------------
> From: "Steve L." <steves_list at hotmail.com>
> Sent: Wednesday, February 10, 2010 10:46 AM
> To: "Andy Chu" <andy at chubot.org>; "es-discuss" <es-discuss at mozilla.org>
> Subject: Re: Proposal for exact matching and matching at a position in
> RegExp
>
>>
>>>> http://andychu.net/ecmascript/RegExp-Enhancements-2.html
>>>>
>>>> Basically the proposal is to add parameters which can override the
>>>> internal state of the RegExp.
>>>
>>> Does anyone have any comments on this?
>>>
>>> Can I put it in a place where it will be considered for the next
>>> ECMAScript? The overall idea seems relatively uncontroversial since
>>> it was already implemented by Mozilla (for the exact same reason). I
>>> have proposed a specific API enhancement too.
>>
>> I do not believe it was implemented for "the exact same reason." It seems
>> you are merely looking for a way to match exactly at a given character
>> position, and you correctly note that /y is not an elegant solution for
>> this problem. However, although /y can be used to solve this problem, my
>> understanding is that it was designed to work similarly to the \G regex
>> token from Perl/PCRE/Java/.NET/etc. while tying in nicely with the
>> lastIndex property. An important feature of /y (and \G from other regex
>> flavors) is that, with global regexes (compiled with /g), each successive
>> match must start where the last match ended. This is a very useful feature
>> for writing some types of simple parsers, etc. And in the process of
>> smartly solving this problem, you get an inelegant solution to your
>> problem as a side effect, free of charge.
>>
>> Steven Levithan
>> http://blog.stevenlevithan.com