Proposal for exact matching and matching at a position in RegExp

Steve L. steves_list at hotmail.com
Mon Mar 1 05:00:47 PST 2010


On February 23, 2010 8:50 AM, Andy Chu wrote:

> So now that Erik Corry pointed out that /y as a compilation option
> matches the implementation, as mentioned I don't mind leaving it out
> of the method parameters.  The "pos" is the one that still should be a
> method argument.
> [...]
> /y works for me.  I wasn't really proposing execLeft
> anymore, just considering its advantages.

If we agree on /y, is the remainder of your proposal to simply add a pos 
(and possibly endPos) argument to the exec and test methods? I'd be all for 
that if the lastIndex property were also deprecated. I've argued for the same 
thing at http://blog.stevenlevithan.com/archives/fixing-javascript-regexp
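
For anyone who hasn't used /y, here is a minimal sketch of its behavior as I 
understand it, assuming an engine that supports the flag. The match must 
begin exactly at lastIndex rather than anywhere after it. The variable names 
are just for illustration.

```javascript
// Assumes the engine supports the /y (sticky) flag.
var sticky = /foo/y;

sticky.lastIndex = 3;
var atThree = sticky.test("barfoo"); // "foo" begins exactly at index 3
var afterMatch = sticky.lastIndex;   // advanced to the end of the match

sticky.lastIndex = 2;
var atTwo = sticky.test("barfoo");   // nothing matches starting exactly at index 2
```

Note that the start position still has to be smuggled in through lastIndex 
here, which is the part a pos argument would replace.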

> The pos and endpos arguments would be synonymous with slicing the
> string, but more efficient. In fact the Python docs say explicitly
> that they're equivalent.
>
> I think so far everyone agrees with the 'pos' optimization.  I'm not
> going to push for endpos but I thought I'd mention it.

pos is useful beyond optimization--e.g., it would let you easily iterate 
over regex matches similar to how lastIndex can already be used. endPos is 
not nearly as useful. I'm not opposed to it, but I think any pushback you'd 
receive about adding marginally useful arguments to existing methods would 
be warranted (i.e., like you, I won't push for it).
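
As a rough sketch of that kind of iteration, the helper below emulates a pos 
argument on top of today's lastIndex by running a private copy of the regex. 
Both execAt and collectMatches are hypothetical names used only for 
illustration; nothing like them exists in ES or in the proposal as written.

```javascript
// Hypothetical helper: emulates exec(str, pos) with a private copy of the
// regex, so the caller's lastIndex is never read or written.
// Only the i and m flags are carried over; g is forced on so that
// lastIndex is honored by the copy.
function execAt(regex, str, pos) {
  var flags = "g" + (regex.ignoreCase ? "i" : "") + (regex.multiline ? "m" : "");
  var copy = new RegExp(regex.source, flags);
  copy.lastIndex = pos || 0;
  return copy.exec(str);
}

// Hypothetical helper: collects every match by passing the start
// position explicitly instead of relying on shared state.
function collectMatches(regex, str) {
  var results = [];
  var pos = 0;
  var m;
  while (pos <= str.length && (m = execAt(regex, str, pos))) {
    results.push(m[0]);
    // Step past zero-length matches so the loop always advances.
    pos = m.index + (m[0].length || 1);
  }
  return results;
}

var numbers = collectMatches(/\d+/, "a1 b22 c333");
```

A native pos argument would avoid the extra RegExp construction on every 
call, which is where the optimization argument comes in.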

> Well, the proposal is that pos, if passed, overrides lastIndex.  Is
> there something wrong with that?
>
> The point is that the same RegExp() instance can be used in concurrent
> contexts without the stomping on lastIndex.

Agreed--that's how pos should work, if lastIndex is deprecated. But I think 
having two mechanisms (pos and lastIndex) for setting the search start 
position is a bad idea. If a pos argument exists, you'd expect pos to be 0 
when it's not specified. Having lastIndex around to sometimes screw up this 
expectation is confusing and will probably cause latent bugs. There are good 
reasons to deprecate lastIndex (you've mentioned one here, and another is 
that it only works with global regexes), so I think you should make 
deprecating it part of your proposal. Deprecating lastIndex should be a 
means toward the end of removing it altogether, after which pos would become 
more intuitive.
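
To make the lastIndex problems concrete, here is a small sketch using only 
current behavior (the variable names are mine):

```javascript
// A global regex remembers where its last match ended, so repeated
// calls on the same instance do not start from the same place.
var shared = /a/g;
var first = shared.test("a");  // matches; lastIndex is now 1
var second = shared.test("a"); // fails; the search began at index 1
var third = shared.test("a");  // matches again; the failure reset lastIndex to 0

// Without /g, lastIndex is not used as the search start at all.
var nonGlobal = /a/;
nonGlobal.lastIndex = 1;
var ignored = nonGlobal.test("a"); // still matches from index 0
```

The first case is the kind of latent bug I mean: the same call with the same 
input gives different answers depending on what ran before it.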

> I think JS has already admitted its influence by Python.

ES's RegExp has probably zero Python influence. Python's re has plenty of 
Perl influence.

> Putting \G in because Perl
> uses it is a bad reason, since following Perl to Perl 5 regexps would
> be an atrocity : )  Unless someone is seriously arguing for \G, I
> think it should be ruled out.  As you mentioned, it's more complicated
> to implement since \G doesn't have to be at the beginning.

After this recent back and forth (or as a consequence of it), it seems \G 
has little if any support on es-discuss (I agree on ruling it out). But why 
do you say following Perl 5 regexes would be an atrocity? Perl/PCRE are 
still the leaders that other regex packages mostly follow, and IMO they 
constitute the state of the art for modern regex packages (although you 
could maybe point to Tcl as a counterargument). No one here is arguing for 
the more out-there Perl regex features like embedded code or backtracking 
control verbs, but there are plenty of useful Perl regex extensions that ES 
could benefit from, many of which have already been picked up by 
Java/.NET/Oniguruma/etc. Of course, such features should be considered on a 
case by case basis, but I have a bias towards regex library compatibility 
and not inventing new syntax/flags to address problems that have already 
been solved. Developers like their regexes to be portable, and a common 
gripe about regexes is that they're not more portable than they are.

I plan on writing up a list of features from other regex libraries (mostly 
Perl) that I think would be useful to consider for ES. It seems, though, 
that some people simply don't like the idea of making regular 
expressions--terse and supposedly unreadable as they are--even more 
powerful. (Not referring to you; this is just something I see fairly often.) 
I do not like the idea of excluding useful features for the sake of supposed 
purity or saving people who don't understand backtracking from themselves.

--Steven Levithan
 


