Proposal for exact matching and matching at a position in RegExp
steves_list at hotmail.com
Mon Mar 1 05:00:47 PST 2010
On February 23, 2010 8:50 AM, Andy Chu wrote:
> So now that Erik Corry pointed out that /y as a compilation option
> matches the implementation, as mentioned I don't mind leaving it out
> of the method parameters. The "pos" is the one that still should be a
> method argument.
> /y works for me. I wasn't really proposing execLeft
> anymore, just considering its advantages.
If we agree on /y, is the remainder of your proposal to simply add a pos
(and possibly endPos) argument to the exec and test methods? I'd be all for
that if the lastIndex property was also deprecated. I've argued for the same
> The pos and endpos arguments would be synonymous with slicing the
> string, but more efficient. In fact the Python docs say explicitly
> that they're equivalent.
> I think so far everyone agrees with the 'pos' optimization. I'm not
> going to push for endpos but I thought I'd mention it.
pos is useful beyond optimization--e.g., it would let you easily iterate
over regex matches similar to how lastIndex can already be used. endPos is
not nearly as useful. I'm not opposed to it, but I think any pushback you'd
receive about adding marginally-useful arguments to existing methods would
be warranted (i.e., like you, I won't push for it).
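
To make the iteration point concrete, here is a rough sketch of how a pos argument could be used. Note that exec takes no such argument today: execFrom and allMatches are hypothetical helpers written only for illustration. They emulate the proposed behavior by cloning the regex, and the clone keeps only the i and m flags, which is a simplification.

```javascript
// Hypothetical helper: emulates a proposed exec(str, pos) without touching
// the caller's regex object. Not part of any standard API.
function execFrom(re, str, pos) {
  // Clone with the global flag so that lastIndex is honored as the search start.
  // Simplification: only the i and m flags are carried over.
  var flags = "g" + (re.ignoreCase ? "i" : "") + (re.multiline ? "m" : "");
  var copy = new RegExp(re.source, flags);
  copy.lastIndex = pos;
  return copy.exec(str);
}

// Iterate over all matches by passing pos explicitly instead of
// relying on state stored on the regex.
function allMatches(re, str) {
  var found = [];
  var pos = 0;
  var m;
  while (pos <= str.length && (m = execFrom(re, str, pos)) !== null) {
    found.push(m[0]);
    // Step past empty matches so the loop always advances.
    pos = m.index + (m[0].length || 1);
  }
  return found;
}
```

Because the position lives in a local variable, the regex passed in is never mutated, so the same instance can be shared freely between callers.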
> Well, the proposal is that pos, if passed, overrides lastIndex. Is
> there something wrong with that?
> The point is that the same RegExp() instance can be used in concurrent
> contexts without the stomping on lastIndex.
Agreed--that's how pos should work, if lastIndex is deprecated. But I think
having two mechanisms (pos and lastIndex) for setting the search start
position is a bad idea. If a pos argument exists, you'd expect pos to be 0
when it's not specified. Having lastIndex around to sometimes screw up this
expectation is confusing and will probably cause latent bugs. There are good
reasons to deprecate lastIndex (you've mentioned one here, and another is
that it only works with global regexes), so I think you should make this
part of your proposal. Deprecating lastIndex should be a means toward the
end of removing it altogether, after which pos would become more intuitive.
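
A small illustration of both problems with lastIndex as it behaves today (the variable names are only for this example):

```javascript
// Problem 1: lastIndex is hidden state on a global regex, so reusing the
// same instance gives different answers for the same input.
var re = /a/g;
var first = re.test("a");   // matches at index 0; lastIndex becomes 1
var second = re.test("a");  // search starts at index 1 and fails; lastIndex resets to 0

// Problem 2: lastIndex is ignored unless the regex is global.
var plain = /a/;
plain.lastIndex = 5;
var third = plain.test("a"); // still searches from index 0

// first is true, second is false, third is true
```

With an explicit pos argument that defaults to 0, neither surprise would arise.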
> I think JS has already admitted its influence by Python.
ES's RegExp has probably zero Python influence. Python's re has plenty of
> Putting \G in because Perl
> uses it is a bad reason, since following Perl to Perl 5 regexps would
> be an atrocity : ) Unless someone is seriously arguing for \G, I
> think it should be ruled out. As you mentioned, it's more complicated
> to implement since \G doesn't have to be at the beginning.
After this recent back and forth (or as a consequence of it), it seems \G
has little if any support on es-discuss (I agree on ruling it out). But why
do you say following Perl 5 regexes would be an atrocity? Perl/PCRE are still
the leaders that other regex packages mostly follow, and IMO they constitute
the state of the art for modern regex packages (although you could maybe point
to Tcl as a counterargument). No one here is arguing for
the more out-there Perl regex features like embedded code or backtracking
control verbs, but there are plenty of useful Perl regex extensions that ES
could benefit from, many of which have already been picked up by
Java/.NET/Oniguruma/etc. Of course, such features should be considered on a
case by case basis, but I have a bias towards regex library compatibility
and not inventing new syntax/flags to address problems that have already
been solved. Developers like their regexes to be portable, and a common
gripe about regexes is that they aren't more portable.
I plan on writing up a list of features from other regex libraries (mostly
Perl) that I think would be useful to consider for ES. It seems, though,
that some people simply don't like the idea of making regular
expressions--terse and supposedly-unreadable as they are--even more
powerful. (Not referring to you; this is just something I see fairly often.)
I do not like the idea of excluding useful features for the sake of supposed
purity or saving people who don't understand backtracking from themselves.