Proposal for exact matching and matching at a position in RegExp

Andy Chu andy at chubot.org
Tue Mar 2 23:17:13 PST 2010


> If we agree on /y, is the remainder of your proposal to simply add a pos
> (and possibly endPos) argument to the exec and test methods? I'd be all for
> that if the lastIndex property was also deprecated. I've argued for the same
> thing at http://blog.stevenlevithan.com/archives/fixing-javascript-regexp

Yes, that's all.

>> Well, the proposal is that pos, if passed, overrides lastIndex.  Is
>> there something wrong with that?
>>
>> The point is that the same RegExp() instance can be used in concurrent
>> contexts without the stomping on lastIndex.
>
> Agreed--that's how pos should work, if lastIndex is deprecated. But I think
> having two mechanisms (pos and lastIndex) for setting the search start
> position is a bad idea. If a pos argument exists, you'd expect pos to be 0
> when it's not specified. Having lastIndex around to sometimes screw up this
> expectation is confusing and will probably cause latent bugs. There are good
> reasons to deprecate lastIndex (you've mentioned one here, and another is
> that it only works with global regexes), so I think you should make this
> part of your proposal. Deprecating lastIndex should be a means toward the
> end of removing it altogether, after which pos would become more intuitive.

I guess it depends on how much ES6+ wants to diverge from ES3/5.

I don't really think the lastIndex property needs to be
deprecated for the pos argument.  I think we are talking about
literally a quarter of a line of code, e.g. in exec()/test():

function exec(s, pos) {
  // An explicitly passed pos (including 0) overrides lastIndex.
  var positionToStartFrom =
      (pos !== undefined) ? pos : (this.lastIndex || 0);
  this.lastIndex = ... call regex engine ...
}

There are APIs designed from the "ground up" this way.  The method
argument overrides internal state.
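
Here is a minimal runnable sketch of that override rule, written against
today's RegExp API rather than the proposed one.  The helper name execFrom
is hypothetical and not part of the proposal.  It assumes the RegExp
"flags" property (which postdates this thread) and works on a private
clone, so the caller's lastIndex is never touched.

```javascript
// Hypothetical helper: run `re` against `s` starting at `pos`.
// An explicit `pos` (including 0) overrides re.lastIndex.
function execFrom(re, s, pos) {
  var start = (pos !== undefined) ? pos : (re.lastIndex || 0);
  // lastIndex is only honored on global (or sticky) regexes, so make
  // sure the private clone carries the g flag.
  var flags = re.flags.indexOf('g') === -1 ? re.flags + 'g' : re.flags;
  var clone = new RegExp(re.source, flags);
  clone.lastIndex = start;
  return clone.exec(s);
}

var shared = /\d+/g;
shared.lastIndex = 4;
var fromLastIndex = execFrom(shared, 'ab12cd34');  // pos omitted: starts at 4
var fromZero = execFrom(shared, 'ab12cd34', 0);    // pos wins over lastIndex
```

Because the search runs on a clone, two callers can share the same RegExp
instance without stomping on each other's position.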

I can see your point about how it might be confusing if you omit
'pos'.  But I guess I just don't think it's that big a deal; it's a
judgement call.  People already know the old behavior, and there will
be code on the Web that uses it until the end of time.  They just have to
know one new thing.

I like your idea on the blog post of cleaning up the global flag.  The
current /g behavior makes for a very un-orthogonal API.
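
As a small illustration of what I mean by un-orthogonal (standard
behavior, nothing proposed here): the g flag changes what test() does on
repeated calls, even though test() only returns a boolean.

```javascript
var globalRe = /a/g;
var first = globalRe.test('a');   // matches at 0 and advances lastIndex to 1
var second = globalRe.test('a');  // starts at 1, fails, resets lastIndex to 0

var plainRe = /a/;
var third = plainRe.test('a');    // no g flag: lastIndex is ignored
var fourth = plainRe.test('a');   // so repeated calls agree
```

The same pattern and the same input give different answers depending on a
flag that is nominally about "find all matches".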

>> Putting \G in because Perl
>> uses it is a bad reason, since following Perl to Perl 5 regexps would
>> be an atrocity : )  Unless someone is seriously arguing for \G, I
>> think it should be ruled out.  As you mentioned, it's more complicated
>> to implement since \G doesn't have to be at the beginning.
>
> After this recent back and forth (or as a consequence of it), it seems \G
> has little if any support on es-discuss (I agree on ruling it out).  But why
> do you say following Perl 5 regexes would be an atrocity? Perl/PCRE are
> still the leaders that other regex packages mostly follow, and IMO they
> constitute the state of the art for modern regex packages (although you
> could maybe point to Tcl as a counterargument). No one here is arguing for
> the more out-there Perl regex features like embedded code or backtracking
> control verbs, but there are plenty of useful Perl regex extensions that ES
> could benefit from, many of which have already been picked up by

All I'm saying is that adding \G to be consistent with Perl is not a
good reason.  And no one is really arguing for \G so it's a moot
point.

I agree that Perl has useful stuff that JavaScript doesn't have.  But
it's a slippery slope because even Perl 6 has admitted that Perl 5
regexes got out of hand.
(http://dev.perl.org/perl6/doc/design/apo/A05.html, "First, let me
enumerate some of the things that are wrong with current regex
culture.")

> Java/.NET/Oniguruma/etc.  Of course, such features should be considered on a
> case by case basis, but I have a bias towards regex library compatibility
> and not inventing new syntax/flags to address problems that have already
> been solved. Developers like their regexes to be portable, and a common
> gripe about regexes is that they're not more portable than they are.

Consistency with Perl should be considered but IMHO it's not a strong
consideration -- if Perl 5 has solved it one way, it doesn't mean it's
the best way.  It's worth looking at where Perl 6 breaks with Perl 5.
It's not possible to copy Perl 6 because of compatibility, but it is a
good sign that a Perl 5 solution was not satisfactory in the long
term.

Agree that we should take it on a case-by-case basis.  In this case it
sounds like /y, which has nothing to do with Perl, is good.  (If I
were to nitpick, I would say /y for "sticky" is silly; I would call it
/a or /n for "anchored".)

> I plan on writing up a list of features from other regex libraries (mostly
> Perl) that I think would be useful to consider for ES. It seems, though,
> that some people simply don't like the idea of making regular
> expressions--terse and supposedly-unreadable as they are--even more
> powerful. (Not referring to you; this is just something I see fairly often.)
> I do not like the idea of excluding useful features for the sake of supposed
> purity or saving people who don't understand backtracking from themselves.

It's a hard problem, because adding power and keeping compatibility
and keeping sane syntax are all at odds with each other.  The people
who want to keep regexes simple have a point.

I think focusing on use cases is the important thing.  Certainly the
tokenization use case has been run into multiple times, thus /y is
justified.
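
To make the tokenization case concrete, here is a rough sketch of a lexer
built on /y.  The token table and the tokenize function are made up for
illustration, and the sketch assumes an engine that implements the sticky
flag with the "match only at lastIndex" semantics discussed in this thread.

```javascript
// Hypothetical token table.  Each pattern is sticky, so it can only
// match exactly at lastIndex and never skips ahead in the input.
var rules = [
  { type: 'number', re: /\d+/y },
  { type: 'ident',  re: /[A-Za-z_]\w*/y },
  { type: 'op',     re: /[+\-*\/=]/y },
  { type: 'space',  re: /\s+/y }
];

function tokenize(input) {
  var tokens = [];
  var pos = 0;
  while (pos < input.length) {
    var matched = false;
    for (var i = 0; i < rules.length; i++) {
      var rule = rules[i];
      rule.re.lastIndex = pos;          // anchor the attempt at pos
      var m = rule.re.exec(input);
      if (m) {
        if (rule.type !== 'space') {    // whitespace is dropped
          tokens.push({ type: rule.type, text: m[0] });
        }
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error('Unexpected character at position ' + pos);
    }
  }
  return tokens;
}

var tokens = tokenize('x = 42 + y1');
```

Note that the loop has to write lastIndex before every attempt; that
shared, mutable state is exactly what a pos argument would avoid.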

Personally I haven't "wanted" for anything in Python's regexps.
(Actually, the only thing is capturing repeated groups: something like
((?P<foo>\d+),)* could capture a comma-separated list of integers, but
I don't know if Perl even has that.)  I know Perl has a ton more stuff,
but I am biased toward just writing procedural code combined with
regexes.  Certainly all the inline code stuff seems like a huge mess
to me.
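
For reference, a quick sketch of the repeated-group limitation and the
procedural workaround, in JavaScript syntax with a numbered group standing
in for Python's (?P<foo>...).  The variable names are just for
illustration.

```javascript
var input = '1,2,3,';

// A capture group inside a repetition keeps only its last iteration.
var whole = /^(?:(\d+),)*$/.exec(input);
var lastOnly = whole[1];

// Procedural code plus a simple regex recovers every item.
var all = [];
var item = /(\d+),/g;
var m;
while ((m = item.exec(input)) !== null) {
  all.push(m[1]);
}
```

Here lastOnly ends up holding only the final integer, while all collects
each one in order.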

I'll look at your other proposals in more detail.  As mentioned I
think the /g part is good.

Andy

