Proposal for exact matching and matching at a position in RegExp

Steve L. steves_list at hotmail.com
Thu Mar 4 03:04:31 PST 2010


On March 04, 2010 11:40 AM, Andy Chu wrote:

>> Regarding "RegEx", I'm pretty certain Brendan was talking about a
>> hypothetical new library (name unimportant) that the JavaScript community
>> might create in the future. I don't think anyone has suggested adding a
>
> I worked on something like this:
> http://code.google.com/p/json-pattern/ .  It is more a rethinking of
> regular expressions than just fixing existing bugs.

Thanks for the link. I'm always interested in projects like this.

> My belief is that regular expressions are hobbled by their syntax.  If
> they didn't have such bad syntax (^ means either negation or the start
> of a string; you sometimes negate with ^ and sometimes negate with
> capitalization, the whole (? nightmare, etc.), then people would write
> large, useful and fast regexes and no one would bat an eye.

Syntax is part of it, but so is backtracking. Backtracking is a big part of
what makes regexes so expressive and powerful (and therefore contributes to
their popularity), but truly understanding it is complicated for many people
at first, and it's easy for backtracking to get out of hand or cause
unexpected results if you're not careful. I think the effects of
backtracking are at least as big a reason for some people's reservations
about regexes as the syntax is.
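
To make both kinds of trouble concrete, here is a small sketch (my own
illustrative inputs, nothing from json-pattern). The first pair shows an
"unexpected result": a greedy quantifier grabs as much as it can and only
backs off as far as it must. The last pattern shows backtracking getting out
of hand: in a typical backtracking engine, a nested quantifier that
ultimately fails has to try every way of splitting the run of "a"s, so the
work roughly doubles with each extra character. The input is kept short on
purpose.

```javascript
const input = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ first consumes the rest of the string, then backtracks only
// until a ">" can match, which here is the very last character.
const greedy = /<.+>/.exec(input)[0];

// Lazy: .+? grows one character at a time and stops at the first ">".
const lazy = /<.+?>/.exec(input)[0];

console.log(greedy); // <b>bold</b> and <i>italic</i>
console.log(lazy);   // <b>

// Nested quantifiers: the trailing "!" guarantees failure, but only after
// the engine has tried the different ways of grouping the 20 "a"s.
const nested = /^(a+)+$/;
const almost = "a".repeat(20) + "!";
const matched = nested.test(almost);

console.log(matched); // false
```

Lengthening the run of "a"s by even ten characters makes the last test
noticeably slow in engines that backtrack naively.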

People *do* write large, useful, fast (and relatively readable) regexes.
ES's lack of an /x flag (free-spacing mode with comments) hampers this to a
significant extent, as do some missing features like named capture and
atomic groups/possessive quantifiers. The "large" part is kind of beside the
point, though--it would often be appropriate to split a regex into smaller
parts regardless of what the syntax was.
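
As a rough sketch of what splitting a regex into smaller parts can look like
without /x: the `compose` helper and the date pattern below are hypothetical
names I made up for illustration, not part of any library. The second half
shows a common workaround for the missing atomic groups, relying on the fact
that ES lookahead is not backtracked into once it has matched.

```javascript
// Hypothetical helper: join small fragments (RegExp or string) into one
// RegExp, so each piece can sit on its own commented line.
function compose(parts, flags) {
  const source = parts
    .map((p) => (p instanceof RegExp ? p.source : p))
    .join("");
  return new RegExp(source, flags);
}

const year = /(\d{4})/;
const month = /(0[1-9]|1[0-2])/;
const day = /(0[1-9]|[12]\d|3[01])/;

const isoDate = compose([
  /^/,
  year, /-/,  // four-digit year
  month, /-/, // 01..12
  day,        // 01..31
  /$/,
]);

const parts = isoDate.exec("2010-03-04").slice(1);
console.log(parts); // [ '2010', '03', '04' ]

// Emulating an atomic group: (?=(\d+))\1 matches what \d+ matches, but the
// engine cannot give digits back afterwards.
const plain = /^\d+\d/.test("123");           // \d+ backs off one digit
const atomic = /^(?=(\d+))\1\d/.test("123");  // nothing left for the last \d

console.log(plain, atomic); // true false
```

This only helps with readability and reuse; it does nothing about named
capture, and the lookahead trick costs a capture group each time it is used.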

> Highlights:
>
> - You can capture an entire (recursive) JSON structure with named and
> repeated elements (a generalization of named capture).  JavaScript
> currently just allows you to capture individual numbered values.

Possibly of interest: http://xregexp.com/ That's my library where I play 
with some regex ideas and hack support for new syntax/flags into existing 
RegExps. It needs some cleanup and changes, and I wouldn't recommend 
actually using it in production applications, but it adds comprehensive 
named capture support to existing RegExps. There's also some shitty 
recursive matching support via a plugin.

> - There are extensible filters (pipes) for converting values.  You can
> capture \d+ to the number 3 rather than the string "3"; you can write
> a filter to convert "3-2-2009" to a Date() instance, etc.
> - Pattern reuse / composition (nicer than Perl's)
> - More readable and more consistent syntax (I wasn't completely happy
> with where I ended up, but I have some unimplemented improvements)
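
For anyone skimming, the filter idea can be approximated in plain JavaScript
along these lines. This is only a sketch under my own assumptions:
`captureWith`, `toInt` and `toDate` are hypothetical names, this is not
json-pattern's syntax or API, and I'm guessing that "3-2-2009" is meant as
month-day-year.

```javascript
// Hypothetical helper: run a regex, then pass each captured group through
// the converter function at the same position.
function captureWith(regex, filters, text) {
  const m = regex.exec(text);
  if (m === null) return null;
  return filters.map((filter, i) => filter(m[i + 1]));
}

// Convert a captured digit string to a number.
const toInt = (s) => parseInt(s, 10);

// Convert "M-D-YYYY" to a Date in local time (the field order is an
// assumption, see above).
function toDate(s) {
  const [month, day, year] = s.split("-").map(Number);
  return new Date(year, month - 1, day);
}

const count = captureWith(/(\d+) items/, [toInt], "3 items");
const when = captureWith(/on (\d+-\d+-\d+)/, [toDate], "shipped on 3-2-2009");

console.log(count[0] === 3);          // true: a number, not the string "3"
console.log(when[0] instanceof Date); // true
```

The conversion happens after the match here, so a filter cannot reject a
candidate and make the regex try something else.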

Again, thanks for the link. I'll try to look over it sometime.
 



More information about the es-discuss mailing list