should calling RegExp constructor as function without arguments throw?

Lasse R.H. Nielsen atwork at infimum.dk
Wed Jan 14 14:01:51 PST 2009


On Wed, 14 Jan 2009 14:13:13 +0100, Hallvord R. M. Steen <hallvord at opera.com> wrote:

> Apologies if this has already been covered, I tried
> googling but found only tangentially related stuff about "/regexp/()"
> syntax.

There are a few parts of the regexp syntax that could use a look-over.

My two primary pet peeves are that look-aheads are Atoms, not Assertions,
and that back-references to captures occurring later in the source are
valid.

The only difference between an Atom and an Assertion is that the former
can have a quantifier attached. There is absolutely no reason to put a
quantifier on a look-ahead, and look-aheads are zero-width matches just
like all assertions, so they would fit much better as assertions.
Changing the grammar to make look-aheads actual assertions wouldn't even
require implementations to change. It would just change quantified
look-aheads from being standard to being an extension, like so many
other things in regexps already are. (The feature was only added to
JSC recently; I'm guessing nobody had needed it.)
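To make the zero-width point concrete, here is a small TypeScript sketch. It assumes a current engine implementing ES2015 or later (not the 2009 engines discussed here), in which a quantified look-ahead is accepted only in the default mode as a legacy extension and rejected under the `u` flag. The helper `compiles` is a hypothetical name introduced for illustration.

```typescript
// Hypothetical helper: returns "ok" if the pattern compiles, else the error name.
function compiles(source: string, flags = ""): string {
  try {
    new RegExp(source, flags);
    return "ok";
  } catch (e) {
    return (e as Error).name;
  }
}

// A look-ahead consumes no input: the match is empty and sits at index 1.
const m = new RegExp("(?=b)").exec("abc");
console.log(m?.index, JSON.stringify(m?.[0])); // 1 ""

// Quantifying it adds nothing useful: "*" allows zero iterations, so the
// assertion no longer constrains the match at all.
console.log(new RegExp("(?=b)a").test("a"));  // false
console.log(new RegExp("(?=b)*a").test("a")); // true

// Assumed modern behavior: legal in the default (legacy) grammar,
// a SyntaxError once the "u" flag selects the stricter grammar.
console.log(compiles("(?=b)*a"));      // ok
console.log(compiles("(?=b)*a", "u")); // SyntaxError
```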

The problem with back-references is that allowing forward references
prevents a one-pass parser, because you need to scan the entire regexp
to know whether a decimal escape is valid. Well, actually, it wouldn't
be a problem if you didn't want to be compatible with all the current
implementations that treat invalid decimal escapes as octal escapes.
Because of them, you need to know whether a given decimal sequence is
a valid back-reference in order to parse it as octal if it isn't.
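A minimal TypeScript sketch of that octal fallback, again assuming a modern engine that implements the legacy regexp grammar in non-`u` mode: whether `\2` is a back-reference depends on how many capturing groups the whole pattern contains, and references to captures that have not completed yet simply match the empty string.

```typescript
// Two patterns that differ only in how many capturing groups contain "\2".
const twoGroups = new RegExp("(a)(b)\\2"); // "\2" is a back-reference to (b)
const oneGroup = new RegExp("(a)\\2");     // one group: "\2" falls back to octal \x02

console.log(twoGroups.test("abb"));    // true
console.log(oneGroup.test("a\u0002")); // true: matched the control character U+0002
console.log(oneGroup.test("aa"));      // false

// Assumed modern behavior: under the "u" flag there is no octal fallback,
// so the same pattern is rejected outright.
let rejected = false;
try {
  new RegExp("(a)\\2", "u");
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true

// References to captures that have not completed are legal but match the
// empty string, because nothing has been captured at that point.
console.log(new RegExp("\\1(a)").exec("a")?.[0]); // "a"
console.log(new RegExp("(a\\1)").exec("a")?.[0]); // "a"
```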
IE6, at least, limits the valid back-references to the captures that
start before the back-reference in the source. That's a reasonable
approach from a parsing perspective (I'd be happy if that was what was
required), but really you only need to be able to reference captures
that can be completed at the point where the back-reference occurs,
i.e., captures whose start and end parentheses both occur before the
back-reference in the source.
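The three rules can be compared with a hypothetical classifier, sketched in TypeScript. `classifyDecimalEscapes` and its policy names are illustrative assumptions, not part of any engine; it handles only ES3-era syntax and treats every `(?` group as non-capturing. What it shows is that the opened-before and closed-before rules can decide each escape as soon as it is read, while the whole-pattern rule must wait until the group count is known.

```typescript
// Hypothetical sketch, not an engine API. ES3-era syntax only:
// every "(?" group is treated as non-capturing.
type Policy =
  | "wholePattern"  // ES3 text: any capture anywhere in the pattern
  | "openedBefore"  // IE6-style: the capture's "(" precedes the escape
  | "closedBefore"; // proposed: both "(" and ")" precede the escape

type Kind = "backref" | "legacyEscape";

interface Pending {
  index: number;            // position of the backslash in the source
  n: number;                // value of the decimal escape
  openedSoFar: number;      // capturing groups opened before the escape
  closedSoFar: Set<number>; // capture numbers already closed before the escape
}

function classifyDecimalEscapes(
  source: string,
  policy: Policy,
): { index: number; n: number; kind: Kind }[] {
  const pending: Pending[] = [];
  const stack: number[] = []; // open groups: capture number, or 0 if non-capturing
  const closed = new Set<number>();
  let groups = 0;
  let inClass = false;

  for (let i = 0; i < source.length; i++) {
    const c = source[i];
    if (c === "\\") {
      const digits = /^[1-9][0-9]*/.exec(source.slice(i + 1));
      if (digits && !inClass) {
        pending.push({ index: i, n: Number(digits[0]), openedSoFar: groups, closedSoFar: new Set(closed) });
        i += digits[0].length;
      } else {
        i++; // skip the single escaped character
      }
    } else if (inClass) {
      if (c === "]") inClass = false;
    } else if (c === "[") {
      inClass = true;
    } else if (c === "(") {
      stack.push(source[i + 1] === "?" ? 0 : ++groups);
    } else if (c === ")") {
      const g = stack.pop();
      if (g) closed.add(g);
    }
  }

  // `groups` is only final once the whole pattern has been read, so under
  // "wholePattern" no escape can be classified before this point; the other
  // two policies only use information available when the escape was seen.
  return pending.map((e) => {
    const valid =
      policy === "wholePattern" ? e.n <= groups
      : policy === "openedBefore" ? e.n <= e.openedSoFar
      : e.closedSoFar.has(e.n);
    const kind: Kind = valid ? "backref" : "legacyEscape";
    return { index: e.index, n: e.n, kind };
  });
}

// "(a\1)\3(b)(c)": \1 sits inside its own group, \3 refers to a later group.
for (const policy of ["wholePattern", "openedBefore", "closedBefore"] as Policy[]) {
  console.log(policy, classifyDecimalEscapes("(a\\1)\\3(b)(c)", policy).map((e) => e.kind));
}
```

Under the whole-pattern rule both escapes are back-references; the IE6-style rule accepts only `\1`; the closed-before rule accepts neither, since group 1 is still open at `\1` and group 3 has not started at `\3`.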


/L 
-- 
Lasse R.H. Nielsen
Speaking only for myself ... if even that.
'Faith without judgement merely degrades the spirit divine'

