Directive prologue members, escapes, and retroactive syntax errors

Jeff Walden jwalden+es at MIT.EDU
Wed Oct 13 12:09:32 PDT 2010


The MS ES5 test suite has a test which boils down to this:

function error()
{
   "\145"; // ohai, octal escape
   "use strict"; // causes a RETROACTIVE SYNTAX ERROR
}

I've written a patch for SpiderMonkey to implement this according to the ES5 spec.  It's ugly in that I have to dig into the scanner to propagate outward whether we saw an octal escape, where previously the scanner never had to return anything but a token (type, position, one of a very few kinds of type-specific data).  I suspect my current approach can be improved, but even so, it doesn't seem like it should be necessary to add code specifically to handle this one edge case (one I would expect to see in a test suite, and nowhere else, ever) solely to report an error.
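To illustrate the bookkeeping involved, here is a minimal sketch in JavaScript.  It is not the SpiderMonkey patch, only an illustration.  It assumes the parser has already collected the raw source text (quotes included) of each string literal in the Directive Prologue; hasOctalEscape and checkPrologueES5 are hypothetical names, and the octal detection is deliberately simplified.

```javascript
// Hypothetical helper: does the raw source of a string literal contain a
// legacy octal escape such as \145?  Simplified assumption: "\0" counts as
// octal only when another decimal digit follows it.
function hasOctalEscape(raw) {
  for (let i = 0; i < raw.length; i++) {
    if (raw[i] !== "\\") continue;
    const next = raw[i + 1] || "";
    const afterNext = raw[i + 2] || "";
    if (next >= "1" && next <= "7") return true;
    if (next === "0" && afterNext >= "0" && afterNext <= "9") return true;
    i++; // skip the escaped character so that "\\1" is not misread
  }
  return false;
}

// Hypothetical check under the current ES5 rules: the scanner must remember
// that an octal escape was seen, because a later "use strict" directive
// turns it into an error after the fact.
function checkPrologueES5(rawLiterals) {
  let sawOctal = false;
  let strict = false;
  for (const raw of rawLiterals) {
    if (raw === '"use strict"' || raw === "'use strict'") {
      strict = true;
    } else if (hasOctalEscape(raw)) {
      sawOctal = true; // the extra state that has to leak out of the scanner
    }
  }
  return { strict: strict, syntaxError: strict && sawOctal };
}
```

For the test above, checkPrologueES5(['"\\145"', '"use strict"']) reports both strict and syntaxError as true; the sawOctal flag is the state that exists only to serve this one case.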

Don't get me wrong -- this is an excellent, devious, and diabolical test.  But with a very minimal modification to the definition of a Directive Prologue, it would not be necessary to specially track whether an octal escape has been seen in the Directive Prologue prior to a "use strict" directive.  The current definition is this:

> A Directive Prologue is the longest sequence of ExpressionStatement
> productions occurring as the initial SourceElement productions of a
> Program or FunctionBody and where each ExpressionStatement in the
> sequence consists entirely of a StringLiteral token followed a
> semicolon. The semicolon may appear explicitly or may be inserted
> by automatic semicolon insertion. A Directive Prologue may be an
> empty sequence.
>
> A Use Strict Directive is an ExpressionStatement in a Directive
> Prologue whose StringLiteral is either the exact character
> sequences "use strict" or 'use strict'. A Use Strict Directive may
> not contain an EscapeSequence or LineContinuation.

Suppose we added the further restriction that StringLiterals making up the Directive Prologue not contain an EscapeSequence or a LineContinuation (as is already the case for "use strict" directives).  Then we wouldn't have to do any token look-behind (problematic for a streaming parser that throws source away as it constructs a parse tree) or octal-escape-flagging to check for retroactive syntax errors when we encounter a non-leading "use strict" directive.  The modified text, then, would be something like this:

> A Directive Prologue is the longest sequence of ExpressionStatement
> productions occurring as the initial SourceElement productions of a
> Program or FunctionBody, where each ExpressionStatement in the
> sequence consists entirely of a StringLiteral token followed by a
> semicolon, and each such StringLiteral token does not contain an
> EscapeSequence or LineContinuation. The semicolon may appear
> explicitly or may be inserted by automatic semicolon insertion. A
> Directive Prologue may be an empty sequence.
>
> A Use Strict Directive is an ExpressionStatement in a Directive
> Prologue whose StringLiteral is either the exact character
> sequences "use strict" or 'use strict'.
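As a rough sketch of what the modified text would mean for a parser (JavaScript, with the hypothetical name checkPrologueProposed, and again assuming the raw source text of each leading string-literal statement is available): a backslash inside a string literal always begins either an EscapeSequence or a LineContinuation, so the first literal containing one simply ends the prologue and no state needs to be carried forward.

```javascript
// Hypothetical check under the proposed rule: a literal containing any
// backslash (EscapeSequence or LineContinuation) is not part of the
// Directive Prologue, so scanning for directives stops there.
function checkPrologueProposed(rawLiterals) {
  for (const raw of rawLiterals) {
    if (raw.includes("\\")) break; // prologue ends here; nothing to remember
    if (raw === '"use strict"' || raw === "'use strict'") {
      return { strict: true };
    }
  }
  return { strict: false };
}
```

Under this sketch, checkPrologueProposed(['"\\145"', '"use strict"']) reports strict as false: the "\145" statement ends the prologue, so the following "use strict" is an ordinary expression statement rather than a directive.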

(Incidentally, I just noticed the spec text says "StringLiteral token followed a semicolon", accidentally omitting "by" -- so an erratum is necessary here even if we didn't adopt the above text, or some other change intended to smooth this rough spot.)

Yes, this would be an incompatible change to how strict mode works.  But as noted in past threads, no browsers have shipped with strict mode support yet, so no engines have implemented and shipped this restriction, and this seems like a pointlessly sharp corner case.

Thoughts from other parser hackers out there?

Jeff

