LR(1) grammar/parser and lookahead-restrictions

Dmitry Soshnikov dmitry.soshnikov at
Wed Feb 15 05:31:22 UTC 2017

On Mon, Jan 23, 2017 at 10:46 PM, Dmitry Soshnikov <
dmitry.soshnikov at> wrote:

> On Mon, Jan 23, 2017 at 5:24 PM, Waldemar Horwat <waldemar at>
> wrote:
>> On 01/11/2017 10:28, Michael Dyck wrote:
>>> In the past, it has been said (usually by Brendan Eich) that TC39
>>> intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm
>>> not so much asking about the "1", but more about the "LR".)
>>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>>> terminals]) fit into the LR approach. It seems like there are three
>>> possibilities:
>>>   - modify the grammar (before the parser is constructed),
>>>   - modify the construction of the parser, and/or
>>>   - modify the operation of the parser.
>>> It's not clear to me that any of these (or any combination of these)
>>> will work.
>>> Has anyone algorithmically generated an LR parser from the spec grammar?
>>> If so, how did you you handle lookahead-restrictions?
>> I'm the source of the LR(1) condition.  I've been doing that ever since
>> ECMAScript Edition 3, and in fact am using the LR parser to help design and
>> debug the spec.  I have an implementation of the parser with a few
>> extensions to the LR grammar, including support for parametrized
>> productions, lookahead restrictions, and lexer-parser feedback used to
>> disambiguate things such as what token / will start.  I validate various
>> things such as making sure that there is no place where both an identifier
>> and a regular expression can occur.
> Thanks for the confirmation.
> > lookahead restrictions
> Yeah, all the lookahead restrictions, such as `lookahead ∉ {{, function}`
> add a lot of extra "No<Lookahead>" productions, which makes the grammar
> (and hence the parsing table) bigger unfortunately. Example is e.g.
> `ExpressionNoIn`, `VariableDeclarationNoIn` from the spec, and similar for
> `...NoLeftCurrly`, and `...NoFunction` to support those lookahead
> restrictions, and allow `ExpressionStatement` to be parsed without
> "reduce/reduce" conflicts.
> That's why in the mentioned "sub-set" of the ECMAScript example language
> [1] I chose not to use the same syntax for function expression and function
> declaration: `fn foo() {}` for declaration, and just an arrow function for
> function-expressions :) -- no ambiguity, no lookahead restrictions, hence
> no larger grammar, hence a faster parser.
>  > lexer-parser feedback
> Yes, having a lexer/parser state solves it more elegantly.

FYI: I just built this as a proof of concept for ECMAScript's `{` and `}`,
and have parsed normally `{ }` in statement position as a `BlockStatement`,
and in the expression position as an `ObjectLiteral`. With special
"activation productions" one can change the lexer state, and yield
different token types for the same characters.

See the example here:

And some details in the docs:

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the es-discuss mailing list