LR(1) grammar/parser and lookahead-restrictions

Dmitry Soshnikov dmitry.soshnikov at gmail.com
Tue Jan 24 06:46:51 UTC 2017


On Mon, Jan 23, 2017 at 5:24 PM, Waldemar Horwat <waldemar at google.com>
wrote:

> On 01/11/2017 10:28, Michael Dyck wrote:
>
>> In the past, it has been said (usually by Brendan Eich) that TC39 intends
>> that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so
>> much asking about the "1", but more about the "LR".)
>>
>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>> terminals]) fit into the LR approach. It seems like there are three
>> possibilities:
>>   - modify the grammar (before the parser is constructed),
>>   - modify the construction of the parser, and/or
>>   - modify the operation of the parser.
>> It's not clear to me that any of these (or any combination of these) will
>> work.
>>
>> Has anyone algorithmically generated an LR parser from the spec grammar?
>> If so, how did you you handle lookahead-restrictions?
>>
>
> I'm the source of the LR(1) condition.  I've been doing that ever since
> ECMAScript Edition 3, and in fact am using the LR parser to help design and
> debug the spec.  I have an implementation of the parser with a few
> extensions to the LR grammar, including support for parametrized
> productions, lookahead restrictions, and lexer-parser feedback used to
> disambiguate things such as what token / will start.  I validate various
> things such as making sure that there is no place where both an identifier
> and a regular expression can occur.
>
>
Thanks for the confirmation.

> lookahead restrictions

Yeah, all the lookahead restrictions, such as `lookahead ∉ {{, function}`
add a lot of extra "No<Lookahead>" productions, which makes the grammar
(and hence the parsing table) bigger unfortunately. Example is e.g.
`ExpressionNoIn`, `VariableDeclarationNoIn` from the spec, and similar for
`...NoLeftCurrly`, and `...NoFunction` to support those lookahead
restrictions, and allow `ExpressionStatement` to be parsed without
"reduce/reduce" conflicts.

That's why in the mentioned "sub-set" of the ECMAScript example language
[1] I chose not to use the same syntax for function expression and function
declaration: `fn foo() {}` for declaration, and just an arrow function for
function-expressions :) -- no ambiguity, no lookahead restrictions, hence
no larger grammar, hence a faster parser.

 > lexer-parser feedback

Yes, having a lexer/parser state solves it more elegantly. Also, if a
parser tool allows semantic action not only at the end (i.e on reduce), but
for each symbol on RHS (i.e. "S-attributed" definition, and not
"L-attributed"), handling the `lookahead ∉ {{, function}` could be even
easier: on matching the `{` from the statement position, one can track the
parser state that the block should be parsed, and not an object literal.
This probably can be solved with lexer state too -- to avoid again a bunch
of extra "No<lookahead>" productions. And such cases are simpler in manual
recursive decent.

[1] Example lang based on subset of ECMAScript grammar:
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/lang.bnf

Dmitry



>     Waldemar
>
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170123/0937e6b9/attachment-0001.html>


More information about the es-discuss mailing list