LR(1) grammar/parser and lookahead-restrictions
Dmitry Soshnikov
dmitry.soshnikov at gmail.com
Wed Feb 15 05:31:22 UTC 2017
On Mon, Jan 23, 2017 at 10:46 PM, Dmitry Soshnikov <
dmitry.soshnikov at gmail.com> wrote:
>
> On Mon, Jan 23, 2017 at 5:24 PM, Waldemar Horwat <waldemar at google.com>
> wrote:
>
>> On 01/11/2017 10:28, Michael Dyck wrote:
>>
>>> In the past, it has been said (usually by Brendan Eich) that TC39
>>> intends that the ECMAScript grammar be LR(1). Is that still the case? (I'm
>>> not so much asking about the "1", but more about the "LR".)
>>>
>>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>>> terminals]) fit into the LR approach. It seems like there are three
>>> possibilities:
>>> - modify the grammar (before the parser is constructed),
>>> - modify the construction of the parser, and/or
>>> - modify the operation of the parser.
>>> It's not clear to me that any of these (or any combination of these)
>>> will work.
>>>
>>> Has anyone algorithmically generated an LR parser from the spec grammar?
>>> If so, how did you you handle lookahead-restrictions?
>>>
>>
>> I'm the source of the LR(1) condition. I've been doing that ever since
>> ECMAScript Edition 3, and in fact am using the LR parser to help design and
>> debug the spec. I have an implementation of the parser with a few
>> extensions to the LR grammar, including support for parametrized
>> productions, lookahead restrictions, and lexer-parser feedback used to
>> disambiguate things such as what token / will start. I validate various
>> things such as making sure that there is no place where both an identifier
>> and a regular expression can occur.
>>
>>
> Thanks for the confirmation.
>
> > lookahead restrictions
>
> Yeah, all the lookahead restrictions, such as `lookahead ∉ {{, function}`
> add a lot of extra "No<Lookahead>" productions, which makes the grammar
> (and hence the parsing table) bigger unfortunately. Example is e.g.
> `ExpressionNoIn`, `VariableDeclarationNoIn` from the spec, and similar for
> `...NoLeftCurrly`, and `...NoFunction` to support those lookahead
> restrictions, and allow `ExpressionStatement` to be parsed without
> "reduce/reduce" conflicts.
>
> That's why in the mentioned "sub-set" of the ECMAScript example language
> [1] I chose not to use the same syntax for function expression and function
> declaration: `fn foo() {}` for declaration, and just an arrow function for
> function-expressions :) -- no ambiguity, no lookahead restrictions, hence
> no larger grammar, hence a faster parser.
>
> > lexer-parser feedback
>
> Yes, having a lexer/parser state solves it more elegantly.
>
FYI: I just built this as a proof of concept for ECMAScript's `{` and `}`,
and have parsed normally `{ }` in statement position as a `BlockStatement`,
and in the expression position as an `ObjectLiteral`. With special
"activation productions" one can change the lexer state, and yield
different token types for the same characters.
See the example here:
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/parser-lexer-communication.g
And some details in the docs:
https://github.com/DmitrySoshnikov/syntax#access-tokenizer-from-parser-semantic-actions
Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170214/43fa084a/attachment.html>
More information about the es-discuss
mailing list