LR(1) grammar/parser and lookahead-restrictions

Dmitry Soshnikov dmitry.soshnikov at gmail.com
Thu Feb 2 23:56:23 UTC 2017


On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com> wrote:

> On 02/01/2017 10:25, Dmitry Soshnikov wrote:
>
>> As mentioned, Cover grammar is usually the process of the grammar design
>> itself (as in ES6 spec itself). I'm not aware of automatic transformations
>> for this (if you find any please let me know).
>>
>
> Cover grammars are an ugly hack that we had to add when there was no other
> practical choice.  Avoid them as much as possible.  They are only used in
> situations where lookahead restrictions and parametrized grammar rules do
> not work in any practical way.
>
>
Yeah, absolutely agreed. The reason why I used cover grammar in that
example to parse `{` as a `Block` vs. `ObjectLiteral`, and handle
`ExpressionStatement` is to make the resulting grammar short, and don't
introduce a bunch of `NoCurly`, etc extra productions for this. That's also
said, this ugly hack also forces you to do post-processing overhead --
either validation of the nodes, or even re-interpretation (rewriting) to
other nodes -- in my case I have to actually check that all nodes between
`{` and `}` are properties, or vice-versa, statements, based on the
expression/statement position.

Practically, a cover grammar still can be slower because of such
post-processing overhead (imaging you have a giant file with just object --
all those now are needed to be validated to contain property nodes). OTOH,
a trade off with lookahead restrictions is introducing ~1.5x more parsing
states.


> When designing the grammar, the preferences are:
>
> - Use standard LR(1) productions
> - Use parametrized productions
> - Use lookahead restrictions if parametrized productions would introduce
> too many parameters into rules
> - Use a cover grammar if the grammar can't be reasonably expressed in any
> other way.  They're a last resort with lots of problems.
>
>
Just to double-check, by "parametrized productions" you mean the
`No<lookahead>` extra productions?


> Lookahead restrictions fit very well into an LR(1) engine as long as
> they're limited to only one token, and that's what I've implemented in the
> validator.  You need to be very careful with them if looking more than one
> token ahead because lexing of the tokens can vary based on context.  For
> example, if the next few characters in front of the cursor are )/abc/i+,
> then what is the second token?  What is the third token?  It's actually
> context-dependent.
>
> The same problem is even worse for cover grammars.
>
>
Yeah, that's also true. Thanks again for the confirmation, that such a
validator even exists, it's good -- at least we know it's parsable in
principle. Those, that's said, on real practice now implementing a manual
recursive descent for ECMAScript is more approachable now.

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170202/61ff3cc3/attachment-0001.html>


More information about the es-discuss mailing list