LR(1) grammar/parser and lookahead-restrictions

Michael Dyck jmdyck at ibiblio.org
Wed Feb 1 15:37:15 UTC 2017


On 17-01-31 02:17 PM, Dmitry Soshnikov wrote:
> The spec is a formal language definition (including for syntactic grammar).

(Well, the extent to which the spec is formal is debatable. The grammar is 
the most formal part of it, but that's true of most programming language 
definitions.)

> It doesn't talk about specifics of implementation much, nor it's supposed to
> do so (so demanding this from spec isn't appropriate).

Right (and I'm pretty sure I didn't make such a demand).

> It doesn't tell one to implement an LR(1),

Nor should it.

> neither it describes the
> techniques *how* one going to do so (even though as was confirmed in this
> thread it's LR(1)-parsable). It just uses some formal notation for this,
> e.g. `lookahead ∉ {{, function}`. Specific techniques *how exactly* you
> going to achieve this, is not the purpose of the specification.

Right.

> (but it's possible to clarify such techniques on discussions which we have
> now -- please note though, the techniques described above are not an
> "assumption", or a "guess", they are actual techniques used on practice to
> achieve this in LR(1)).

I'm not disputing the actualness of the techniques. I'm saying:

  - technique #1 (duplicated productions): As stated, looks insufficient for 
the ES grammar. Which may just mean that there's a more complicated version 
of the approach that *is* sufficient for ES.

  - technique #2 (cover grammar): Can you algorithmically (a) construct the 
cover grammar and (b) recover the parse according to the original grammar?

  - technique #3 (parse-time actions): In the LR automaton, can you 
algorithmically distinguish action-avoided conflicts from 'true' conflicts?


> The spec in definitions is not consistent though. In one place it uses
> explicit *technique* for lookahead restriction via the "No<lookahead>"
> productions. In some cases it just uses formal description as  `lookahead ∉
> {{, function}`, leaving a specific technique to achieve this up to
> implementations.
>
> In fact (for consistency), the rule from ES5:
>
> ```
> for ( ExpressionNoIn ; Expression ; Expression ) Statement
> ```
>
> could be described as:
>
> ```
> for ( `lookahead ∉ {in}` Expression ; Expression ; Expression ) Statement
> ```
>
> and would mean the same.

No, they wouldn't. ExpressionNoIn (or Expression[~In]) is a version of 
Expression that doesn't ("openly") use the 'in' operator (at the 
RelationalExpression level), i.e. it doesn't have the 'in' operator 
somewhere in the *middle*, whereas
    [lookahead ∉ {in}] Expression
prohibits an Expression that *starts* with the token 'in'. You can't express 
the 'NoIn' restriction with any finite amount of lookahead, because a 
RelationalExpression can be arbitrarily large.


> You may search for ES5 LR(1) construction (there are several over
> internets), they use the described above techniques for `lookahead ∉ {{,
> function}` (adding `NoCurlyBrace`, or `NoFunction` productions).

But the chances are low that they've been algorithmically generated from the 
ES spec. And while it's conceivable I could figure out how to do it 
algorithmically by examining examples of it done manually, I'm doubtful of 
that too.

> That's also said, it could be rewritten using Cover grammar which adds
> though post-processing overhead -- that might be the reason why some of
> the `No<lookahead>` productions were removed from ES6 spec,

What removed productions are you thinking of? ES5's "NoIn" productions 
became ES6+'s "[In]" parameter (which is basically the same thing with a 
more compact notation). And ES5's "NoDash" productions are still in the 
current draft.

-Michael



More information about the es-discuss mailing list