LR(1) grammar/parser and lookahead-restrictions
Michael Dyck
jmdyck at ibiblio.org
Wed Feb 1 15:37:15 UTC 2017
On 17-01-31 02:17 PM, Dmitry Soshnikov wrote:
> The spec is a formal language definition (including for syntactic grammar).
(Well, the extent to which the spec is formal is debatable. The grammar is
the most formal part of it, but that's true of most programming language
definitions.)
> It doesn't talk about specifics of implementation much, nor it's supposed to
> do so (so demanding this from spec isn't appropriate).
Right (and I'm pretty sure I didn't make such a demand).
> It doesn't tell one to implement an LR(1),
Nor should it.
> neither it describes the
> techniques *how* one going to do so (even though as was confirmed in this
> thread it's LR(1)-parsable). It just uses some formal notation for this,
> e.g. `lookahead ∉ {{, function}`. Specific techniques *how exactly* you
> going to achieve this, is not the purpose of the specification.
Right.
> (but it's possible to clarify such techniques on discussions which we have
> now -- please note though, the techniques described above are not an
> "assumption", or a "guess", they are actual techniques used on practice to
> achieve this in LR(1)).
I'm not disputing the actualness of the techniques. I'm saying:
- technique #1 (duplicated productions): As stated, looks insufficient for
the ES grammar. Which may just mean that there's a more complicated version
of the approach that *is* sufficient for ES.
- technique #2 (cover grammar): Can you algorithmically (a) construct the
cover grammar and (b) recover the parse according to the original grammar?
- technique #3 (parse-time actions): In the LR automaton, can you
algorithmically distinguish action-avoided conflicts from 'true' conflicts?
> The spec in definitions is not consistent though. In one place it uses
> explicit *technique* for lookahead restriction via the "No<lookahead>"
> productions. In some cases it just uses formal description as `lookahead ∉
> {{, function}`, leaving a specific technique to achieve this up to
> implementations.
>
> In fact (for consistency), the rule from ES5:
>
> ```
> for ( ExpressionNoIn ; Expression ; Expression ) Statement
> ```
>
> could be described as:
>
> ```
> for ( `lookahead ∉ {in}` Expression ; Expression ; Expression ) Statement
> ```
>
> and would mean the same.
No, they wouldn't. ExpressionNoIn (or Expression[~In]) is a version of
Expression that doesn't ("openly") use the 'in' operator (at the
RelationalExpression level), i.e. it doesn't have the 'in' operator
somewhere in the *middle*, whereas
[lookahead ∉ {in}] Expression
prohibits an Expression that *starts* with the token 'in'. You can't express
the 'NoIn' restriction with any finite amount of lookahead, because a
RelationalExpression can be arbitrarily large.
> You may search for ES5 LR(1) construction (there are several over
> internets), they use the described above techniques for `lookahead ∉ {{,
> function}` (adding `NoCurlyBrace`, or `NoFunction` productions).
But the chances are low that they've been algorithmically generated from the
ES spec. And while it's conceivable I could figure out how to do it
algorithmically by examining examples of it done manually, I'm doubtful of
that too.
> That's also said, it could be rewritten using Cover grammar which adds
> though post-processing overhead -- that might be the reason why some of
> the `No<lookahead>` productions were removed from ES6 spec,
What removed productions are you thinking of? ES5's "NoIn" productions
became ES6+'s "[In]" parameter (which is basically the same thing with a
more compact notation). And ES5's "NoDash" productions are still in the
current draft.
-Michael
More information about the es-discuss
mailing list