LR(1) grammar/parser and lookahead-restrictions

Michael Dyck jmdyck at ibiblio.org
Tue Jan 31 15:56:34 UTC 2017


On 17-01-30 06:20 PM, Dmitry Soshnikov wrote:
> On Mon, Jan 30, 2017 at 2:22 PM, Michael Dyck <jmdyck at ibiblio.org
> <mailto:jmdyck at ibiblio.org>> wrote:
>
>
>                 1. Using "No<Lookahead>" productions.
>
>             The problem with this approach is that, in general, a
>             lookahead-restriction is not a restriction on the nonterminal that
>             immediately follows it in the production, it's a restriction on
>             the next few input elements, which might be generated by more
>             than just that nonterminal.
>
>         Well, in this case it's not a "lookahead" anymore.
>
>     It isn't a "lookahead" in the resulting grammar, but it is in the
>     original, and the two grammars should define the same language. What I'm
>     saying is that it's incorrect to assume that the effect of the lookahead
>     (in the original) can be confined to the nonterminal that immediately
>     follows, which is what the suggested transformation does.
>
>
> A "lookahead" is a terminal token. A non-terminal is not a "lookahead".

Okay, so then in my previous statement, read "lookahead" as 
"lookahead-restriction". (But I did say "lookahead-restriction" in the 
statement before that.)


> To reiterate, imaging you have:
>
> ```
> • (x, y)
> ```
>
> and
>
> ```
> • (x, y) -> { return x + y; }
> ```
>
> where `•` is a cursor position. Your lookaheads are "(", "x", "," "y", etc.
> Usually 1 lookahead token is enough to route the parser, in this case "(".
>
> What you're talking that you need not these lookaheads (at the beginning of
> the cursor), but *after* the parsed non-terminal (i.e. after fully parsed
> `(x, y)`), i.e. you're looking for "->", which goes at the end of the
> non-terminal.

I'm not saying I don't need the lookahead tokens right after the cursor. But 
I am saying that the lookahead tokens that the hypothetical parser needs to 
examine (in order to enforce a lookahead-restriction), even if it's only 1 
or 2, might extend past the 'next' nonterminal.

But that's an awkward way to express my objection.

To recap, I'm talking about the suggestion to replace:
     [lookahead != <some sequence of terminals>] Nonterminal
in a production in the original grammar with
     Nonterminal-that-does-not-start-with-that-sequence
(and a bunch of new productions) in the transformed grammar, and whether 
that's a valid transformation.

I'm saying it isn't valid when the Nonterminal can derive phrases with fewer 
terminals than the prohibited lookahead-sequence. (E.g., if Nonterminal is 
nullable, or if it can derive a phrase of length 1 when the prohibited 
lookahead-sequence is length 2.)

> This is not possible, since it's not a lookahead at that
> specific cursor position. It will be a lookahead after you have parsed the
> non-terminal corresponding to "(x, y)".
>
> So you can't have two non-terminals for this, since it'll be a
> "reduce/reduce" conflict, and you need a cover grammar here.

Well, I'm saying you'd run into problems before you even tried to construct 
the LR automaton, but I think you're agreeing with me that this approach is 
not generally applicable?


>         In this case a cover grammar is simpler, and that's why it's used
>         starting since ES6 spec.
>
>     But note that none of the ES5 lookahead-restrictions was replaced with a
>     cover grammar in ES6. They're all still lookahead-restrictions.
>
> Sure, because it's normally possible to do with the "No<Lookahead>" doubled
> productions.

But they weren't replaced with "No" productions (or grammatical parameters) 
either! All the lookahead-restrictions in ES5 are still 
lookahead-restrictions in the current draft.

> So this actually answers your initial question (it's a real practice).

No, I don't think it does.

----------------------------

>                 2. Another approach is to use Cover grammar.
>
>         Usually it's a manual process of designing a grammar, and defining
>         post-processing static semantics.
>
>     Okay, I'm not really interested then. I'm looking for an algorithm
>     (i.e., a modification of the LR algorithm) that starts with a
>     grammar-with-lookahead-restrictions and ends with an LR parser.
>
> OK, yeah, at this point I think we already clarified that it is possible to
> parse ES grammar,  and handle it's lookahead restrictions with several
> techniques described.

Well, ES implementations exist, so clearly there are techniques for handling 
those aspects of the ES grammar. But that doesn't mean that any of them are 
parsing *according to* the ES grammar. E.g., it's unlikely that they're 
finding exactly the parse tree indicated by the ES grammar. (Nor is there 
any requirement that they do so.)

----

By way of contrast, consider "opt" subscripts and grammatical parameters. If 
you look up the LR parser-construction algorithm, it doesn't talk about how 
to handle such things, but the spec says how to transform them into plain 
context-free productions, which the LR algorithm can handle. So it's an easy 
shorthand to say that the ES grammar is LR(1) iff the 'expanded' grammar is 
LR(1).

But the spec doesn't give that kind of CFG-equivalence for 
lookahead-restrictions. Until there's agreement about what they mean in a 
CFG or LR sense, it isn't really meaningful to ask if the ES grammar is LR(1).

-Michael



More information about the es-discuss mailing list