LR(1) grammar/parser and lookahead-restrictions

Michael Dyck jmdyck at ibiblio.org
Mon Jan 30 22:22:42 UTC 2017


On 17-01-30 03:00 PM, Dmitry Soshnikov wrote:
>
> On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <jmdyck at ibiblio.org
> <mailto:jmdyck at ibiblio.org>> wrote:
>
>     On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:
>
>
>         As mentioned above, there are two ways to handle lookahead
>         restrictions in
>         LR-parser:
>
>         1. Using "No<Lookahead>" productions.
>
>     So, e.g., one could take the production
>         IterationStatement : ...  [lookahead != 'let'] LHSExpr ...
>     and transform it into
>         IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
>     and then generate productions for the new nonterminal, e.g.
>         LHSExpr-that-does-not-start-with-let :
>             New-Expression-that-does-not-start-with-let
>             Call-Expression-that-does-not-start-with-let
>     etc, (eventually bottoming out when a RHS of the original grammar will
>     either always or never start with 'let').
>
> Correct, and this is how ES spec itself handles all those "No<lookahead>"
> duplicated productions.

If you're talking about the 'NoIn' productions in ES5, that seems only 
marginally relevant, since they don't address a lookahead problem (i.e., a 
problem that could have been solved via a lookahead-restriction).

>     The problem with this approach is that, in general, a
>     lookahead-restriction is not a restriction on the nonterminal that
>     immediately follows it in the production, it's a restriction on the next
>     few input elements, which might be generated by more than just that
>     nonterminal.
>
>
> Well, in this case it's not a "lookahead" anymore.

It isn't a "lookahead" in the resulting grammar, but it is in the original, 
and the two grammars should define the same language. What I'm saying is 
that it's incorrect to assume that the effect of the lookahead (in the 
original) can be confined to the nonterminal that immediately follows, which 
is what the suggested transformation does.

> You could have k=2,3,...N for lookahead in parser generators, but it's
> not like parse the whole non-terminal, and then see a lookahead after it.
> This "parse whole non-terminal till the end" might be parse the whole
> program.

I don't really follow what you're suggesting there.


> In this case a cover grammar is simpler, and that's why it's used
> starting since ES6 spec.

But note that none of the ES5 lookahead-restrictions was replaced with a 
cover grammar in ES6. They're all still lookahead-restrictions.

>
>     I'm not saying this is insurmountable, but it does seem to get messy
>     fairly quickly.
>
>     And it's unclear how this approach would handle a lookahead-restriction
>     at the end of a production.
>
>
> Yes, it adds a bunch of duplicated productions to the grammar, which on
> practice may increases number of parsing states simnifically. As a real
> practical example: ES5 spec in LALR(1) mode would increase from ~350 states
> to 470.

By "messy", I wasn't talking about the duplicated productions, but rather 
the process of generating them.


>         2. Another approach is to use Cover grammar.
>
>
>     Is there a way to automatically (a) construct the cover grammar and (b)
>     reconstruct the parse tree as if parsed by the original grammar, or do
>     you code it manually?
>
>
> Usually it's a manual process of designing a grammar, and defining
> post-processing static semantics.

Okay, I'm not really interested then. I'm looking for an algorithm (i.e., a 
modification of the LR algorithm) that starts with a 
grammar-with-lookahead-restrictions and ends with an LR parser.


>         P.S.: there is actually option #3 -- to use "S-attributed" parser
>         actions,
>         it's when you may execute handling action not only at the end of
>         production
>         (i.e. not only on "reduce"), but after each element on RHS:
>
>
>     I'm doubtful this would help. I wouldn't expect the LR-parser-generator
>     to 'understand' the content of actions, so it can't use them to alter
>     the generation of the parser, and so the LR automaton will have lots of
>     conflicts (because it can't 'separate' the alternatives that the
>     lookahead-restrictions are there to separate).
>
>
> Oh, it'll work for sure, depending on how powerful a parser generator is.
> Yes, technically it'll be a "reduce-reduce" conflict in the table,

So when your generated parser has conflicts, is it easy to tell which you 
don't have to worry about and which you do?

> but parser will never enter this state based on the semantic action executed
> *before* entering some non-terminal (not after). In this "before" action you
> can communicated to lexer, and check the lookahead.

It seems to me that the parser *will* enter such a state, but always via 
only one of the conflicting 'paths' (the other(s) being prevented at 
parse-time by the semantic actions). So only one of the conflicting actions 
will be valid, but you've still got to figure out which one that is.

-Michael


More information about the es-discuss mailing list