LR(1) grammar/parser and lookahead-restrictions
jmdyck at ibiblio.org
Mon Jan 30 17:26:06 UTC 2017
On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:
> As mentioned above, there are two ways to handle lookahead restrictions in
> 1. Using "No<Lookahead>" productions. This unfortunately may double number
> of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral
> in ECMAScript, one needs to double all the expression productions to
> correctly parse `ExpressionStatement`. As an example in the spec itself,
> take a look e.g. at `RelationalExpression` and `RelationalExpressionNoIn`,
> and how it repeats (doubles) all the productions but with "NoIn" suffix
So, e.g., one could take the production
IterationStatement : ... [lookahead != 'let'] LHSExpr ...
and transform it into
IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
and then generate productions for the new nonterminal, e.g.
etc, (eventually bottoming out when a RHS of the original grammar will
either always or never start with 'let').
The problem with this approach is that, in general, a lookahead-restriction
is not a restriction on the nonterminal that immediately follows it in the
production, it's a restriction on the next few input elements, which might
be generated by more than just that nonterminal. I'm not saying this is
insurmountable, but it does seem to get messy fairly quickly.
And it's unclear how this approach would handle a lookahead-restriction at
the end of a production.
> 2. Another approach is to use Cover grammar.
Is there a way to automatically (a) construct the cover grammar and (b)
reconstruct the parse tree as if parsed by the original grammar, or do you
code it manually?
> P.S.: there is actually option #3 -- to use "S-attributed" parser actions,
> it's when you may execute handling action not only at the end of production
> (i.e. not only on "reduce"), but after each element on RHS:
I'm doubtful this would help. I wouldn't expect the LR-parser-generator to
'understand' the content of actions, so it can't use them to alter the
generation of the parser, and so the LR automaton will have lots of
conflicts (because it can't 'separate' the alternatives that the
lookahead-restrictions are there to separate).
More information about the es-discuss