LR(1) grammar/parser and lookahead-restrictions

Dmitry Soshnikov dmitry.soshnikov at gmail.com
Mon Jan 30 20:00:17 UTC 2017


On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:
>
>>
>> As mentioned above, there are two ways to handle lookahead restrictions in
>> LR-parser:
>>
>> 1. Using "No<Lookahead>" productions. This unfortunately may double number
>> of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral
>> in ECMAScript, one needs to double all the expression productions to
>> correctly parse `ExpressionStatement`. As an example in the spec itself,
>> take a look e.g. at `RelationalExpression` and `RelationalExpressionNoIn`,
>> and how it repeats (doubles) all the productions but with "NoIn" suffix
>> https://es5.github.io/#x11.8
>>
>
> So, e.g., one could take the production
>     IterationStatement : ...  [lookahead != 'let'] LHSExpr ...
> and transform it into
>     IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
> and then generate productions for the new nonterminal, e.g.
>     LHSExpr-that-does-not-start-with-let :
>         New-Expression-that-does-not-start-with-let
>         Call-Expression-that-does-not-start-with-let
> etc, (eventually bottoming out when a RHS of the original grammar will
> either always or never start with 'let').
>
>

Correct, and this is how ES spec itself handles all those "No<lookahead>"
duplicated productions.



> The problem with this approach is that, in general, a
> lookahead-restriction is not a restriction on the nonterminal that
> immediately follows it in the production, it's a restriction on the next
> few input elements, which might be generated by more than just that
> nonterminal.


Well, in this case it's not a "lookahead" anymore. You could have
k=2,3,...N for lookahead in parser generators, but it's not like parse the
whole non-terminal, and then see a lookahead after it. This "parse whole
non-terminal till the end" might be parse the whole program. In this case a
cover grammar is simpler, and that's why it's used starting since ES6 spec.
Take a look e.g. at `CoverParenthesizedExpressionAndArrowParameterList`
from ES2015.

A simplified example:
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/lang.bnf#L514-L524

By just looking at:

```
(x, y)
```

you can't tell whether it's a grouping operator (`SequenceExpression`
node), or a beginning of a lambda function (i.e. its parameters node). So
in the generic `ParenthisizedExpression` production, you reinterpret it
either to lambda parameters, or keep just a grouping operator, if there is
no `LambdaTail`. Here it's a function of course, though for this you have
to parse `(x, y)` not as `LambdaParameters`, and not as
`SequenceExpression`, but as a generic `ParenthisizedExpression`, and using
Cover grammar:

```
(x, y) -> { return x + y; }
```



> I'm not saying this is insurmountable, but it does seem to get messy
> fairly quickly.
>
> And it's unclear how this approach would handle a lookahead-restriction at
> the end of a production.


Yes, it adds a bunch of duplicated productions to the grammar, which on
practice may increases number of parsing states simnifically. As a real
practical example: ES5 spec in LALR(1) mode would increase from ~350 states
to 470.


>
>
>
> 2. Another approach is to use Cover grammar.
>>
>
> Is there a way to automatically (a) construct the cover grammar and (b)
> reconstruct the parse tree as if parsed by the original grammar, or do you
> code it manually?
>
>
Usually it's a manual process of designing a grammar, and defining
post-processing static semantics. Take a look again in the example language
I gave above, it have couple of case of cover grammar.


>
> P.S.: there is actually option #3 -- to use "S-attributed" parser actions,
>> it's when you may execute handling action not only at the end of
>> production
>> (i.e. not only on "reduce"), but after each element on RHS:
>>
>
> I'm doubtful this would help. I wouldn't expect the LR-parser-generator to
> 'understand' the content of actions, so it can't use them to alter the
> generation of the parser, and so the LR automaton will have lots of
> conflicts (because it can't 'separate' the alternatives that the
> lookahead-restrictions are there to separate).
>
>
Oh, it'll work for sure, depending on how powerful a parser generator is.
Yes, technically it'll be a "reduce-reduce" conflict in the table, but
parser will never enter this state based on the semantic action executed
*before* entering some non-terminal (not after). In this "before" action
you can communicated to lexer, and check the lookahead.

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170130/ed464968/attachment.html>


More information about the es-discuss mailing list