LR(1) grammar/parser and lookahead-restrictions

Dmitry Soshnikov dmitry.soshnikov at gmail.com
Mon Jan 30 23:20:57 UTC 2017


On Mon, Jan 30, 2017 at 2:22 PM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> On 17-01-30 03:00 PM, Dmitry Soshnikov wrote:
>
>>
>> On Mon, Jan 30, 2017 at 9:26 AM, Michael Dyck <jmdyck at ibiblio.org
>> <mailto:jmdyck at ibiblio.org>> wrote:
>>
>>     On 17-01-30 12:15 AM, Dmitry Soshnikov wrote:
>>
>>
>>         As mentioned above, there are two ways to handle lookahead
>>         restrictions in
>>         LR-parser:
>>
>>         1. Using "No<Lookahead>" productions.
>>
>>     So, e.g., one could take the production
>>         IterationStatement : ...  [lookahead != 'let'] LHSExpr ...
>>     and transform it into
>>         IterationStatement : ... LHSExpr-that-does-not-start-with-let ...
>>     and then generate productions for the new nonterminal, e.g.
>>         LHSExpr-that-does-not-start-with-let :
>>             New-Expression-that-does-not-start-with-let
>>             Call-Expression-that-does-not-start-with-let
>>     etc, (eventually bottoming out when a RHS of the original grammar will
>>     either always or never start with 'let').
>>
>> Correct, and this is how ES spec itself handles all those "No<lookahead>"
>> duplicated productions.
>>
>
> If you're talking about the 'NoIn' productions in ES5, that seems only
> marginally relevant, since they don't address a lookahead problem (i.e., a
> problem that could have been solved via a lookahead-restriction).
>
>     The problem with this approach is that, in general, a
>>     lookahead-restriction is not a restriction on the nonterminal that
>>     immediately follows it in the production, it's a restriction on the
>> next
>>     few input elements, which might be generated by more than just that
>>     nonterminal.
>>
>>
>> Well, in this case it's not a "lookahead" anymore.
>>
>
> It isn't a "lookahead" in the resulting grammar, but it is in the
> original, and the two grammars should define the same language. What I'm
> saying is that it's incorrect to assume that the effect of the lookahead
> (in the original) can be confined to the nonterminal that immediately
> follows, which is what the suggested transformation does.


A "lookahead" is a terminal token. A non-terminal is not a "lookahead".

To reiterate, imaging you have:

```
• (x, y)
```

and

```
• (x, y) -> { return x + y; }
```

where `•` is a cursor position. Your lookaheads are "(", "x", "," "y", etc.
Usually 1 lookahead token is enough to route the parser, in this case "(".

What you're talking that you need not these lookaheads (at the beginning of
the cursor), but *after* the parsed non-terminal (i.e. after fully parsed
`(x, y)`), i.e. you're looking for "->", which goes at the end of the
non-terminal. This is not possible, since it's not a lookahead at that
specific cursor position. It will be a lookahead after you have parsed the
non-terminal corresponding to "(x, y)".

So you can't have two non-terminals for this, since it'll be a
"reduce/reduce" conflict, and you need a cover grammar here.



>
>
> You could have k=2,3,...N for lookahead in parser generators, but it's
>> not like parse the whole non-terminal, and then see a lookahead after it.
>> This "parse whole non-terminal till the end" might be parse the whole
>> program.
>>
>
> I don't really follow what you're suggesting there.


Like mentioned above:

```
• (x, y) -> { return x + y; }
```

In order to determine whether it's a grouping operator, or arrow function
params, it's not enough analyzing a "lookahead" being that the cursor
position. You need to actually fully parse this non-terminal, and advance
to this position:

```
(x, y) • -> { return x + y; }
```

and from there you may already analyze a real lookahead. However, the
problem is the "(x, y)" might the full program (imagine million of entries
on several GBs file source code -- what kind of "lookahead" can you analyze
here? :))



>
>
>
> In this case a cover grammar is simpler, and that's why it's used
>> starting since ES6 spec.
>>
>
> But note that none of the ES5 lookahead-restrictions was replaced with a
> cover grammar in ES6. They're all still lookahead-restrictions.


Sure, because it's normally possible to do with the "No<Lookahead>" doubled
productions. So this actually answers your initial question (it's a real
practice).


>
>
>
>>     I'm not saying this is insurmountable, but it does seem to get messy
>>     fairly quickly.
>>
>>     And it's unclear how this approach would handle a
>> lookahead-restriction
>>     at the end of a production.
>>
>>
>> Yes, it adds a bunch of duplicated productions to the grammar, which on
>> practice may increases number of parsing states simnifically. As a real
>> practical example: ES5 spec in LALR(1) mode would increase from ~350
>> states
>> to 470.
>>
>
> By "messy", I wasn't talking about the duplicated productions, but rather
> the process of generating them.
>
>
>         2. Another approach is to use Cover grammar.
>>
>>
>>     Is there a way to automatically (a) construct the cover grammar and
>> (b)
>>     reconstruct the parse tree as if parsed by the original grammar, or do
>>     you code it manually?
>>
>>
>> Usually it's a manual process of designing a grammar, and defining
>> post-processing static semantics.
>>
>
> Okay, I'm not really interested then. I'm looking for an algorithm (i.e.,
> a modification of the LR algorithm) that starts with a
> grammar-with-lookahead-restrictions and ends with an LR parser.


OK, yeah, at this point I think we already clarified that it is possible to
parse ES grammar, and handle it's lookahead restrictions with several
techniques described. If you're interested in generic LR parsing, feel free
to contact me (we can omit generic off-topic discussions on this list).


Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170130/f069c4c6/attachment.html>


More information about the es-discuss mailing list