LR(1) grammar/parser and lookahead-restrictions

Dmitry Soshnikov dmitry.soshnikov at gmail.com
Mon Jan 30 05:15:34 UTC 2017


On Tue, Jan 24, 2017 at 8:23 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> On 17-01-23 08:24 PM, Waldemar Horwat wrote:
>
>> On 01/11/2017 10:28, Michael Dyck wrote:
>>
>>>
>>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>>> terminals]) fit into the LR approach. It seems like there are three
>>> possibilities:
>>>   - modify the grammar (before the parser is constructed),
>>>   - modify the construction of the parser, and/or
>>>   - modify the operation of the parser.
>>> It's not clear to me that any of these (or any combination of these) will
>>> work.
>>>
>>> Has anyone algorithmically generated an LR parser from the spec grammar?
>>> If so, how did you you handle lookahead-restrictions?
>>>
>>
>> I have an implementation of the parser with a few extensions to the LR
>> grammar, including support for ... lookahead  restrictions, ...
>>
>
> Great! So how do you handle lookahead restrictions?
>
>
As mentioned above, there are two ways to handle lookahead restrictions in
LR-parser:

1. Using "No<Lookahead>" productions. This unfortunately may double number
of productions in some sub-grammar. E.g. to handle Block vs. ObjectLiteral
in ECMAScript, one needs to double all the expression productions to
correctly parse `ExpressionStatement`. As an example in the spec itself,
take a look e.g. at `RelationalExpression` and `RelationalExpressionNoIn`,
and how it repeats (doubles) all the productions but with "NoIn" suffix
https://es5.github.io/#x11.8

2. Another approach is to use Cover grammar. It's when you parse a node
with two possible ways, and enforce (reinterpret) needed sub-nodes in the
cover-grammar post-processor (after having parsed a generic node). E.g. I
chose this approach to parse generically `ObjectLiteral`, and `Block` to
handle `{` lookahead restriction for `ExpressionStatement`. This approach
allows not to double grammar (hence making parsing table smaller), however
adds some post-processing overhead.
https://github.com/DmitrySoshnikov/syntax/blob/master/examples/lang.bnf#L206-L222

P.S.: there is actually option #3 -- to use "S-attributed" parser actions,
it's when you may execute handling action not only at the end of production
(i.e. not only on "reduce"), but after each element on RHS:

```
statement
options
{
k = 1 ;
}
: { input.LA(1) == LBRACE }? block
| statementTail
;
statementTail
: variableStatement
| emptyStatement
| expressionStatement
        ...
```

This option is probably the best alternative.

Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170129/f89aadb7/attachment-0001.html>


More information about the es-discuss mailing list