LR(1) grammar/parser and lookahead-restrictions
Dmitry Soshnikov
dmitry.soshnikov at gmail.com
Sat Feb 4 06:09:03 UTC 2017
On Fri, Feb 3, 2017 at 2:32 PM, Waldemar Horwat <waldemar at google.com> wrote:
> On 02/02/2017 15:56, Dmitry Soshnikov wrote:
>
>> On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <waldemar at google.com
>> <mailto:waldemar at google.com>> wrote:
>>
>> On 02/01/2017 10:25, Dmitry Soshnikov wrote:
>>
>> As mentioned, Cover grammar is usually the process of the grammar
>> design itself (as in ES6 spec itself). I'm not aware of automatic
>> transformations for this (if you find any please let me know).
>>
>>
>> Cover grammars are an ugly hack that we had to add when there was no
>> other practical choice. Avoid them as much as possible. They are only
>> used in situations where lookahead restrictions and parametrized grammar
>> rules do not work in any practical way.
>>
>>
>> Yeah, absolutely agreed. The reason why I used cover grammar in that
>> example to parse `{` as a `Block` vs. `ObjectLiteral`, and handle
>> `ExpressionStatement` is to make the resulting grammar short, and don't
>> introduce a bunch of `NoCurly`, etc extra productions for this. That's also
>> said, this ugly hack also forces you to do post-processing overhead --
>> either validation of the nodes, or even re-interpretation (rewriting) to
>> other nodes -- in my case I have to actually check that all nodes between
>> `{` and `}` are properties, or vice-versa, statements, based on the
>> expression/statement position.
>>
>
> Don't use a cover grammar to unify blocks with object literals. That's a
> really bad idea and you'll likely get it wrong. If the `{` starts a block,
> then if the next token is a keyword such as `if` then it's parsed as a
> keyword. If the `{` starts an object literal, then if the next token is a
> keyword then it's parsed as an identifiername. As we extend the language,
> the expansions of these can later diverge to the point where you won't know
> whether a `/` starts a regular expression or is a division symbol.
>
>
Ah, good catch. In fact, this can be fixed by allowing keywords as property
names, as in JS (and in fact, I have just fixed it after your comment --
see example code changes in this commit:
https://github.com/DmitrySoshnikov/syntax/commit/ccb3029cf7515bc078436c37831e027b50cb7fa4
-- there are no ambiguities, and it's still correctly parsed as an
`ObjectLiteral`, or `Block` in needed position).
Just to note, this example language is a smaller sub-set of ECMAScript
grammar, but not ES grammar itself, so, so far, Cover grammar works ;)
(still being a "dirty hack" since the language syntax was designed so,
allowing the same construct in different logically positions).
P.S.:
Another observation to note, using Cover grammar for this specific case
with `{` requires parsing at least in LALR(1) mode, and SLR(1) will not
work, eventually reaching to "reduce-reduce" conflict for when parsing e.g.
an empty block inside the empty `{ { } }`:
(output from the parsing states analysis tool)
```
- BlockStatement -> BlockOrObject • (kernel, reduce by production 13)
- ObjectLiteral -> BlockOrObject • (kernel, reduce by production 68)
```
LALR(1) uses parametrized lookahead set, and is free from this issue. It's
the one which is used the most on practice anyways (taking more time for
table calculation though)
Dmitry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170203/e2a1de31/attachment-0001.html>
More information about the es-discuss
mailing list