LR(1) grammar/parser and lookahead-restrictions

Dmitry Soshnikov dmitry.soshnikov at gmail.com
Tue Jan 31 19:17:18 UTC 2017


The spec is a formal language definition (including for syntactic grammar).
It doesn't talk about specifics of implementation much, nor it's supposed
to do so (so demanding this from spec isn't appropriate).

It doesn't tell one to implement an LR(1), neither it describes the
techniques *how* one going to do so (even though as was confirmed in this
thread it's LR(1)-parsable). It just uses some formal notation for this,
e.g. `lookahead ∉ {{, function}`. Specific techniques *how exactly* you
going to achieve this, is not the purpose of the specification.

(but it's possible to clarify such techniques on discussions which we have
now -- please note though, the techniques described above are not an
"assumption", or a "guess", they are actual techniques used on practice to
achieve this in LR(1)).

The spec in definitions is not consistent though. In one place it uses
explicit *technique* for lookahead restriction via the "No<lookahead>"
productions. In some cases it just uses formal description as  `lookahead ∉
{{, function}`, leaving a specific technique to achieve this up to
implementations.

In fact (for consistency), the rule from ES5:

```
for ( ExpressionNoIn ; Expression ; Expression ) Statement
```

could be described as:

```
for ( `lookahead ∉ {in}` Expression ; Expression ; Expression ) Statement
```

and would mean the same.

You may search for ES5 LR(1) construction (there are several over
internets), they use the described above techniques for `lookahead ∉ {{,
function}` (adding `NoCurlyBrace`, or `NoFunction` productions). That's
also said, it could be rewritten using Cover grammar which adds though
post-processing overhead -- that might be the reason why some of the
`No<lookahead>` productions were removed from ES6 spec, but `lookahead ∉
{{, function}` is still there. More complex cases like Arrow functions
params is still implemented as a Cover grammar.

Dmitry


On Tue, Jan 31, 2017 at 7:56 AM, Michael Dyck <jmdyck at ibiblio.org> wrote:

> On 17-01-30 06:20 PM, Dmitry Soshnikov wrote:
>
>> On Mon, Jan 30, 2017 at 2:22 PM, Michael Dyck <jmdyck at ibiblio.org
>> <mailto:jmdyck at ibiblio.org>> wrote:
>>
>>
>>                 1. Using "No<Lookahead>" productions.
>>
>>             The problem with this approach is that, in general, a
>>             lookahead-restriction is not a restriction on the nonterminal
>> that
>>             immediately follows it in the production, it's a restriction
>> on
>>             the next few input elements, which might be generated by more
>>             than just that nonterminal.
>>
>>         Well, in this case it's not a "lookahead" anymore.
>>
>>     It isn't a "lookahead" in the resulting grammar, but it is in the
>>     original, and the two grammars should define the same language. What
>> I'm
>>     saying is that it's incorrect to assume that the effect of the
>> lookahead
>>     (in the original) can be confined to the nonterminal that immediately
>>     follows, which is what the suggested transformation does.
>>
>>
>> A "lookahead" is a terminal token. A non-terminal is not a "lookahead".
>>
>
> Okay, so then in my previous statement, read "lookahead" as
> "lookahead-restriction". (But I did say "lookahead-restriction" in the
> statement before that.)
>
>
> To reiterate, imaging you have:
>>
>> ```
>> • (x, y)
>> ```
>>
>> and
>>
>> ```
>> • (x, y) -> { return x + y; }
>> ```
>>
>> where `•` is a cursor position. Your lookaheads are "(", "x", "," "y",
>> etc.
>> Usually 1 lookahead token is enough to route the parser, in this case "(".
>>
>> What you're talking that you need not these lookaheads (at the beginning
>> of
>> the cursor), but *after* the parsed non-terminal (i.e. after fully parsed
>> `(x, y)`), i.e. you're looking for "->", which goes at the end of the
>> non-terminal.
>>
>
> I'm not saying I don't need the lookahead tokens right after the cursor.
> But I am saying that the lookahead tokens that the hypothetical parser
> needs to examine (in order to enforce a lookahead-restriction), even if
> it's only 1 or 2, might extend past the 'next' nonterminal.
>
> But that's an awkward way to express my objection.
>
> To recap, I'm talking about the suggestion to replace:
>     [lookahead != <some sequence of terminals>] Nonterminal
> in a production in the original grammar with
>     Nonterminal-that-does-not-start-with-that-sequence
> (and a bunch of new productions) in the transformed grammar, and whether
> that's a valid transformation.
>
> I'm saying it isn't valid when the Nonterminal can derive phrases with
> fewer terminals than the prohibited lookahead-sequence. (E.g., if
> Nonterminal is nullable, or if it can derive a phrase of length 1 when the
> prohibited lookahead-sequence is length 2.)
>
> This is not possible, since it's not a lookahead at that
>> specific cursor position. It will be a lookahead after you have parsed the
>> non-terminal corresponding to "(x, y)".
>>
>> So you can't have two non-terminals for this, since it'll be a
>> "reduce/reduce" conflict, and you need a cover grammar here.
>>
>
> Well, I'm saying you'd run into problems before you even tried to
> construct the LR automaton, but I think you're agreeing with me that this
> approach is not generally applicable?
>
>
>         In this case a cover grammar is simpler, and that's why it's used
>>         starting since ES6 spec.
>>
>>     But note that none of the ES5 lookahead-restrictions was replaced
>> with a
>>     cover grammar in ES6. They're all still lookahead-restrictions.
>>
>> Sure, because it's normally possible to do with the "No<Lookahead>"
>> doubled
>> productions.
>>
>
> But they weren't replaced with "No" productions (or grammatical
> parameters) either! All the lookahead-restrictions in ES5 are still
> lookahead-restrictions in the current draft.
>
> So this actually answers your initial question (it's a real practice).
>>
>
> No, I don't think it does.
>
> ----------------------------
>
>                 2. Another approach is to use Cover grammar.
>>
>>         Usually it's a manual process of designing a grammar, and defining
>>         post-processing static semantics.
>>
>>     Okay, I'm not really interested then. I'm looking for an algorithm
>>     (i.e., a modification of the LR algorithm) that starts with a
>>     grammar-with-lookahead-restrictions and ends with an LR parser.
>>
>> OK, yeah, at this point I think we already clarified that it is possible
>> to
>> parse ES grammar,  and handle it's lookahead restrictions with several
>> techniques described.
>>
>
> Well, ES implementations exist, so clearly there are techniques for
> handling those aspects of the ES grammar. But that doesn't mean that any of
> them are parsing *according to* the ES grammar. E.g., it's unlikely that
> they're finding exactly the parse tree indicated by the ES grammar. (Nor is
> there any requirement that they do so.)
>
> ----
>
> By way of contrast, consider "opt" subscripts and grammatical parameters.
> If you look up the LR parser-construction algorithm, it doesn't talk about
> how to handle such things, but the spec says how to transform them into
> plain context-free productions, which the LR algorithm can handle. So it's
> an easy shorthand to say that the ES grammar is LR(1) iff the 'expanded'
> grammar is LR(1).
>
> But the spec doesn't give that kind of CFG-equivalence for
> lookahead-restrictions. Until there's agreement about what they mean in a
> CFG or LR sense, it isn't really meaningful to ask if the ES grammar is
> LR(1).
>
> -Michael
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170131/66e42705/attachment-0001.html>


More information about the es-discuss mailing list