LR(1) grammar/parser and lookahead-restrictions

Waldemar Horwat waldemar at google.com
Fri Feb 3 22:32:27 UTC 2017


On 02/03/2017 08:17, Michael Dyck wrote:
> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>
>> Lookahead restrictions fit very well into an LR(1) engine
>
> Again: Great, but how? E.g., do you pre-process the grammar, modify the construction of the automaton, and/or modify the operation of the parser?

For each state × token combination, the automaton describes what happens when you're in state S and see a token T.  The lookahead restrictions remove possible transitions; without them there would be ambiguities where a given state × token combination would want to do two incompatible things.

That's different from parametrized rules, which simply macro-expand into lots of individual rules.

>> as long as they're limited to only one token, and that's what I've
>> implemented in the validator.
>
> So when the grammar has a prohibited lookahead-sequence with more than one token (in ExpressionStatement, IterationStatement, and ExportDeclaration), does the validator just use the first token?

My LR(1) validator can't actually handle the case of multiple tokens of lookahead, and I didn't feel like doing an LR(2) grammar.  I had to check these by hand.

>> You need to be very careful with them if looking more than one token
>> ahead because lexing of the tokens can vary based on context. For
>> example, if the next few characters in front of the cursor are )/abc/i+,
>> then what is the second token? What is the third token? It's actually
>> context-dependent.
>
> But the context-dependentness of lexing is a parse-time problem, not a validation-time problem, right?

No.

> The syntactic level can just assume a stream of (correctly lexed) input elements.

No!  It's impossible to create a stream of correctly lexed input elements without doing syntactic level parsing.  See the example I gave above.

> Or do you validate deeper than the syntactic grammar?

Yes.  The syntactic grammar controls the lexer.  The validator looks for problems such as the syntactic grammar giving the lexer contradictory instructions.  An example would be any syntactic grammar automaton state where one outgoing syntactic transition would swallow a regexp token and another outgoing syntactic transition from the same state would swallow a / token.  If any such state exists, the grammar is broken.

> (And it seems to me that multi-token lookahead-restrictions would be hard for a validator to handle even if lexing *wasn't* context-dependent, but maybe that depends on how you handle them.)
>
> -Michael
>



More information about the es-discuss mailing list