LR(1) grammar/parser and lookahead-restrictions

Michael Dyck jmdyck at ibiblio.org
Sat Feb 4 15:20:31 UTC 2017


On 17-02-03 05:32 PM, Waldemar Horwat wrote:
> On 02/03/2017 08:17, Michael Dyck wrote:
>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>
>>> Lookahead restrictions fit very well into an LR(1) engine
>>
>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>> construction of the automaton, and/or modify the operation of the parser?
>
> For each state × token combination, the automaton describes what happens
> when you're in state S and see a token T.  The lookahead restrictions remove
> possible transitions; without them there would be ambiguities where a given
> state × token combination would want to do two incompatible things.

Okay, so do you generate the automaton (ignoring lookahead restrictions) and 
then remove transitions (using lookahead restrictions)? Or do you integrate 
the lookahead restrictions into the generation of the automaton?
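(For concreteness, when you say the restrictions "remove possible 
transitions", I picture something like the following Python sketch, where a 
restriction just deletes entries from the ACTION table. The names and data 
layout are my own invention, not a guess at your implementation.)

    # Hypothetical sketch, not anyone's actual code: treat the parser's
    # ACTION table as a dict keyed by (state, terminal), and let a
    # lookahead restriction "[lookahead not in forbidden]" delete the
    # corresponding entries for the states it governs.

    def apply_lookahead_restriction(action, restricted_states, forbidden):
        for (state, terminal) in list(action):
            if state in restricted_states and terminal in forbidden:
                del action[(state, terminal)]  # transition removed
        return action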

> That's different from parametrized rules, which simply macro-expand into
> lots of individual rules.

Yup.
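
(E.g., I'd picture the macro-expansion roughly like this toy Python sketch, 
which only shows how the expanded nonterminal names multiply; the spec's 
real expansion of course also rewrites the right-hand sides.)

    from itertools import combinations

    # Toy sketch of macro-expanding a parametrized production name:
    # "Expression[In, Yield]" becomes one plain nonterminal per subset
    # of its parameters.

    def expand(name, params):
        for r in range(len(params) + 1):
            for subset in combinations(params, r):
                yield "_".join([name, *subset])

    print(list(expand("Expression", ["In", "Yield"])))
    # ['Expression', 'Expression_In', 'Expression_Yield', 'Expression_In_Yield']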


>> But the context-dependentness of lexing is a parse-time problem, not a
>> validation-time problem, right?
>
> No.
>
>> The syntactic level can just assume a stream of (correctly lexed) input
>> elements.

(I should have said "*Validation* of the syntactic level [etc]")

> No!  It's impossible to create a stream of correctly lexed input elements
> without doing syntactic level parsing.

I quite agree. I didn't mean to suggest otherwise. What I mean is that, once 
you've generated the automaton for the syntactic grammar, you can just look 
at each state's set of expected terminals, and from that deduce the goal 
symbol that the lexer will need to use to get the next token when the parser 
is in that state. The point is that you can do that *after* generating the 
syntactic automaton, so the context-dependence of lexing doesn't have to 
affect the process of generating the syntactic automaton.

(That's assuming an LR(1)-ish parser, and an approach where you don't try to 
combine the syntactic and lexical grammars to generate a single 
[scannerless] automaton. Which may not be your approach.)
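
(Something like this Python sketch is what I have in mind; the goal-symbol 
names are from the spec, but the function and its inputs are just 
illustrative.)

    # Rough sketch: once the syntactic automaton exists, classify each
    # state by the lexical goal symbol the lexer should use when the
    # parser is in that state.

    def lexical_goal(expected_terminals):
        # expected_terminals: the terminals on which this state has an action
        if "RegularExpressionLiteral" in expected_terminals:
            return "InputElementRegExp"
        return "InputElementDiv"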

> The validator looks for problems such as the syntactic grammar giving the
> lexer contradictory instructions. An example would be any syntactic
> grammar automaton state where one outgoing syntactic transition would
> swallow a regexp token and another outgoing syntactic transition from the
> same state would swallow a / token. If any such state exists, the grammar
> is broken.

Yup, the spec asserts the non-existence of such states ("syntactic grammar 
contexts").

-Michael
