MultiLineCommentChars and PostAsteriskCommentChars productions

Michael Dyck jmdyck at ibiblio.org
Mon Apr 10 02:23:27 UTC 2017


On 17-04-09 05:13 PM, Darien Valentine wrote:
> I am curious about this lexical production, because if I understand
> correctly, it seems to imply either backtracking or a lookahead that isn’t
> made explicit.

Yes, depending on your parsing technique.

> ..., a naive match will be made for PostAsteriskCommentChars against
> the `*` of a terminal `*/` of the MultiLineComment.
>
> While this is not ultimately ambiguous because, having made that match, the
> next attempt will fail and we can backtrack one step to find another way
> out; or, more realistically, an implementation would look ahead at whether
> the next character (after "*") is `/` before deciding that
> PostAsteriskCommentChars/2 should _really_ be matched.

In a bottom-up parser, one would say that, with a next symbol (i.e., 
character) of '*', there is a shift-reduce conflict that cannot be resolved 
by that symbol alone. Instead, two symbols of lookahead are required.
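
To make the two-symbol decision concrete, here is a minimal hand-written scanner sketch. It is not the spec's algorithm or any engine's implementation; the function name `scan_multiline_comment` and its error handling are assumptions made for illustration. The only point is that on seeing '*' it has to peek at the following character before it knows whether the comment ends.

```python
def scan_multiline_comment(src: str, start: int = 0) -> int:
    """Return the index just past the closing '*/' of the comment at `start`.

    Hypothetical helper for illustration only. Raises ValueError if no
    multi-line comment starts at `start` or if it is never closed.
    """
    if not src.startswith("/*", start):
        raise ValueError("no multi-line comment at this position")
    i = start + 2  # skip the opening '/*'
    while i < len(src):
        if src[i] == "*":
            # One symbol is not enough here: this '*' may be ordinary
            # comment content or the first half of the closing '*/'.
            # Peek at the second symbol to decide.
            if src.startswith("/", i + 1):
                return i + 2
        i += 1
    raise ValueError("unterminated multi-line comment")
```

For example, on "/***/" the scanner sees a '*' followed by another '*', treats the first as content, and only closes the comment at the second '*', which is followed by '/'.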

> However it seems unusual that the grammar is written this way since
> elsewhere the grammar seems to carefully avoid implied backtracking,
> and lookaheads are rare and explicit.

Apparently you're referring to phrases like "[lookahead != foo]". You 
shouldn't think of these as "lookaheads". The spec doesn't have a name for 
these, but I call them "lookahead-restrictions". Each is a restriction on 
the applicability of a production, based on the next one or two symbols. 
They do indicate places where a parser would need to employ lookahead to 
make a decision, but there are many other such places (such as the one you 
noted above), where the need for lookahead is simply a consequence of the 
interaction of productions, and is not called out explicitly in the grammar.
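
As a rough sketch of what "restriction on the applicability of a production" means in practice, a parser can treat a lookahead-restriction as a predicate over the next one or two tokens. The names below (`FORBIDDEN`, `production_applies`) are hypothetical, and the forbidden set is a simplified model of the restriction on ExpressionStatement in ES2015, with tokens represented as plain strings.

```python
# Forbidden lookahead sequences, each a tuple of one or two tokens.
# Assumption: a simplified model of the ES2015 ExpressionStatement
# restriction, with tokens represented as plain strings.
FORBIDDEN = [("{",), ("function",), ("class",), ("let", "[")]


def production_applies(tokens, pos, forbidden=FORBIDDEN):
    """Return True if no forbidden sequence starts at tokens[pos].

    Hypothetical helper: the production may be used only when the
    upcoming tokens do not match any restricted sequence.
    """
    for seq in forbidden:
        if tuple(tokens[pos:pos + len(seq)]) == seq:
            return False
    return True
```

With this predicate, `["let", "x"]` passes but `["let", "["]` does not, since the restriction on `let [` needs two tokens of lookahead to evaluate.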

-Michael


