MultiLineCommentChars and PostAsteriskCommentChars productions
Michael Dyck
jmdyck at ibiblio.org
Mon Apr 10 02:23:27 UTC 2017
On 17-04-09 05:13 PM, Darien Valentine wrote:
> I am curious about this lexical production, because if I understand
> correctly, it seems to imply either backtracking or a lookahead that isn’t
> made explicit.
Yes, depending on your parsing technique.
> ..., a naive match will be made for PostAsteriskCommentChars against
> the `*` of a terminal `*/` of the MultiLineComment.
>
> While this is not ultimately ambiguous because, having made that match, the
> next attempt will fail and we can backtrack one step to find another way
> out; or, more realistically, an implementation would look ahead at whether
> the next character (after "*") is `/` before deciding that
> PostAsteriskCommentChars/2 should _really_ be matched.
In a bottom-up parser, one would say that, with a next symbol (i.e.,
character) of '*', there is a shift-reduce conflict that cannot be resolved
by that symbol alone. Instead, two symbols of lookahead are required.
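To make the two-symbol lookahead concrete, here is a minimal sketch of a hand-written scanner for MultiLineComment. It is only an illustration, not how any particular engine does it; the function name and the choice to return an end index are assumptions made for this example. On seeing '*', it peeks at the following character before deciding whether the comment ends or the '*' is ordinary comment content.

```python
def scan_multiline_comment(src, start=0):
    """Return the index just past the closing '*/' of the comment at `start`.

    Hypothetical helper for illustration; raises ValueError if `src` does not
    begin a multi-line comment at `start` or the comment is never closed.
    """
    if not src.startswith("/*", start):
        raise ValueError("not a multi-line comment")
    i = start + 2
    while i < len(src):
        # One symbol ('*') is not enough to decide; look at the one after it.
        if src[i] == "*" and i + 1 < len(src) and src[i + 1] == "/":
            return i + 2  # the '*' belongs to the terminal '*/'
        i += 1  # otherwise the character (even a '*') is comment content
    raise ValueError("unterminated comment")
```

For example, scan_multiline_comment("/* a ** b **/ x") skips the first three asterisks as content and stops only at the '*' that is followed by '/'.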
> However it seems unusual that the grammar is written this way since
> elsewhere the grammar seems to carefully avoid implied backtracking,
> and lookaheads are rare and explicit.
Apparently you're referring to phrases like "[lookahead != foo]". You
shouldn't think of these as "lookaheads". The spec doesn't have a name for
these, but I call them "lookahead-restrictions". Each is a restriction on
the applicability of a production, based on the next one or two symbols.
They do indicate places where a parser would need to employ lookahead to
make a decision, but there are many other such places (such as the one you
noted above), where the need for lookahead is simply a consequence of the
interaction of productions, and is not called out explicitly in the grammar.
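As a rough sketch of how a parser might honor a lookahead-restriction, consider the one on ExpressionStatement. The token sets below are a simplified subset of what the spec lists, and the function name and the token-list representation are assumptions for illustration only. The check looks at the next one or two tokens and reports whether the production may apply at all.

```python
# Simplified subset of the restriction on ExpressionStatement (illustrative).
RESTRICTED_ONE = {"{", "function", "class"}  # forbidden single next token
RESTRICTED_TWO = {("let", "[")}              # forbidden next two tokens

def may_start_expression_statement(tokens, pos=0):
    """Return False if the upcoming tokens rule out ExpressionStatement."""
    upcoming = tokens[pos:pos + 2]
    if upcoming and upcoming[0] in RESTRICTED_ONE:
        return False
    if tuple(upcoming) in RESTRICTED_TWO:
        return False
    return True
```

Note that "let" followed by something other than "[" is still allowed, which is why this particular restriction needs two symbols rather than one.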
-Michael