Nested Quasis

Waldemar Horwat waldemar at google.com
Tue Feb 7 15:47:05 PST 2012


On 02/07/2012 02:51 PM, Mark S. Miller wrote:
> On Tue, Feb 7, 2012 at 1:52 PM, Waldemar Horwat <waldemar at google.com> wrote:
> [...]
>
>     That's going back to the previous approach of treating the whole quasi as a single token.  This doesn't work because it's not possible to specify the BalancedCurlySequence production as a lexical grammar.  You're confusing the lexical with the syntactic grammars here.
>
>
> Hi Waldemar, I am first of all trying to make clear what we're actually proposing, and to resolve any genuine ambiguity. As for how we phrase this proposal so that it fits with the rest of our spec language, what do you suggest?
>
>
>     Examples of why BalancedCurlySequence doesn't work:
>
>     {/[{]/}
>     (interior parses as five single-character tokens but no matching closing bracket)
>
>
> Yes, and therefore a program consisting of
>
>      `{/[{]/}`
>
> fails to lex and fails to parse. That seems like the correct outcome.

Why?  It's just a regexp.
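
A minimal sketch of why counting curly brackets misjudges this case. The helper naiveCurlyDepth is a hypothetical name used only for illustration; it stands in for any rule that balances { and } without knowing where regexp literals begin and end.

```javascript
// Hypothetical helper: counts curly brackets character by character,
// with no knowledge of regular expression literals.
function naiveCurlyDepth(src) {
  let depth = 0;
  for (const ch of src) {
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
  }
  return depth; // 0 means "balanced" as far as this counter can tell
}

// /[{]/ is a valid regexp: a character class containing a literal "{".
const re = /[{]/;
const matches = re.test("{");

// The counter sees an unmatched "{" both inside the regexp alone and
// in the whole bracketed text, although the regexp itself is well formed.
const inner = naiveCurlyDepth("/[{]/");
const whole = naiveCurlyDepth("{/[{]/}");

console.log(matches, inner, whole);
```

The outer pair of brackets balances, but the "{" inside the character class leaves the count at 1, so a bracket-balancing rule rejects text whose interior is an ordinary regexp.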

>     {ainb}
>     (interior parses as three tokens: a in b)
>
> Why doesn't it parse as one token: ainb ?

The point is that a in b is one valid parse.  I don't need to show that there are no other valid parses.  In fact, there are lots of other valid parses because the grammar is very ambiguous.

>     {3.toString()}
>     (interior parses as 3 . toString ( ))
>
> Why? That's not what the JS lexer does anywhere else?

That's the problem with the rule you gave.

> I don't at all see how you arrived at your conclusions. Is it actually unclear what I am trying to say, or are you simply taking issue with how I'm saying it? If you find Erik's way of specifying ok, let's just use that. As I just said in reply to him, it does capture my actual intent more directly.

The bug is in what you're trying to say, not in how you're saying it.  You're confusing the lexical and syntactic grammars.  Due to this confusion you're trying to write lexical productions such as

BalancedCurlySequence ::
     Token *but not one of { or }*
     { Spacing* (BalancedCurlySequence Spacing*)* }

To illustrate the problem, consider a simpler lexer rule:

TokenSequence ::
   Token*

This will lex ainb as many things, including for example a in b.  The existing lexer resolves this by always taking the longest sequence of characters that forms a valid next lexical token.  Once it accepts a token, it doesn't backtrack, even if it later finds that a different choice for that token would have made the following tokens work better.  On the other hand, if you allow productions such as a TokenSequence inside a lexical token, then you get full backtracking and ambiguity across the Tokens that make up the TokenSequence, because they are all part of one lexical token.
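
A rough sketch of that difference, assuming a toy token set of identifier-like words only (the keyword in also fits that pattern, so one regexp covers both). The names TOKEN, lexLongest and allTokenizations are made up for this illustration and are not taken from the spec or the proposal.

```javascript
// Toy token pattern (an assumption): identifier-like words, which also
// covers the keyword "in".
const TOKEN = /^[A-Za-z_$][A-Za-z0-9_$]*/;

// Longest-match lexing: take the longest token at each position and
// never reconsider that choice.
function lexLongest(src) {
  const tokens = [];
  let rest = src;
  while (rest.length > 0) {
    const m = TOKEN.exec(rest);
    if (!m) throw new SyntaxError("cannot lex: " + rest);
    tokens.push(m[0]);
    rest = rest.slice(m[0].length);
  }
  return tokens;
}

// The unconstrained rule "TokenSequence :: Token*": enumerate every way
// of splitting src into a sequence of valid tokens.
function allTokenizations(src) {
  if (src.length === 0) return [[]];
  const results = [];
  for (let end = 1; end <= src.length; end++) {
    const head = src.slice(0, end);
    const m = TOKEN.exec(head);
    if (m && m[0].length === head.length) {
      for (const tail of allTokenizations(src.slice(end))) {
        results.push([head, ...tail]);
      }
    }
  }
  return results;
}

const longest = lexLongest("ainb");
const splits = allTokenizations("ainb");
console.log(longest);                            // one token: ainb
console.log(splits.length);                      // 8 distinct token sequences
console.log(splits.map((t) => t.join(" ")));     // includes "a in b"
```

With longest match there is exactly one answer. With Token* inside a single lexical token, every split is an equally valid derivation, and a in b is one of them.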

I was favorable to splitting up a quasi into multiple tokens, where this problem for the most part doesn't arise.  If you want to make the whole quasi into one token, then you'll need to solve this problem.

     Waldemar

