Nested Quasis

Waldemar Horwat waldemar at google.com
Tue Feb 7 13:52:42 PST 2012


On 02/06/2012 06:49 PM, Mark S. Miller wrote:
> On Mon, Feb 6, 2012 at 3:26 PM, Waldemar Horwat <waldemar at google.com> wrote:
>
>     On 02/03/2012 08:07 PM, Mark S. Miller wrote:
>
>         On Fri, Feb 3, 2012 at 12:58 PM, Waldemar Horwat <waldemar at google.com> wrote:
>
>             On 02/02/2012 06:27 PM, Waldemar Horwat wrote:
>
>         [...]
>
>             Note that this is more complex than just having the parser switch modes for the treatment of / as division vs. regexp.  Here comments and white space are also affected, which can turn the structure of the lexer upside down.  The kinds of cases I'm thinking of are:
>
>             `abc$/*comment*/identifier//
>             `
>             (here we have a /**/ comment and a // comment)
>
>
>         There is no valid quasiHole above, so the whole thing matches a QuasiOnly. The QuasiOnly includes all characters between the backticks. Nothing is taken to be a comment, just like it wouldn't be if it appeared within a string.
>
>
>     According to which lexical grammar?  According to the one you provided earlier in this thread, `abc$ is a QuasiOpen token:
>
>       QuasiOpen ::
>             ` QuasiChar* $
>
>
>     Parsing further, /*comment*/identifier is a single identifier token as far as the syntactic grammar is concerned.
>
>
> I was imprecise. I'll try again, using only lexical grammar concepts and making explicit where whitespace, comments, etc may appear.
>
>      Token ::
>          IdentifierName
>          Punctuator
>          NumericLiteral
>          StringLiteral
>          Quasi
>
>      Quasi ::
>          QuasiOnly
>          QuasiOpen QuasiHole (QuasiMiddle QuasiHole)* QuasiClose
>
>      QuasiOnly ::
>          ` QuasiChar* `
>
>      QuasiOpen ::
>          ` QuasiChar* $
>
>      QuasiMiddle ::
>          QuasiChar* $
>
>      QuasiClose ::
>          QuasiChar* `
>
>      QuasiChar ::
>          SourceCharacter *but not one of $ or `*
>          $ $
>          $ `
>          $ \ EscapeSequence
>
>      QuasiHole ::
>          Identifier
>          { Spacing* (BalancedCurlySequence Spacing*)* }
>
>      BalancedCurlySequence ::
>          Token *but not one of { or }*
>          { Spacing* (BalancedCurlySequence Spacing*)* }
>
>      Spacing ::
>          WhiteSpace
>          LineTerminator
>          Comment
>
> Within a Quasi, no character sequences are interpreted as whitespace or comments except where indicated by Spacing above.

That's going back to the previous approach of treating the whole quasi as a single token.  This doesn't work because it's not possible to specify the BalancedCurlySequence production as a lexical grammar.  You're confusing the lexical with the syntactic grammars here.

Examples of why BalancedCurlySequence doesn't work:

{/[{]/}
(interior parses as five single-character tokens / [ { ] / which leaves the braces unbalanced, even though the interior is a valid regular expression literal)

{ainb}
(interior parses as three tokens: a in b)

{3.toString()}
(interior parses as 3 . toString ( ))
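
To make the three examples concrete, here is a minimal sketch in Python that enumerates every token sequence a grammar-style derivation admits when no longest-match rule is applied. The token patterns and the derivations helper are simplified assumptions made for illustration only; they are not the ES5 lexical grammar and not anything proposed in this thread.

```python
import re

# Hypothetical, heavily simplified token classes (assumption for illustration;
# the real lexical grammar has many more cases, e.g. keywords and regexps).
TOKEN_PATTERNS = [
    ("IdentifierName", re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")),
    ("NumericLiteral", re.compile(r"[0-9]+(\.[0-9]*)?")),
    ("Punctuator", re.compile(r"[{}()\[\]./]")),
]


def derivations(text):
    """Yield every split of text into tokens, with no longest-match rule."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        # Accept the prefix if any token class matches it in full.
        if any(pattern.fullmatch(prefix) for _, pattern in TOKEN_PATTERNS):
            for rest in derivations(text[end:]):
                yield [prefix] + rest


if __name__ == "__main__":
    for sample in ("/[{]/", "ainb", "3.toString()"):
        found = list(derivations(sample))
        print(sample, "has", len(found), "derivation(s); first:", found[0])
```

Under these assumptions, "ainb" can be derived both as the single identifier ainb and as the three tokens a, in, b, and "3.toString()" can be derived both with 3. as a numeric literal and as 3 . toString ( ). The grammar alone does not pick one.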

     Waldemar


More information about the es-discuss mailing list