Nested Quasis
Waldemar Horwat
waldemar at google.com
Tue Feb 7 13:52:42 PST 2012
On 02/06/2012 06:49 PM, Mark S. Miller wrote:
> On Mon, Feb 6, 2012 at 3:26 PM, Waldemar Horwat <waldemar at google.com> wrote:
>
> On 02/03/2012 08:07 PM, Mark S. Miller wrote:
>
> On Fri, Feb 3, 2012 at 12:58 PM, Waldemar Horwat <waldemar at google.com> wrote:
>
> On 02/02/2012 06:27 PM, Waldemar Horwat wrote:
>
> [...]
>
> Note that this is more complex than just having the parser switch modes for the treatment of / as division vs. regexp. Here comments and white space are also affected, which can in turn upend the structure of the lexer. The kinds of cases I'm thinking of are:
>
> `abc$/*comment*/identifier//
> `
> (here we have a /**/ comment and a // comment)
>
>
> There is no valid QuasiHole above, so the whole thing matches a QuasiOnly. The QuasiOnly includes all characters between the backticks. Nothing is taken to be a comment, just as it wouldn't be within a string literal.
>
>
> According to which lexical grammar? According to the one you provided earlier in this thread, `abc$ is a QuasiOpen token:
>
> QuasiOpen ::
> ` QuasiChar* $
>
>
> Parsing further, /*comment*/identifier is a single identifier token as far as the syntactic grammar is concerned.
>
>
> I was imprecise. I'll try again, using only lexical grammar concepts and making explicit where whitespace, comments, etc. may appear.
>
> Token ::
> IdentifierName
> Punctuator
> NumericLiteral
> StringLiteral
> Quasi
>
> Quasi ::
> QuasiOnly
> QuasiOpen QuasiHole (QuasiMiddle QuasiHole)* QuasiClose
>
> QuasiOnly ::
> ` QuasiChar* `
>
> QuasiOpen ::
> ` QuasiChar* $
>
> QuasiMiddle ::
> QuasiChar* $
>
> QuasiClose ::
> QuasiChar* `
>
> QuasiChar ::
> SourceCharacter *but not one of $ or `*
> $ $
> $ `
> $ \ EscapeSequence
>
> QuasiHole ::
> Identifier
> { Spacing* (BalancedCurlySequence Spacing*)* }
>
> BalancedCurlySequence ::
> Token *but not one of { or }*
> { Spacing* (BalancedCurlySequence Spacing*)* }
>
> Spacing ::
> WhiteSpace
> LineTerminator
> Comment
>
> Within a Quasi, no character sequences are interpreted as whitespace or comments except where indicated by Spacing above.
That's going back to the previous approach of treating the whole quasi as a single token. This doesn't work because the BalancedCurlySequence production can't be specified as part of a lexical grammar. You're confusing the lexical grammar with the syntactic grammar here.
Examples of why BalancedCurlySequence doesn't work:
{/[{]/}
(interior parses as five single-character tokens, / [ { ] /, so the { has no matching closing brace)
{ainb}
(interior parses as three tokens: a in b)
{3.toString()}
(interior parses as 3 . toString ( ))
Waldemar
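A minimal JavaScript sketch of the examples above, under stated assumptions: it treats BalancedCurlySequence's "Token Spacing* Token" as a purely lexical production, assuming the usual longest-match rule applies only to the enclosing Quasi token, so Spacing* may match nothing between adjacent inner tokens. The token patterns are a hypothetical, heavily simplified stand-in for the ES5 token classes (no string literals, no regular expression literals), and allTokenizations, maximalMunch, and curlyBalanced are illustrative names, not part of any proposal.

```javascript
// Hypothetical, heavily simplified stand-ins for a few ES5 token classes.
// This is NOT the real lexical grammar; it only covers the examples above.
const TOKEN_PATTERNS = [
  /^[A-Za-z_$][A-Za-z0-9_$]*$/, // IdentifierName (reserved words such as `in` included)
  /^[0-9]+(\.[0-9]*)?$/,        // decimal NumericLiteral, e.g. 3 or 3.
  /^[.(){}[\]\/]$/,             // a few single-character punctuators
];

function isToken(text) {
  return TOKEN_PATTERNS.some((re) => re.test(text));
}

// Every way to split `src` into adjacent tokens with empty Spacing between them.
// Without a longest-match rule inside the production, all of these are admissible.
function allTokenizations(src) {
  if (src === "") return [[]];
  const results = [];
  for (let end = 1; end <= src.length; end++) {
    const head = src.slice(0, end);
    if (!isToken(head)) continue;
    for (const rest of allTokenizations(src.slice(end))) {
      results.push([head, ...rest]);
    }
  }
  return results;
}

// Ordinary maximal munch: at each position take the longest prefix that is a token.
function maximalMunch(src) {
  const tokens = [];
  let pos = 0;
  while (pos < src.length) {
    let best = -1;
    for (let end = pos + 1; end <= src.length; end++) {
      if (isToken(src.slice(pos, end))) best = end;
    }
    if (best === -1) throw new SyntaxError(`no token at offset ${pos}`);
    tokens.push(src.slice(pos, best));
    pos = best;
  }
  return tokens;
}

// True when every `{` token has a matching `}` token.
function curlyBalanced(tokens) {
  let depth = 0;
  for (const t of tokens) {
    if (t === "{") depth++;
    if (t === "}" && --depth < 0) return false;
  }
  return depth === 0;
}

console.log(JSON.stringify(maximalMunch("ainb")));         // ["ainb"]
console.log(allTokenizations("ainb").length);              // 8 (one of them is a in b)
console.log(JSON.stringify(maximalMunch("3.toString()"))); // ["3.","toString","(",")"]
console.log(curlyBalanced(maximalMunch("/[{]/")));         // false
```

Ordinary maximal munch reads ainb as one identifier and 3.toString() as 3. toString ( ), but nothing in a Spacing*-separated production rules out a in b or 3 . toString ( ). The sketch also has no way to tell that /[{]/ could be a regular expression literal; that division-vs-regexp decision depends on syntactic context.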