oliver at apple.com
Wed Jul 6 12:55:29 PDT 2011
>> This is all the stuff that will almost certainly require separate
>> implementations from the engine's core parser. And maybe
>> that's fine. In my case, I wanted to implement a reflection of
>> our existing parser, because it's guaranteed to track the
>> behavior of SpiderMonkey's parser.
> Understood. But shouldn't separate parsers also implement
> the standard parser API? And shouldn't it therefore cover the
> information needed for such common use cases?
The problem is our parsers don't produce the same AST, and the structure
produced by parsing an arbitrary piece of JS _is_ API. To have a standard API
requires a standardised AST, which seems unlikely to happen (for reasons laid
out many times in the past).
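To make the "structure _is_ API" point concrete, here is a made-up illustration: two engines parsing the same source could expose structurally different trees. Neither shape below is any real engine's output; both node vocabularies are invented for the sketch.

```javascript
// The same source, `var x = 1;`, as two hypothetical internal trees.

// Engine A: a shape close to the SpiderMonkey Parser API's.
const treeA = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "x" },
      init: { type: "Literal", value: 1 },
    }],
  }],
};

// Engine B: different node names, flattened declarations -- equally
// valid internally, incompatible as a cross-engine API.
const treeB = {
  kind: "script",
  statements: [{
    kind: "varStatement",
    bindings: [{ name: "x", initializer: { kind: "number", value: 1 } }],
  }],
};

// Code written against one shape breaks on the other:
console.log(treeA.body[0].declarations[0].id.name); // "x"
console.log(treeB.statements[0].bindings[0].name);  // "x"
```

Any tool that hard-codes one of these access paths silently commits to one engine's internals, which is why a standard API presupposes a standardised AST.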
> Browser parsers might then only support a partial profile of
> the full standard API - whatever they can support without
> negative impact on their main usage.
Partial support of an API means your code would have to deal with what is
missing (which will vary between browsers).
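A sketch of what "dealing with what is missing" looks like in practice: with partial profiles, every consumer ends up feature-testing node by node. The `leadingComments` field here is invented for illustration, not part of any shipped API.

```javascript
// Hypothetical: one browser's parse API retains comments, another's
// omits the field entirely. Callers must branch per browser.

function getCommentsFor(node) {
  if (node && "leadingComments" in node) {
    return node.leadingComments;
  }
  return null; // caller must handle the missing case
}

const withComments = { type: "FunctionDeclaration", leadingComments: ["/* doc */"] };
const without = { type: "FunctionDeclaration" };

console.log(getCommentsFor(withComments)); // ["/* doc */"]
console.log(getCommentsFor(without));      // null
```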
JSC's parser is constructed in such a way that we could generate a parse tree
directly into JS form (it would be a matter of jumping through hoops to create the
correct builder); however, doing so probably won't produce a tree that you'd
actually want to manipulate. We drop var declarations (relying on other tracking
instead), and we don't track token locations beyond what is needed for a few
specific cases. Some "lists" in the grammar are represented as linked lists,
others as arrays, etc, etc.
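As a rough sketch of the "linked lists vs arrays" point: a builder that exposes a JS-facing tree would have to normalize both internal forms into plain arrays. The representations below are invented, not JSC's actual structures.

```javascript
// The same grammar "list" stored two different ways internally.

// Linked-list form:
const paramsLinked = {
  value: "a",
  next: { value: "b", next: { value: "c", next: null } },
};

// Array form:
const paramsArray = ["a", "b", "c"];

// The normalization a JS-facing builder would have to perform:
function toArray(list) {
  const out = [];
  for (let node = list; node !== null; node = node.next) out.push(node.value);
  return out;
}

console.log(toArray(paramsLinked)); // ["a", "b", "c"]
```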
I have a vague recollection that the SM parser strips out some characters prior to
parsing (Brendan?) and generates bytecode into the AST (though based on dherman's
comments, maybe that's no longer the case).
> Though it might not actually cost much to support the additional
> info in SpiderMonkey: most of it could be in the token stream,
> which is usually thrown away, but could be kept via a flag, and
> the AST's source locations can be used to extract segments of
> the token stream (such as any comments preceding a location).
Speaking again for JSC -- we don't have an actual "token stream" that the lexer provides.
The lexer simply walks the input source one token at a time, as requested: partly
because how we lex is driven by the context and mode of the parser, and partly
because creating a distinct token stream is nice in an academic context but would be
a huge performance hole in practice. And the tokens that we do have don't necessarily
contain all the information you would want (because each additional write the lexer
makes to the token structure is actually measurable in some of our perf tests).
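The pull-based, context-driven lexing described above can be sketched as follows. The names and the toy tokenizer are made up; this is not JSC's interface, just an illustration of why a standalone token stream doesn't exist: the parser's context changes what the next token even *is*.

```javascript
// Hypothetical pull-based lexer: the parser requests one token at a
// time and passes context that changes how the characters are lexed.
function makeLexer(source) {
  let pos = 0;
  return {
    next(context = {}) {
      while (pos < source.length && source[pos] === " ") pos++; // skip spaces
      if (pos >= source.length) return { type: "eof" };
      const start = pos;
      // Classic JS ambiguity: '/' begins either a regex literal or a
      // division operator, depending on what the parser expects here.
      if (source[pos] === "/" && context.regexAllowed) {
        pos = source.indexOf("/", pos + 1) + 1; // consume to closing '/'
        return { type: "regex", text: source.slice(start, pos) };
      }
      pos++;
      return { type: "punct", text: source[start] };
    },
  };
}

const lexer = makeLexer("/ /ab/");
console.log(lexer.next({ regexAllowed: false })); // { type: "punct", text: "/" }
console.log(lexer.next({ regexAllowed: true }));  // { type: "regex", text: "/ab/" }
```

Because the second call's result depends on parser state, you cannot tokenize the whole input up front and hand out a finished stream without re-deriving that state.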
More information about the es-discuss mailing list