JavaScript parser API

Claus Reinke claus.reinke at talk21.com
Wed Jul 6 12:40:46 PDT 2011


>>   - it is slightly easier to manipulate than an Array-based AST, but
> More than slightly, IMO.

Ok. Though destructuring closes some of the gap.

>>       lack of pattern matching fall-through (alternative patterns for
>>       destructuring) still hurts, and the selectors are lengthy, which
>>       hampers visualization and construction; (this assumes that
>>       fp-style AST processing is preferred over oo-style processing)
>
> If I'd defined a new object type with its own prototype, it still wouldn't
> define all operations anyone would ever want. So they'd either have to
> monkey-patch it or it would need a visitor. Which you could write
> anyway. So I don't see much benefit to pre-defining a node prototype.

You're right that no single traversal scheme will serve all purposes,
and I agree that a prototype isn't needed for fp-style processing. I was
just remarking that much of the conciseness of fp for AST processing
comes from good deconstruction and construction support, which is
not entirely there yet in JS.

Instead of writing cases of destructuring patterns and selecting the
first that matches, we have to write conditions to find out what type
of Node we have, and only then destructure. And the long selectors
make even that slightly tedious (though they help with readability):

if (node.type === 'IfStatement') {
    // deconstruct, then build an equivalent IfStatement with negated test
    // and swapped branches
    let {type, test, consequent, alternate} = node;
    let notTest = {type: 'UnaryExpression',
                   operator: {type: 'UnaryOperator', token: '!'},
                   prefix: true,
                   argument: test};
    let newIfStatement = {type,
                          test: notTest,
                          consequent: alternate,
                          alternate: consequent};
}

Object literal improvements help, object update { p: v, ... : obj } would
help, and a way to recover from refutable destructuring would help.
So the proposals and strawmen are heading in the right direction.
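
Just to illustrate what I mean by object update: it can be emulated
today by copying, though only with a helper of one's own (the name
'update' and its shape are mine, not part of any proposal), e.g. given
an IfStatement node as above:

  // hypothetical helper, emulating the proposed object update by copying
  function update(obj, fields) {
    var result = {};
    for (var p in obj) { if (obj.hasOwnProperty(p)) result[p] = obj[p]; }
    for (var q in fields) { if (fields.hasOwnProperty(q)) result[q] = fields[q]; }
    return result;
  }

  // swap the branches without spelling out every other selector
  var swapped = update(node, {consequent: node.alternate,
                              alternate: node.consequent});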

> But again, see the builder API, where you can create your own
> custom node type.

Yes, that is useful. But to use it to create custom ASTs that make
specific tasks easier, I would need to write a handler for every
builder callback, so I'm back to boilerplate code.
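
For concreteness, roughly this (SpiderMonkey only, and only a sketch;
I'm assuming the documented builder callbacks such as
ifStatement(test, cons, alt, loc)):

  var source = 'if (x) f(); else g();';

  var custom = Reflect.parse(source, {
    builder: {
      // one custom node shape is easy enough...
      ifStatement: function (test, cons, alt, loc) {
        return {kind: 'if', test: test, then: cons, els: alt, loc: loc};
      }
      // ...but a consistently custom AST needs a callback per node
      // type, which is exactly the boilerplate mentioned above
    }
  });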

It seems there is no way to avoid writing a generic traversal library
for Node ASTs. That has to be done only once, though, and does not
need to be part of the standard.
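
Something along these lines would do, for instance (a rough sketch of
my own, identifying nodes by their 'type' property):

  // visit every Parser-API-style node (objects carrying a 'type' string)
  function walk(node, visit) {
    if (node === null || typeof node !== 'object') return;
    if (typeof node.type === 'string') visit(node);
    for (var p in node) {
      if (!node.hasOwnProperty(p)) continue;
      var child = node[p];
      if (Array.isArray(child)) {
        for (var i = 0; i < child.length; i++) walk(child[i], visit);
      } else {
        walk(child, visit);
      }
    }
  }

  // example: count the if-statements in an AST from Reflect.parse
  var count = 0;
  walk(ast, function (n) { if (n.type === 'IfStatement') count++; });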

>>   - it is biased towards evaluation, which is a hindrance for other
>>       uses (such as faithful unparsing, for program transformations);
>
> It's just a reflection of the built-in SpiderMonkey parser, which was
> designed for the sole purpose of evaluation. I didn't reimplement a
> new parser.

Right. But is that what we'd want from a standard Parser API?

>>       this can be seen clearly in Literals, which are evaluated (why
>>       not evaluate Object, Array, Function Literals as well? eval should
>>       be part of AST processing, not of AST construction), but it also
>>       shows in other constructs (comments are not stored at all, and
>>       if commas/semicolons are not stored, how does one know
>>       where they were located - programmers tend to be picky
>>       about their personal or project-wide style guides?);
>
> None of this data is available in a SpiderMonkey parse node.

Indeed. You wouldn't believe how frustrating it can be that textbook
parsing ignores comments, when you're looking for a parser to use
for program transformations, or a documentation generator, or other
IDE-style tools. It means that the majority of parsers can't be re-used.

The question is: does one augment existing parsers, to enable tool
building on top, or does one let every tool builder write and maintain
their own parser? The former means a little work for existing parsers;
the latter means a lot of work for would-be tool builders. It also
means that many useful small tools won't be built, because building
them would no longer be a small effort.

Having a standard JS-in-JS parser could be a good compromise,
but even for that a standard AST has merit.

>>   - there are some minor oddities, from spelling differences to
>>       the spec (Label(l)ed),
>
> Heh, I shouldn't've capitulated to my (excellent and meticulous!)
> reviewer, who was unfamiliar with the spec:
>
>    https://bugzilla.mozilla.org/show_bug.cgi?id=533874#c28
>
> I can probably change that.

The AST is not directly related to the grammar, so it isn't a big deal.
Just confusing to us English-as-a-foreign-language folks.

>> to structuring decisions (why separate
>>       UpdateExpression and LogicalExpression, when everything
>>       else is in UnaryExpression and BinaryExpression?);
>
> I separated update expressions and logical expressions because
> they have different control structure from the other unary and
> binary operators.

Hm, ok, at least that explains it (perhaps add that note to the docs?).
I'm not convinced the separation is useful, though - for many tasks,
it is just another obstacle. Only experience will tell.

>>       btw, why alternate/consequent instead of then/else, and
>
> I was avoiding using keywords as property names, and
> consequent/alternate are standard terminology. I suppose
> .then/.else would be more convenient.

Ok, perhaps a useful strategy while older engines are still in use.

Still, shorter would be better - we have to write out those selectors
for deconstruction and construction, and for displaying ASTs. Not
to mention nested ASTs (alternate.alternate.alternate.consequent;-).

>> My main issue is unparsing support for program transformations
>
>    https://bugzilla.mozilla.org/show_bug.cgi?id=590755

Thanks, that is a start. Actually, it will be sufficient for some
applications.

Unfortunately, experience tells me it won't be sufficient for user-
level program transformations. Several refactoring tool builders
have commented on how reformatting user code through pretty-printing
hurt tool uptake. When the Haskell and Erlang refactorers took some
pains to keep user code as written (apart from places where
refactorings generated entirely new code), that was one of the most
positively evaluated aspects of those tools.

>> (though IDEs will similarly need more info, for comment
>> extraction, syntax highlighting, and syntax-based operations).
>
> This is all the stuff that will almost certainly require separate
> implementations from the engine's core parser. And maybe
> that's fine. In my case, I wanted to implement a reflection of
> our existing parser, because it's guaranteed to track the
> behavior of SpiderMonkey's parser.

Understood. But shouldn't separate parsers also implement
the standard parser API? And shouldn't it therefore cover the
information needed for such common use cases?

Browser parsers might then only support a partial profile of
the full standard API - whatever they can support without
negative impact on their main usage.

Though it might not actually cost much to support the additional
info in SpiderMonkey: most of it could be in the token stream,
which is usually thrown away, but could be kept via a flag, and
the AST's source locations can be used to extract segments of
the token stream (such as any comments preceding a location).
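
To illustrate: even without engine support, the documented source
locations (1-based lines, 0-based columns) plus the original source
text are enough to recover leading comments. A rough sketch, with
names of my own choosing:

  // absolute source offset for a Parser-API position {line, column}
  function offsetOf(lines, pos) {
    var offset = pos.column;
    for (var i = 0; i < pos.line - 1; i++) offset += lines[i].length + 1; // '\n'
    return offset;
  }

  // raw text (comments, whitespace) between the end of the previous
  // node and the start of the current one
  function textBefore(source, prevLoc, nodeLoc) {
    var lines = source.split('\n');
    var from = prevLoc ? offsetOf(lines, prevLoc.end) : 0;
    var to = offsetOf(lines, nodeLoc.start);
    return source.slice(from, to);
  }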

Breaking again,
Claus

 


