JavaScript parser API

David Herman dherman at mozilla.com
Tue Jul 5 21:00:25 PDT 2011


> the AST API strawman - given the positive discussions on this list, I
> thought the idea was implicitly accepted last year, modulo details,
> so I was surprised not to see a refined strawman promoted.

It hasn't really been championed so far. I was concentrating on other proposals for ES.next.

>   - it does not support generic traversals, so it definitely needs a
>       pre-implemented traversal, sorting out each type of Node
>       (Array-based ASTs, like the es-lab version, make this slightly
>       easier - Array elements are ordered, unlike Object properties);

I designed it to be easily JSON-{de}serializable, so no special prototype. However, you can use the builder API to construct your own format:

    https://developer.mozilla.org/en/SpiderMonkey/Parser_API#Builder_objects

With a custom builder you can create objects with whatever methods you want, and builders for various formats can be shared in libraries.
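Below is a minimal sketch of the builder idea, assuming ASTs in the Parser API's JSON shape. In SpiderMonkey you would hand the builder to the parser as `Reflect.parse(src, { builder })`, and the parser calls the callbacks itself. That function only exists in the SpiderMonkey shell, so the hypothetical `replay` helper here walks an already-built AST and calls the builder bottom-up in the same way. The callback names follow the Parser API's builder naming (`program`, `expressionStatement`, `binaryExpression`, `identifier`, `literal`). Only that small subset is covered, and the S-expression output format is purely illustrative.

```javascript
// Hypothetical builder that emits Array-based S-expressions instead of
// plain objects. The trailing `loc` argument is accepted and ignored.
const sexpBuilder = {
  program(body, loc) { return ["program", ...body]; },
  expressionStatement(expr, loc) { return ["expr", expr]; },
  binaryExpression(op, left, right, loc) { return [op, left, right]; },
  identifier(name, loc) { return ["id", name]; },
  literal(value, loc) { return ["lit", value]; },
};

// Hypothetical stand-in for what Reflect.parse does with a builder:
// build the children first, then pass them to the matching callback.
function replay(node, builder) {
  switch (node.type) {
    case "Program":
      return builder.program(node.body.map(n => replay(n, builder)), node.loc);
    case "ExpressionStatement":
      return builder.expressionStatement(replay(node.expression, builder), node.loc);
    case "BinaryExpression":
      return builder.binaryExpression(
        node.operator, replay(node.left, builder), replay(node.right, builder), node.loc);
    case "Identifier":
      return builder.identifier(node.name, node.loc);
    case "Literal":
      return builder.literal(node.value, node.loc);
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}

// AST for `x + 1;` in the Parser API's JSON shape (loc omitted).
const ast = {
  type: "Program",
  body: [{
    type: "ExpressionStatement",
    expression: {
      type: "BinaryExpression",
      operator: "+",
      left: { type: "Identifier", name: "x" },
      right: { type: "Literal", value: 1 },
    },
  }],
};

console.log(JSON.stringify(replay(ast, sexpBuilder)));
// prints ["program",["expr",["+",["id","x"],["lit",1]]]]
```

The same plain-object AST can therefore be turned into an Array-based format, or into objects with whatever prototype a library wants, without the core format committing to either.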

>       at that stage, simple applications (such as tag generation)
>       may be better off working with hooks into the parser, rather
>       than hooks into an AST traversal? also, there is the risk that
>       one pre-implemented traversal might not cover all use cases,
>       in which case the boilerplate tax would have to be paid again;

I don't understand any of this.

>   - it is slightly easier to manipulate than an Array-based AST, but

More than slightly, IMO.

>       lack of pattern matching fall-through (alternative patterns for
>       destructuring) still hurts, and the selectors are lengthy, which
>       hampers visualization and construction; (this assumes that
>       fp-style AST processing is preferred over oo-style processing)

If I'd defined a new object type with its own prototype, it still wouldn't define all operations anyone would ever want. So they'd either have to monkey-patch it or it would need a visitor. Which you could write anyway. So I don't see much benefit to pre-defining a node prototype.
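For example, a generic visitor over the plain-object format takes only a few lines. This is a sketch, assuming that any property holding an object with a string `type` (or an array of such objects) is a child node. It relies on `Object.keys` order, which is insertion order for ordinary string keys; if strict source order matters, sorting children by their `loc` would be the safer choice.

```javascript
// Treat any object with a string `type` as an AST node.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Generic pre-order traversal: no per-node-type code is needed.
// Arrays may contain nulls (e.g. holes), which isNode() skips.
function walk(node, visit, parent = null) {
  visit(node, parent);
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) {
      for (const item of child) {
        if (isNode(item)) walk(item, visit, node);
      }
    } else if (isNode(child)) {
      walk(child, visit, node);
    }
  }
}

// AST for `f(a, b + 1);` in the Parser API's JSON shape (loc omitted).
const callAst = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: { type: "Identifier", name: "f" },
    arguments: [
      { type: "Identifier", name: "a" },
      {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Identifier", name: "b" },
        right: { type: "Literal", value: 1 },
      },
    ],
  },
};

// Collect identifier names; afterwards names is ["f", "a", "b"].
const names = [];
walk(callAst, node => { if (node.type === "Identifier") names.push(node.name); });
```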

But again, see the builder API, where you can create your own custom node type.

>   - it is biased towards evaluation, which is a hindrance for other
>       uses (such as faithful unparsing, for program transformations);

It's just a reflection of the built-in SpiderMonkey parser, which was designed for the sole purpose of evaluation. I didn't reimplement a new parser.

>       this can be seen clearly in Literals, which are evaluated (why
>       not evaluate Object, Array, Function Literals as well? eval should
>       be part of AST processing, not of AST construction), but it also
>       shows in other constructs (comments are not stored at all, and
>       if commas/semicolons are not stored, how does one know
>       where they were located - programmers tend to be picky
>       about their personal or project-wide style guides?);

None of this data is available in a SpiderMonkey parse node.

>   - there are some minor oddities, from spelling differences to
>       the spec (Label(l)ed),

Heh, I shouldn't've capitulated to my (excellent and meticulous!) reviewer, who was unfamiliar with the spec:

    https://bugzilla.mozilla.org/show_bug.cgi?id=533874#c28

I can probably change that.

> to structuring decisions (why separate
>       UpdateExpression and LogicalExpression, when everything
>       else is in UnaryExpression and BinaryExpression?);

I separated update expressions and logical expressions because they have different control structure from the other unary and binary operators.
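The difference shows up as soon as you evaluate them. Here is a hypothetical mini-evaluator over Parser API-shaped nodes, covering just enough node types to make the point: a `BinaryExpression` always evaluates both operands, a `LogicalExpression` evaluates its right operand only when the left one doesn't decide the result, and an `UpdateExpression` both reads and writes its operand. The `env` object standing in for variable storage is an assumption of the sketch.

```javascript
// Hypothetical mini-evaluator; `env` maps variable names to values.
function evaluate(node, env) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "Identifier":
      return env[node.name];
    case "BinaryExpression": {
      // Both operands are always evaluated, then combined.
      const left = evaluate(node.left, env);
      const right = evaluate(node.right, env);
      switch (node.operator) {
        case "+": return left + right;
        case "*": return left * right;
        default: throw new Error("unsupported operator " + node.operator);
      }
    }
    case "LogicalExpression": {
      // Short-circuit: the right operand may never be evaluated.
      const left = evaluate(node.left, env);
      if (node.operator === "&&") return left ? evaluate(node.right, env) : left;
      if (node.operator === "||") return left ? left : evaluate(node.right, env);
      throw new Error("unsupported operator " + node.operator);
    }
    case "UpdateExpression": {
      // Reads and writes the operand; `prefix` selects the returned value.
      // Only identifier operands are handled in this sketch.
      const name = node.argument.name;
      const old = env[name];
      env[name] = node.operator === "++" ? old + 1 : old - 1;
      return node.prefix ? env[name] : old;
    }
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}
```

For instance, evaluating `false && x++` with `env = { x: 0 }` returns `false` and leaves `env.x` at 0, because the update expression on the right is never reached.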

>       btw, why alternate/consequent instead of then/else, and

I was avoiding using keywords as property names, and consequent/alternate are standard terminology. I suppose .then/.else would be more convenient.

>       shouldn't that really be consequent->then and alternate->else
>       instead of the other way round (as the optional null for
>       consequent suggests)?

Doc bug, thanks. Fixed.
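With the doc fix, `consequent` is the then-branch and `alternate` is the optional else-branch, which is `null` when there is no `else`. A small sketch of an unparser consuming that shape follows; it is hypothetical, handles only the few node types shown, and ignores comments and original formatting.

```javascript
// Hypothetical unparser for a handful of Parser API node types.
function unparse(node) {
  switch (node.type) {
    case "IfStatement": {
      // consequent = then-branch; alternate = else-branch or null.
      const head = "if (" + unparse(node.test) + ") " + unparse(node.consequent);
      return node.alternate ? head + " else " + unparse(node.alternate) : head;
    }
    case "ExpressionStatement":
      return unparse(node.expression) + ";";
    case "CallExpression":
      return unparse(node.callee) + "(" + node.arguments.map(unparse).join(", ") + ")";
    case "Identifier":
      return node.name;
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}
```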

> My main issue is unparsing support for program transformations

    https://bugzilla.mozilla.org/show_bug.cgi?id=590755

> (though IDEs will similarly need more info, for comment extraction,
> syntax highlighting, and syntax-based operations).

This is all the stuff that will almost certainly require separate implementations from the engine's core parser. And maybe that's fine. In my case, I wanted to implement a reflection of our existing parser, because it's guaranteed to track the behavior of SpiderMonkey's parser.

> What I did for now was to add a field to each Node, in which I
> store an unprocessed Array of the sub-ASTs, including tokens.
> Essentially, the extended AST Nodes provide both abstract info
> for analysis and evaluation and a structured view of the token
> stream belonging to each Node, for lower-level needs.
> 
> Whitespace/comments are stored separately, indexed by the
> start position of the following token (this is going to work better
> for comment-before-token than for comment-after-token, but it
> is a start, for unparsing or comment-extraction tools).

You've lost me again. Are you describing a parser you wrote?

> This allows for a generic traversal of the Array-based unprocessed
> AST fragments, for unparsing, but I still have to rearrange things
> so that I can actually store the information I need (can't add info
> to null as an AST value) and distinguish meta-info ("computed"
> and "prefix" properties) from sub-ASTs.

I'm still lost.
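As far as the quoted scheme goes, storing whitespace and comments keyed by the start offset of the following token could look roughly like this. It is a sketch, assuming some lexer already produced tokens as `{ start, end }` source offsets; the helper names are hypothetical. As the quoted text notes, a comment that belongs after a token ends up attached to the next one.

```javascript
// Store the text between tokens (whitespace, comments) in a Map keyed by
// the start offset of the following token. Text after the last token is
// keyed by source.length.
function collectTrivia(source, tokens) {
  const trivia = new Map();
  let pos = 0;
  for (const tok of tokens) {
    if (tok.start > pos) trivia.set(tok.start, source.slice(pos, tok.start));
    pos = tok.end;
  }
  if (pos < source.length) trivia.set(source.length, source.slice(pos));
  return trivia;
}

// Unparsing interleaves the stored trivia with the token text again.
function reassemble(source, tokens, trivia) {
  let out = "";
  for (const tok of tokens) {
    out += (trivia.get(tok.start) || "") + source.slice(tok.start, tok.end);
  }
  return out + (trivia.get(source.length) || "");
}

// `a /* c */ + b`: the comment is keyed by the offset of `+` (10).
const src = "a /* c */ + b";
const toks = [{ start: 0, end: 1 }, { start: 10, end: 11 }, { start: 12, end: 13 }];
const triv = collectTrivia(src, toks);
```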

> Overall, the impression is that this AST was designed by someone
> resigned to the fact of having to write Node-type-specific traversal
> code for each purpose, with a limited number of purposes planned
> (such as evaluation). This could be a burden for other uses of such
> ASTs (boilerplate tax).

It was designed to be minimal and serializable. It was a lot of code, so I figured I would just focus on a) making sure all the data was there and b) making it possible to provide a custom data format via the builder API. This is what I came up with, but I can revisit the API design if it's useful.

> I hope these notes help - I'd really like to see a standard JS
> parser API implemented across engines. For language
> experimentation, we'd still need separate tweakable parsers,
> but access to the efficient engine parsers for current JS would
> give tool development a boost.

I'm still not convinced this is such a big win. Reflect.parse gives you *some* performance, but it still requires two traversals (one to generate the internal C++ JSParseNode tree and then a second to convert that to a JS object tree). But part of the benefit is knowing you have exactly the SpiderMonkey parser. Once implementors have to write a separate parser, the possibility of divergence increases, and the maintenance cost of building a second parser in a low-level language is high. At that point, they might just want to write it in JS. But anybody could do that.

>> But there are also tough questions about what the parser
>> should do with engine-specific language extensions.
> 
> Actually, that starts before the AST: I'd like to see feature-based
> language versioning, instead of the current monolithic version
> numbering - take generators as an example feature:
> 
> Perhaps JS1.7 ("javascript;version=1.7") happens to be the first
> JS version to support "yield", and is backwards compatible with
> JS1.5, which might happen to match ES3; and JS1.8.5, which
> happens to match ES5, might be backwards compatible with
> JS1.7. But it is unlikely that the JSx which happens to match ES6
> will be backwards compatible with JS1.7 (while ES5-breaking
> changes will be limited, replacing experimental JS1.x features
> with standardized variants is another matter).
> 
> Whereas, if I were able to specify "use yield", and be similarly
> selective about other language features, then any of the JS1.7,
> JS1.8.5 or ES6 engines might be able to do the job, depending
> on what other language features my code depends on. Also,
> other engines might want to implement some features, like
> "yield", selectively, without aiming to support all of JS1.7, and
> long before being able to support all of ES6.

That's asking for quite a modularized/configurable parser.

>> I agree about the issue of multiple parsers. The reason I
>> was able to do the SpiderMonkey library fairly easily was
>> that I simply reflect exactly the parser that exists. But to
>> have a standards-compliant parser, we'd probably have
>> to write a separate parser. That's definitely a tall order.
> 
> It should not be, provided one distinguishes between
> standards-compliant and production use. If the ES grammar
> is LR(1), it should really be specified in a parser tool format,

Mainstream production JS engines have moved away from parser generators.

> both for verification and to generate standards-compliant
> tools to compare against. Depending on how efficient the
> JS Bison implementation is, this might even lead to usable
> parser performance.

Again, this could be implemented by anyone as a pure JS library.

> There may be problems in finding a tool that generates all
> the information needed for a useful AST (source locations,
> comments, scope info, ...), but we do not need to solve every
> issue immediately to make progress, right? And if the ES
> committee were to ask ES parser generator implementors
> whether their tools could be extended to serve an AST spec,
> response might be favourable.
> 
> It would be nice if the spec parser was generated in Javascript,
> but any tool-usable standard grammar would be useful - once
> the grammar can be processed by a freely available tool, it can
> be translated to similar formats, some of which have Javascript
> implementations (e.g. Jison, ANTLR).
> 
> Having played a little with the ANTLRWorks environment, it
> looks promising, is easy to install (just a .jar), has user-contributed
> ES grammars, and can spot some ambiguities easily (though
> I don't think its check is complete, and the ES grammar is too
> complex to make naïve parse-tree visualization helpful). If other
> tools have better ES grammar development support, I'd like to
> hear about them.
> 
> Without a standard spec-conformant tool-readable grammar,
> such tools remain of limited use. With a tool-readable grammar,
> adding AST generation might turn out to be an afternoon's work
> (followed by years of testing/debugging;-).

A standard, machine-processable grammar would be a nice-to-have. Agreed.

I hate to complain, but can you try to trim your messages? It takes an enormous amount of time to read and respond to these huge messages.

    https://twitter.com/#!/statpumpkin/status/66187260407709696

Dave



More information about the es-discuss mailing list