JavaScript parser API

Zachary Carter zack.carter at gmail.com
Tue Jul 5 21:21:19 PDT 2011


On Wed, Jul 6, 2011 at 12:00 AM, David Herman <dherman at mozilla.com> wrote:
>> the AST API strawman - given the positive discussions on this list, I
>> thought the idea was implicitly accepted last year, modulo details,
>> so I was surprised not to see a refined strawman promoted.
>
> It hasn't really been championed so far. I was concentrating on other proposals for ES.next.
>
>>   - it does not support generic traversals, so it definitely needs a
>>       pre-implemented traversal, sorting out each type of Node
>>       (Array-based ASTs, like the es-lab version, make this slightly
>>       easier - Arrays elements are ordered, unlike Object properties);
>
> I designed it to be easily JSON-{de}serializable, so no special prototype. However, you can use the builder API to construct your own format:
>
>    https://developer.mozilla.org/en/SpiderMonkey/Parser_API#Builder_objects
>
> With a custom builder you can create objects with whatever methods you want, and builders for various formats can be shared in libraries.
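
For what it's worth, the builder hook makes that kind of re-shaping pretty
easy. A rough, untested sketch of a builder that emits array-shaped nodes
(callback names as I remember them from the MDN builder docs, so worth
double-checking; loc disabled to keep the example small):

    var arrayBuilder = {
      program: function (body) { return ["Program", body]; },
      expressionStatement: function (expr) { return ["ExpressionStatement", expr]; },
      binaryExpression: function (op, left, right) { return ["BinaryExpression", op, left, right]; },
      identifier: function (name) { return ["Identifier", name]; },
      literal: function (val) { return ["Literal", val]; }
    };

    Reflect.parse("a + 1", { loc: false, builder: arrayBuilder });
    // => ["Program", [["ExpressionStatement",
    //      ["BinaryExpression", "+", ["Identifier", "a"], ["Literal", 1]]]]]
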
>
>>       at that stage, simple applications (such as tag generation)
>>       may be better off working with hooks into the parser, rather
>>       than hooks into an AST traversal? also, there is the risk that
>>       one pre-implemented traversal might not cover all use cases,
>>       in which case the boilerplate tax would have to be paid again;
>
> I don't understand any of this.
>
>>   - it is slightly easier to manipulate than an Array-based AST, but
>
> More than slightly, IMO.
>
>>       lack of pattern matching fall-through (alternative patterns for
>>       destructuring) still hurts, and the selectors are lengthy, which
>>       hampers visualization and construction; (this assumes that
>>       fp-style AST processing is preferred over oo-style processing)
>
> If I'd defined a new object type with its own prototype, it still wouldn't define all operations anyone would ever want. So they'd either have to monkey-patch it or it would need a visitor. Which you could write anyway. So I don't see much benefit to pre-defining a node prototype.
>
> But again, see the builder API, where you can create your own custom node type.
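
And even without a custom node type, since the default format is plain
JSON-able objects where every node carries a string "type", a generic
walker stays short anyway. A rough, untested sketch:

    function walk(node, visit) {
      if (!node || typeof node !== "object") return;
      if (Array.isArray(node)) {
        node.forEach(function (child) { walk(child, visit); });
        return;
      }
      if (typeof node.type === "string") visit(node);
      for (var key in node) {
        if (key !== "loc") walk(node[key], visit);  // skip location records
      }
    }

    // e.g. collect every identifier name in some source string `src`:
    var names = [];
    walk(Reflect.parse(src), function (node) {
      if (node.type === "Identifier") names.push(node.name);
    });
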
>
>>   - it is biased towards evaluation, which is a hindrance for other
>>       uses (such as faithful unparsing, for program transformations);
>
> It's just a reflection of the built-in SpiderMonkey parser, which was designed for the sole purpose of evaluation. I didn't reimplement a new parser.
>
>>       this can be seen clearly in Literals, which are evaluated (why
>>       not evaluate Object, Array, Function Literals as well? eval should
>>       be part of AST processing, not of AST construction), but it also
>>       shows in other constructs (comments are not stored at all, and
>>       if commas/semicolons are not stored, how does one know
>>       where they were located - programmers tend to be picky
>>       about their personal or project-wide style guides?);
>
> None of this data is available in a SpiderMonkey parse node.
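
To make the Literal point concrete, the current reflection gives you
roughly:

    Reflect.parse("42 /* answer */;").body[0].expression
    // => { type: "Literal", value: 42, loc: { ... } }
    // The value is already the number 42 rather than the source text,
    // and the comment is not represented anywhere in the tree.
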
>
>>   - there are some minor oddities, from spelling differences to
>>       the spec (Label(l)ed),
>
> Heh, I shouldn't've capitulated to my (excellent and meticulous!) reviewer, who was unfamiliar with the spec:
>
>    https://bugzilla.mozilla.org/show_bug.cgi?id=533874#c28
>
> I can probably change that.
>
>> to structuring decisions (why separate
>>       UpdateExpression and LogicalExpression, when everything
>>       else is in UnaryExpression and BinaryExpression?);
>
> I separated update expressions and logical expressions because they have different control structure from the other unary and binary operators.
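
In other words, roughly:

    Reflect.parse("a + b").body[0].expression.type   // "BinaryExpression"
    Reflect.parse("a && b").body[0].expression.type  // "LogicalExpression"
    // Same operator/left/right shape, but && and || short-circuit, so a
    // consumer that cares about control flow wants to tell them apart by type.
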
>
>>       btw, why alternate/consequent instead of then/else, and
>
> I was avoiding using keywords as property names, and consequent/alternate are standard terminology. I suppose .then/.else would be more convenient.
>
>>       shouldn't that really be consequent->then and alternate->else
>>       instead of the other way round (as the optional null for
>>       consequent suggests)?
>
> Doc bug, thanks. Fixed.
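
For reference, the reflected shape is roughly:

    Reflect.parse("if (x) f();").body[0]
    // => { type: "IfStatement",
    //      test:       { type: "Identifier", name: "x", ... },
    //      consequent: { type: "ExpressionStatement", ... },
    //      alternate:  null }
    // i.e. it is alternate (the else branch) that may be null, not consequent.
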
>
>> My main issue is unparsing support for program transformations
>
>    https://bugzilla.mozilla.org/show_bug.cgi?id=590755
>
>> (though IDEs will similarly need more info, for comment extraction,
>> syntax highlighting, and syntax-based operations).
>
> This is all the stuff that will almost certainly require separate implementations from the engine's core parser. And maybe that's fine. In my case, I wanted to implement a reflection of our existing parser, because it's guaranteed to track the behavior of SpiderMonkey's parser.
>
>> What I did for now was to add a field to each Node, in which I
>> store an unprocessed Array of the sub-ASTs, including tokens.
>> Essentially, the extended AST Nodes provide both abstract info
>> for analysis and evaluation and a structured view of the token
>> stream belonging to each Node, for lower-level needs.
>>
>> Whitespace/comments are stored separately, indexed by the
>> start position of the following token (this is going to work better
>> for comment-before-token than for comment-after-token, but it
>> is a start, for unparsing or comment-extraction tools).
>
> You've lost me again. Are you describing a parser you wrote?
>
>> This allows for a generic traversal of the Array-based unprocessed
>> AST fragments, for unparsing, but I still have to rearrange things
>> so that I can actually store the information I need (can't add info
>> to null as an AST value) and distinguish meta-info ("computed"
>> and "prefix" properties) from sub-ASTs.
>
> I'm still lost.
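
If I'm reading the original description right, the comment bookkeeping
amounts to a side table keyed by the start offset of the token that follows
each comment - a rough, hypothetical sketch (all names made up):

    var comments = {};  // start offset of following token -> comment texts

    function recordComment(text, nextTokenStart) {
      (comments[nextTokenStart] = comments[nextTokenStart] || []).push(text);
    }

    // While unparsing, before emitting the token that starts at `pos`:
    function emitPendingComments(pos, out) {
      (comments[pos] || []).forEach(function (c) { out.push(c); });
    }
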
>
>> Overall, the impression is that this AST was designed by someone
>> resigned to the fact of having to write Node-type-specific traversal
>> code for each purpose, with a limited number of purposes planned
>> (such as evaluation). This could be a burden for other uses of such
>> ASTs (boilerplate tax).
>
> It was designed to be minimal and serializable. It was a lot of code, so I figured I would just focus on a) making sure all the data was there and b) making it possible to provide a custom data format via the builder API. This is what I came up with, but I can revisit the API design if it's useful.
>
>> I hope these notes help - I'd really like to see a standard JS
>> parser API implemented across engines. For language
>> experimentation, we'd still need separate tweakable parsers,
>> but access to the efficient engine parsers for current JS would
>> give tool development a boost.
>
> I'm still not convinced this is such a big win. Reflect.parse gives you *some* performance benefit, but it still requires two traversals (one to generate the internal C++ JSParseNode tree and then a second to convert that to a JS object tree). But part of the benefit is knowing you have exactly the SpiderMonkey parser. Once implementors have to write a separate parser, the possibility of divergence increases, and the maintenance cost of building a second parser in a low-level language is high. At that point, they might just want to write it in JS. But anybody could do that.
>
>>> But there are also tough questions about what the parser
>>> should do with engine-specific language extensions.
>>
>> Actually, that starts before the AST: I'd like to see feature-based
>> language versioning, instead of the current monolithic version
>> numbering - take generators as an example feature:
>>
>> Perhaps JS1.7 ("javascript;version=1.7") happens to be the first
>> JS version to support "yield", and is backwards compatible with
>> JS1.5, which might happen to match ES3; and JS1.8.5, which
>> happens to match ES5, might be backwards compatible with
>> JS1.7. But it is unlikely that the JSx which happens to match ES6
>> will be backwards compatible with JS1.7 (while ES5-breaking
>> changes will be limited, replacing experimental JS1.x features
>> with standardized variants is another matter).
>>
>> Whereas, if I was able to specify "use yield", and be similarly
>> selective about other language features, then either of JS1.7,
>> JS1.8.5 and ES6 engines might be able to do the job, depending
>> on what other language features my code depends on. Also,
>> other engines might want to implement some features -like
>> "yield"- selectively, without aiming to support all of JS1.7, and
>> long before being able to support all of ES6.
>
> That's asking for quite a modularized/configurable parser.
>
>>> I agree about the issue of multiple parsers. The reason I
>>> was able to do the SpiderMonkey library fairly easily was
>>> that I simply reflect exactly the parser that exists. But to
>>> have a standards-compliant parser, we'd probably have
>>> to write a separate parser. That's definitely a tall order.
>>
>> It should not be, provided one distinguishes between
>> standards-compliant and production use. If the ES grammar
>> is LR(1), it should really be specified in a parser tool format,
>
> Mainstream production JS engines have moved away from parser generators.
>
>> both for verification and to generate standards-compliant
>> tools to compare against. Depending on how efficient the
>> JS Bison implementation is, this might even lead to useable
>> parser performance.
>
> Again, this could be implemented by anyone as a pure JS library.

FWIW, I've implemented such a library here: https://github.com/zaach/reflect.js
The grammar is based on the old JavaScriptCore Bison grammar with some
tweaks to make it LALR(1) (Jison doesn't do efficient LR(1) yet.)
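
To give a flavor of the approach, here is a toy Jison grammar whose actions
build Parser API-shaped nodes directly (illustrative only, not the actual
reflect.js grammar):

    var Parser = require("jison").Parser;

    var parser = new Parser({
      lex: {
        rules: [
          ["\\s+",                      "/* skip whitespace */"],
          ["[0-9]+",                    "return 'NUMBER';"],
          ["[A-Za-z_$][A-Za-z0-9_$]*",  "return 'IDENT';"],
          ["\\+",                       "return '+';"],
          ["$",                         "return 'EOF';"]
        ]
      },
      bnf: {
        Program: [["Expr EOF",
                   "return { type: 'Program', body: [{ type: 'ExpressionStatement', expression: $1 }] };"]],
        Expr:    [["Expr '+' Primary",
                   "$$ = { type: 'BinaryExpression', operator: '+', left: $1, right: $3 };"],
                  ["Primary", "$$ = $1;"]],
        Primary: [["NUMBER", "$$ = { type: 'Literal', value: Number(yytext) };"],
                  ["IDENT",  "$$ = { type: 'Identifier', name: yytext };"]]
      }
    });

    parser.parse("a + 1");
    // => { type: "Program", body: [{ type: "ExpressionStatement",
    //      expression: { type: "BinaryExpression", operator: "+", ... } }] }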

>
>> There may be problems in finding a tool that generates all
>> the information needed for a useful AST (source locations,
>> comments, scope info, ..), but we do not need to solve every
>> issue immediately to make progress, right? And if the ES
>> committee were to ask ES parser generator implementors
>> whether their tools could be extended to serve an AST spec,
>> response might be favourable.
>>
>> It would be nice if the spec parser was generated in Javascript,
>> but any tool-usable standard grammar would be useful - once
>> the grammar can be processed by a freely available tool, it can
>> be translated to similar formats, some of which have Javascript
>> implementations (eg Jison, ANTLR).
>>
>> Having played a little with the ANTLRWorks environment, it
>> looks promising, is easy to install (just a .jar), has user-contributed
>> ES grammars, and can spot some ambiguities easily (though
>> I don't think its check is complete, and the ES grammar is too
>> complex to make naïve parse-tree visualization helpful). If other
>> tools have better ES grammar development support, I'd like to
>> hear about them.
>>
>> Without a standard spec-conformant tool-readable grammar,
>> such tools remain of limited use. With a tool-readable grammar,
>> adding AST generation might turn out to be an afternoon's work
>> (followed by years of testing/debugging;-).
>
> A standard, machine-processable grammar would be a nice-to-have. Agreed.
>
> I hate to complain, but can you try to trim your messages? It takes an enormous amount of time to read and respond to these huge messages.
>
>    https://twitter.com/#!/statpumpkin/status/66187260407709696
>
> Dave
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>



-- 
Zach Carter

