AST in JSON format

Maciej Stachowiak mjs at apple.com
Mon Dec 7 11:16:58 PST 2009


On Dec 7, 2009, at 10:11 AM, Brendan Eich wrote:

> On Dec 7, 2009, at 8:56 AM, Maciej Stachowiak wrote:
>
>> Actually, this is potentially a factor for any natively supported  
>> AST format. If execution is direct rather than via transformation  
>> to JS source, the implementation would have to verify that the AST  
>> is one that could be created by parsing JS source.
>
> This reminds me of SafeTSA:
>
> http://portal.acm.org/citation.cfm?id=378825
> http://portal.acm.org/citation.cfm?doid=1377492.1377496
>
> and more specifically of work by Christian Stork and Michael Franz,  
> see:
>
> http://www.ics.uci.edu/~cstork/
>
> The idea as I first heard it from Chris and Michael was to  
> arithmetically code ASTs such that no ill-formed tree could be  
> encoded. You could take a JPEG of the Mona Lisa, run it through the  
> decoder, and if it succeeded, get an (almost-certainly) nonsensical  
> yet syntactically well-formed AST. The encoding is fairly efficient,  
> not as good as optimized Huffman coding but close.
>
> This work was motivated by the sometimes bad (O(n^4)) complexity in  
> the Java bytecode verifier (or at least in early versions of it).
>
> My view is that there will never be a standardized bytecode  
> (politics look insuperable to me), and more: that there should not  
> be. Besides the conflicts among target VM technical details, and  
> ignoring latent IPR issues, I believe view-source capability is  
> essential. Even minification lets one pretty-print  
> (http://jsbeautifier.org/) and learn or diagnose.
>
> JS is still used in edit-shift-reload, crawl-walk-run development  
> style and part of this culture involves sharing. Of course no one  
> could mandate binary syntax to the exclusion of source, but a binary  
> syntax that did not allow pretty-printing would shove us all down  
> the slippery slope toward the opaque, closed-box world of Java  
> applets, Flash SWFs (modulo Flash+Flex's server-fetched view-source  
> capabilities), etc.
>
> Compression at the transport (session, whatever, the model is  
> climbing the traditional layering) is a separate issue.

Given the above, do you think there is a valid case to be made for a  
serialization format other than JavaScript source itself? It seems  
like anything binary is likely to have the same downsides as bytecode,  
and anything text-based enough to be truly readable and view-source  
compatible would be rather inefficient as a wire format (I would  
consider a JSON encoding with mysterious integers all over to be not  
truly view-source compatible). Thus I would propose that we should not  
define an alternate serialization at all.
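
As a rough illustration of the size and readability concern, here is a small sketch comparing one expression as source, as a readable JSON tree, and as a JSON tree with integer tags. Both node layouts are hypothetical, made up for this example, and are not taken from any proposed format; the result only describes this one expression.

```javascript
// The same expression in three forms. Both JSON layouts are hypothetical.
const source = "a+b*2";

// Readable layout: self-describing, but verbose.
const readable = {
  type: "Binary", op: "+",
  left: { type: "Identifier", name: "a" },
  right: {
    type: "Binary", op: "*",
    left: { type: "Identifier", name: "b" },
    right: { type: "Number", value: 2 }
  }
};

// Compact layout: [tag, ...fields], where (by assumption) tag 0 = Number,
// 1 = Identifier, 2 = Binary, and operators are numbered 0 = "+", 1 = "*".
// Shorter, but the integers mean nothing to a reader without the table.
const compact = [2, 0, [1, "a"], [2, 1, [1, "b"], [0, 2]]];

// Character counts of each serialized form.
const sizes = {
  source: source.length,
  readable: JSON.stringify(readable).length,
  compact: JSON.stringify(compact).length
};
```

For this expression the source is the shortest of the three, and the integer-tagged form is shorter than the readable tree only at the cost of being opaque.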

(This is a separate question from the possibility of  
programmatically manipulating a parsed AST - the use cases for that are  
clear, though there may still be verification issues depending on the  
nature of the manipulation API. It seems like the possibilities are  
either specialized objects that enforce validity on every individual  
manipulation, or something that accepts JSON-like objects and verifies  
validity after the fact, or something that accepts JSON-like objects  
and verifies validity by converting to JavaScript source code and then  
parsing it).
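
To make the second and third possibilities concrete, here is a minimal sketch over a toy expression grammar. The node shapes and the names `verifyAst`, `toSource`, and `verifyByReparsing` are hypothetical, invented for illustration; a real API would have to cover the whole language, and the printer below is deliberately naive.

```javascript
// Hypothetical node shapes, for illustration only:
//   { type: "Number", value: 1 }
//   { type: "Identifier", name: "x" }
//   { type: "Binary", op: "+", left: <node>, right: <node> }
const BINARY_OPS = new Set(["+", "-", "*", "/"]);
// Simplified identifier check: ASCII only, reserved words not excluded.
const IDENTIFIER = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Accept a JSON-like object and verify its structure after the fact.
function verifyAst(node) {
  if (node === null || typeof node !== "object") return false;
  switch (node.type) {
    case "Number":
      // A parser never yields a negative or non-finite numeric literal.
      return typeof node.value === "number" &&
             Number.isFinite(node.value) && node.value >= 0;
    case "Identifier":
      return typeof node.name === "string" && IDENTIFIER.test(node.name);
    case "Binary":
      return BINARY_OPS.has(node.op) &&
             verifyAst(node.left) && verifyAst(node.right);
    default:
      return false;
  }
}

// Naive printer: convert the tree to source text without any checking.
function toSource(node) {
  switch (node && node.type) {
    case "Number": return String(node.value);
    case "Identifier": return String(node.name);
    case "Binary":
      return "(" + toSource(node.left) + " " + node.op + " " +
             toSource(node.right) + ")";
    default: throw new TypeError("unknown node type");
  }
}

// Verify by converting to source and handing it to the engine's parser.
function verifyByReparsing(node) {
  try {
    // The Function constructor parses its body but does not run it.
    new Function("return " + toSource(node) + ";");
    return true;
  } catch (e) {
    return false;
  }
}

const good = {
  type: "Binary", op: "+",
  left: { type: "Identifier", name: "a" },
  right: { type: "Number", value: 1 }
};
const bad = { type: "Binary", op: "@", left: good.left, right: good.right };
```

Here `good` passes both checks and `bad` fails both. The two checks are not equivalent in general: the reparsing check accepts whatever the engine's grammar accepts, and with a printer this naive it cannot tell whether the reparsed program has the same shape as the tree that was passed in.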

Regards,
Maciej


