Code compilation alternative to Function()

Sean Silva silvas at purdue.edu
Sat Jan 11 21:48:10 PST 2014


On Sat, Jan 11, 2014 at 7:34 PM, <fixplzsecrets at gmail.com> wrote:

>
> I want to propose an idea and get feedback on viability of this or some
> related addition to the language.
>
> ES should have a way to build functions at runtime, like the approach of
> building a string of code and passing it to Function(code), but using API
> calls to assemble the code. Special support for this from language
> implementations would let programs that depend on dynamic evaluation of
> large blocks of code run with less latency, by bypassing the cost of
> concatenating a large string and then asking the runtime to parse it
> character by character, as Function() requires.
>

Can you provide some measurements where concatenating a string and parsing
it is a bottleneck?
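One way to gather such measurements is a small harness that times string assembly and Function() parsing separately. This is a minimal sketch, assuming Node.js or a browser that provides performance.now(). buildSource() and measure() are hypothetical helpers, and the generated code is synthetic, so its numbers say nothing about real PEG.js, Opal, or Asm.js output.

```javascript
// Minimal measurement sketch (hypothetical helpers, synthetic input).
// Uses the global performance object, falling back to Node's perf_hooks.
var perf = typeof performance !== "undefined"
  ? performance
  : require("perf_hooks").performance;

// Build n small function declarations, then return the last one.
function buildSource(n) {
  var parts = [];
  for (var i = 0; i < n; i++) {
    parts.push("function f" + i + "(x) { return x * " + i + " + 1; }");
  }
  parts.push("return f" + (n - 1) + ";");
  return parts.join("\n");
}

// Time concatenation and Function() (parse + top-level run) separately.
function measure(n) {
  var t0 = perf.now();
  var src = buildSource(n);
  var t1 = perf.now();
  var last = Function(src)();
  var t2 = perf.now();
  return {
    chars: src.length,
    concatMs: t1 - t0,
    functionMs: t2 - t1,
    check: last(2) // f(n-1)(2) === 2 * (n - 1) + 1
  };
}

var r = measure(10000);
console.log(r.chars + " chars, concat " + r.concatMs.toFixed(2) +
  " ms, Function() " + r.functionMs.toFixed(2) + " ms");
```

Engines may preparse inner function bodies lazily, so functionMs can understate the full compile cost; calling the generated functions afterwards would be needed to capture that.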


>
> Here is my naive guess at what the API should look like:
>
>     func  = CompiledFunction.create()
>     arg0  = func.arg(0)
>     val   = func.get(arg0)
>     val2  = func.op('*', val, func.literal(2))
>     func.return(val2)
>     result = func.done()
>
> This builds "function(a){ return a * 2 }"
>
> As you can see, this API provides an LLVM-like language.
>

LLVM-dev here. Other than this being an IR, I don't see any resemblance to
LLVM at all.
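To make the intended semantics of the proposed builder concrete, here is a userland sketch of the API quoted above. It is hypothetical throughout: createCompiledFunction() stands in for CompiledFunction.create(), handles are plain source strings, and done() still concatenates text and calls Function(). It therefore gains none of the hoped-for latency and is still blocked by a Content Security Policy that disallows 'unsafe-eval'.

```javascript
// Hypothetical userland shim for the proposed builder API. It does NOT
// bypass parsing: every handle is a source fragment, and done() hands the
// assembled text to Function().
var BINARY_OPS = ["+", "-", "*", "/", "%", "<", ">", "<=", ">=", "===", "!=="];

function createCompiledFunction() {
  var params = [];
  var body = [];
  return {
    // Declare parameter i; the handle is just its generated name.
    arg: function (i) {
      params[i] = "a" + i;
      return params[i];
    },
    // Read a variable; with string handles this is the identity.
    get: function (ref) { return ref; },
    // Embed a JS value as a source literal (numbers, strings, booleans).
    literal: function (value) { return JSON.stringify(value); },
    // Binary operator, parenthesized so precedence never depends on context.
    op: function (operator, left, right) {
      if (BINARY_OPS.indexOf(operator) < 0) {
        throw new Error("unsupported operator: " + operator);
      }
      return "(" + left + " " + operator + " " + right + ")";
    },
    // Append a return statement to the function body.
    return: function (expr) { body.push("return " + expr + ";"); },
    // Show the source text this builder would hand to Function().
    source: function () {
      return "function (" + params.join(", ") + ") {\n" + body.join("\n") + "\n}";
    },
    // Finish: Function(param1, ..., body) parses and compiles the text.
    done: function () {
      return Function.apply(null, params.concat([body.join("\n")]));
    }
  };
}

// The example from the proposal, run against the shim.
var func = createCompiledFunction();
var arg0 = func.arg(0);
var val = func.get(arg0);
var val2 = func.op("*", val, func.literal(2));
func.return(val2);
var result = func.done();
console.log(result(21)); // 42
```

Parenthesizing every binary expression trades slightly longer source for not having to track operator precedence inside the builder.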



> There would be more methods: "get", "set" for reading and writing
> variables and fields, "call" for function calls, "startIfElse" for writing
> if/else blocks, "startFor", "startWhile" for loops, "break", "continue",
> and others for every possible construct.
>

This is nothing like LLVM at all (not SSA).



>
> It is important that runtime implementations support these operations with
> low overhead. Ideally, the API would be a nearly direct means of writing the
> runtime's own intermediate representation.
>
>
> Motivation:
> There are a few types of programs that use generated JavaScript in
> browsers:
>
> - PEG.js
> PEG.js is a library that makes it easier to parse custom text formats. It
> works by converting a grammar written in its parser notation into
> JavaScript code that applies the specified parsing rules.
> The ability to use custom data formats is a piece of Unix philosophy that
> should arguably be supported on the Web. As it stands, PEG.js has to
> depend on the Function() quirk, which fails when a Content Security Policy
> disallows 'unsafe-eval'.
>
> - Opal http://opalrb.org/try/ &c
> Opal aims to run Ruby in JavaScript environments. It is one of several
> projects with similar aims for other languages, each working in a
> different way.
> If runtimes offered native support for this API, web applications
> depending on tools like this could deliver their code and initialize with
> lower latency.
>
> - Asm.js
> Asm.js is a strict subset of JavaScript suitable for representing native
> executable code. However, one complaint about it is that it is not an
> efficient lexical representation of the content it encodes: Asm.js
> programs are larger than equivalent native executables, and parsing time
> is proportional to this size because the content has to be scanned
> character by character. Code size in present experiments can exceed 10
> megabytes, and parsing can be inefficient unless the runtime
> implementation includes a specialized parser for this format. Because of
> this, I have seen criticism suggesting that proponents of Asm.js should
> design a bytecode format for this use case instead of sending blobs of
> JavaScript.
>
> But defining an adequate, future-proof bytecode format ahead of time is
> difficult, and is largely separate from what Asm.js is concerned with.
>
>
> What if independent parties could design their own program representation
> format that can be converted to executable form as needed?
>
> A party delivering a web application can choose a format for their Asm.js
> code, and include a decoder for it in their application startup procedure.
> If the runtime receiving this application supports the code creation API,
> the decoder can bypass the usual process of decompressing megabytes of
> JavaScript code and then parsing it.
>

I'm not convinced that decompressing megabytes of JavaScript and parsing it
(all in native code) is going to be any slower than having to JIT a
JavaScript code generation routine, build up some horribly cache-unfriendly
"sea of GC-heap-allocated linked nodes in JS" AST structure, then convert
that "sea of linked nodes in JS" to the JIT's internal program
representation. Decompress+parse is already a highly optimized code path,
as it is on the critical path of virtually every page load.



>
> To illustrate what I mean:
> Eval example:
>
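>     // load() is an asynchronous file loader (not defined in this message),
>     // and blob.js is assumed to assign the global MyBlob.
>     load("./blob.js", function(input) {
>       var ts = {}
>       ts.start = Date.now()
>
>       Function(input)()   // parse, compile, and run blob.js
>       MyBlob.foo
>
>       ts.end = Date.now()
>       console.log("Eval time", ts.end - ts.start)
>     })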
>
> Codegen example:
>
>     load("./blob.js", function(input) {
>       var ast = esprima.parse(input)   // parsed before the timer starts
>
>       var ts = {}
>       ts.start = Date.now()
>
>       // turn the AST back into source text, which Function() parses again
>       Function(escodegen.generate(ast))()
>       MyBlob.foo
>
>       ts.end = Date.now()
>       console.log("Codegen time", ts.end - ts.start)
>     })
>
> In the second example, the line "Function(escodegen.generate(ast))()" is
> less efficient and takes longer to run, even though we started with the AST
> already in memory, a form better suited to conversion into an executable
> function.
>
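The codegen path in the second example can be sketched without libraries. generate() below is a hypothetical stand-in for escodegen.generate() that handles only the few ESTree node types needed for function(a){ return a * 2 }. It illustrates that even with the AST in memory, this path produces a string that Function() must then parse again.

```javascript
// Hypothetical mini code generator for a tiny ESTree subset (not escodegen).
function generate(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    case "BinaryExpression":
      return "(" + generate(node.left) + " " + node.operator + " " +
        generate(node.right) + ")";
    case "ReturnStatement":
      return "return " + generate(node.argument) + ";";
    case "BlockStatement":
      return "{ " + node.body.map(generate).join(" ") + " }";
    case "FunctionExpression":
      return "(function (" + node.params.map(generate).join(", ") + ") " +
        generate(node.body) + ")";
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}

// ESTree-shaped AST for: function(a){ return a * 2 }
var ast = {
  type: "FunctionExpression",
  id: null,
  params: [{ type: "Identifier", name: "a" }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: { type: "Identifier", name: "a" },
        right: { type: "Literal", value: 2 }
      }
    }]
  }
};

// AST -> source text -> parsed again by Function().
var code = generate(ast); // "(function (a) { return (a * 2); })"
var timesTwo = Function("return " + code)();
console.log(timesTwo(21)); // 42
```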

Unless the program representation used internally by the JS engines is
exactly the same as the standardized representation (it isn't, and I doubt
JS engines would let that implementation detail be standardized), they
still need to do a format conversion, but this time by walking a linked sea
of GC-heap-allocated JS objects instead of doing a linear scan of the
program text.

Can you provide some measurements about the relative sizes in memory of the
AST structure that you are using vs. the raw code string? I'm almost
positive that keeping the AST structure in JS is way too cache-unfriendly
to be a performance win (not to mention GC).
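A rough way to see the "sea of linked nodes" point is to count the separately allocated objects in an ESTree-shaped AST and compare that with the length of the source string. This is only a proxy: it counts objects, not bytes, since real per-object sizes are engine-specific and not observable from script. countObjects() and countNodes() are hypothetical helpers.

```javascript
var source = "function(a){ return a * 2 }";

// Hand-written ESTree-shaped AST for the same function.
var ast = {
  type: "FunctionExpression",
  id: null,
  params: [{ type: "Identifier", name: "a" }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: { type: "Identifier", name: "a" },
        right: { type: "Literal", value: 2 }
      }
    }]
  }
};

// Every non-null object or array reachable from value, including value.
function countObjects(value) {
  if (value === null || typeof value !== "object") return 0;
  var n = 1;
  var keys = Object.keys(value);
  for (var i = 0; i < keys.length; i++) n += countObjects(value[keys[i]]);
  return n;
}

// Only the ESTree nodes (objects carrying a string "type").
function countNodes(value) {
  if (value === null || typeof value !== "object") return 0;
  var n = typeof value.type === "string" ? 1 : 0;
  var keys = Object.keys(value);
  for (var i = 0; i < keys.length; i++) n += countNodes(value[keys[i]]);
  return n;
}

console.log(source.length + " chars in one string"); // 27
console.log(countNodes(ast) + " AST nodes");         // 7
console.log(countObjects(ast) + " heap objects");    // 9 (nodes plus arrays)
```

Each of those nine objects is typically a separate GC allocation reached through pointers, whereas the source is a single contiguous string; that layout difference is what the cache-friendliness argument rests on.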

-- Sean Silva


>
> What if we could make the second example execute faster than the first one?
>
> I think this seemingly cosmetic capability would give web programs a
> degree of freedom that could end up solving some design problems.
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>