Spawn proposal strawman

Kris Kowal kris.kowal at cixar.com
Fri May 8 20:49:28 PDT 2009


On Wed, May 6, 2009 at 12:59 PM, Mark S. Miller <erights at google.com> wrote:
> On Wed, May 6, 2009 at 10:53 AM, Mark S. Miller <erights at google.com> wrote:
>> [...] For the Valija-like level, I
>> think the most important enabler would be some kind of hermetic eval
>> or spawn primitive for making a new global context (global object and
>> set of primordials) whose connection to the world outside itself is
>> under control of its spawner. With such a primitive, we would no
>> longer need to emulate inheritance and mutable globals per sandbox.

Before I dive into inline comments, I'll lay out some background for
the discussion.

There are now seven prototypes for the securable module proposal in
the server-side JavaScript group.  The spec is weak on implementation
details, so we're seeing a variety of ways to implement what is
effectively the "salty" or "transitional" require/exports system from
Ihab's and my proposal in January.  While hermetic eval is all we
"need", some of the implementations benefit from a more specialized
"module evaluator".

As was pointed out in Mountain View, modules must be "program"
constructions.  We also want module factory functions to be reusable
factory methods that accept "require" and "exports" (and "system" for
dependency injection, but that is an orthogonal matter and still a
subject of debate).

When we use hermetic eval, we coax it to return a module factory
function by injecting the text of the module into a function
expression.

"(function (require, exports) {" + text + "/**/\n})"

This doesn't enforce the "program" construction, and some of the
JavaScript language semantics suffer for it.  For example, function
statements aren't defined and do not necessarily get bound to the
module scope in some implementations.  It also is "vulnerable" to
"injection" style attacks, like:

"}); /* run at load-time */ (function (require, exports) {"

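To make the wrapper and the injection concrete, here is a minimal sketch.
The helper name makeFactory and the loadLog global are hypothetical, and
indirect eval is only a stand-in for a hermetic eval (it is not hermetic).

```javascript
// Hypothetical helper, not part of any proposal: builds a module factory by
// wrapping the module text in a function expression. Indirect eval stands in
// for a hermetic eval here and is NOT hermetic.
function makeFactory(text) {
    return (0, eval)("(function (require, exports) {" + text + "/**/\n})");
}

// A well-behaved module runs only when its factory is called.
var benignExports = {};
makeFactory("exports.answer = 42;")(function () {}, benignExports);

// Hostile text closes the wrapper early, runs a statement while the factory
// is still being built, then reopens a function so the trailing "})" parses.
globalThis.loadLog = [];  // a global, because indirect eval runs in global scope
var hostile = "}); loadLog.push('ran at load time'); (function (require, exports) {";
var hostileFactory = makeFactory(hostile);
```

The hostile text still yields a function, so the loader would not notice
anything unusual, yet its statement has already run before any factory
was called.
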
Some of the implementations avoid these problems in various ways with
various side-effects.  GPSEE and Helma NG parse the module as a
program and save the AST in the module *loader's* memo.  Then,
"require" creates a fresh context and scope.  The scope contains
"require" and "exports" and is parented in the sandbox's primordial or
global object.  This has its own set of implications:

 * "this" is the module instance's unique scope object at the top level.
 * "this" in functions that are called anonymously is also the module
instance's unique scope object, unless the interpreter is very recent
and strict (I'm vague on the details of this change to the language).
 * the primordials can only be accessed as free variables or as
members of "__proto__", if that's supported.
 * "require" and "exports" are accessible and mutable on the module
instance's unique scope object.

In my opinion, it would be ideal if:
 * "this" were "undefined" in the top scope.
 * "this" were "undefined" in functions that are called anonymously.
 * the primordials could only be accessed as free variables.
 * "require" and "exports" were only accessible as free variables.
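
The wish about anonymously called functions matches what the ES5 "use
strict" directive does for plain function calls. A small sketch, using
the Function constructor only so that the strictness of each body is
explicit (the variable names are illustrative):

```javascript
// Function-constructor bodies are non-strict unless they opt in, so this
// function sees the global object as "this" when called without a receiver.
var sloppyThis = new Function("return this;");

// With the ES5 "use strict" directive, the same call leaves "this" undefined.
var strictThis = new Function("'use strict'; return this;");

var sloppyResult = sloppyThis();
var strictResult = strictThis();
```

This covers only the function-call case. What "this" should be at the
top of a module is a separate decision for whoever evaluates the module.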

At this stage, there are two kinds of module systems we can implement,
that vary in character.  They vaguely correspond to the character of
Cajita and Valija respectively (strict/simple/fast/easy vs
permissive).  The Valija loader has a unique tree of primordial
objects for each sandbox.  The Cajita loader only needs one, deeply
frozen primordial tree.  (Note: in my present server-side experiments,
there are two global trees: one is the original, thawed global object
in the bootstrapping context, with which I make the loader and
attenuate other authorities like file system access.  In this case,
there's an outer module system that generates its file API from its
ambient authority, and an inner module system where all authority
flows through the "system" free-variable or module).

Valija-style:
 * the transitive primordials can be monkey patched
 * sandboxes are expensive: not only do you need to create fresh
primordials for each sandbox, you cannot share module factory
functions among sandboxes, since each module factory function closes
over its primordials.
 * matching the types of objects passed among sandboxes is hard.  A
lot of the time, this means that serializing and de-serializing
objects as JSON across sandbox membranes (much like worker threads)
will save a lot of frustration.  However, this would not be possible
for functions, which means no object capabilities.  With herculean
effort, we could use shadowing/proxy-objects among sandboxes, much
like we presently have to do between client and server for RPC.
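
The type-matching problem can be seen with any facility that creates a
second set of primordials. The sketch below assumes Node's built-in "vm"
module as a stand-in for a spawn primitive; that choice is an assumption
of this example, not something proposed in the thread.

```javascript
// Node's built-in "vm" module is used as a stand-in for a spawn primitive;
// it is an assumption of this sketch, not something the thread proposes.
var vm = require("vm");

// An array created inside a fresh context inherits from that context's Array.
var foreign = vm.runInNewContext("[1, 2, 3]");
var isLocalArray = foreign instanceof Array;   // false: different primordials
var looksLikeArray = Array.isArray(foreign);   // true: not a prototype check

// A JSON round trip rebuilds the value from local primordials...
var copied = JSON.parse(JSON.stringify(foreign));
var copyIsLocalArray = copied instanceof Array;

// ...but functions do not survive it, so no capability can cross this way.
var stripped = JSON.stringify({ greet: function () { return "hi"; } });
```

The last line serializes to an empty object, which is the loss of
functions described above.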

Cajita-style:
 * for both good and ill, the primordials cannot be monkey patched
 * sandboxes are cheap
 * objects can be shared among sandboxes
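
A rough sketch of what "deeply frozen" could mean, using the ES5
Object.freeze. The deepFreeze helper and the toy tree are hypothetical;
a real loader would start from the global object, and this version does
not follow prototype links, so it leaves the host's own primordials alone.

```javascript
// Hypothetical helper: freezes an object and everything reachable through
// its own data properties. It deliberately skips [[Prototype]] links.
function deepFreeze(root, seen) {
    seen = seen || new Set();
    if (Object(root) !== root || seen.has(root)) {
        return root;  // primitives and already-visited objects
    }
    seen.add(root);
    Object.getOwnPropertyNames(root).forEach(function (name) {
        var descriptor = Object.getOwnPropertyDescriptor(root, name);
        if ("value" in descriptor) {
            deepFreeze(descriptor.value, seen);
        }
    });
    return Object.freeze(root);
}

// A toy stand-in for a primordial tree shared by every sandbox.
var sharedPrimordials = deepFreeze({
    List: { prototype: { first: function () { return this[0]; } } }
});

// Monkey patching fails: silently in sloppy code, with a TypeError in strict code.
var patchRejected = false;
try {
    (function () {
        "use strict";
        sharedPrimordials.List.prototype.last = function () {};
    })();
} catch (error) {
    patchRejected = error instanceof TypeError;
}
```

Because nothing reachable from the tree can be changed, one instance can
be handed to every sandbox.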

With the server-side JavaScript group, I've been working on one
prototype called Narwhal with Tom Robinson.  It runs on Rhino (with
Helma, GAE, or bare-bones) or V8 (with K7 or v8cgi) so far.  There's a
kernel loader that boot-straps itself using a "module evaluator"
function, a file reader function, a file existence tester function,
and a copy of the system environment object.  It uses the
kernel-loader to load up a sandbox module and its transitive
dependencies.  These dependencies include a platform-specific file API
module, which gets its authority to access the file system from
ambient authorities like the Java package name spaces or
dynamically-loaded FFI modules.  Then, it loads your application
module.

At this point, you have a module system and all the ambient authority
you normally have in JavaScript.  You can then elect to create
attenuated sandboxes.  This is a compromise.  It's my hope that the
value of having sandboxes will compel people to use them as often as
possible, and write modules that will work with their restrictions (no
monkey-patching of globals/primordials).  I think that Kevin Dangoor
and the ServerJS community are in agreement that the ServerJS standard
library, at least, will abide by the rules of the sandbox.

That being said, it will occasionally be necessary to use modules that
do not play well with these rules.  When that's the case, we could
still support Valija-style modules by providing an alternate interface
that creates new primordial trees for every sandbox and just suck up
the performance loss and communication complexity.


So, onward to the straw man:

>    eval.hermetic(program :(string | AST), thisValue, optBindings) :any

The most closely matching interface for the needs of Cajita-style
modules would be:

    eval.hermetic(program:String)
        :Function(require:Function, exports:Object)
        :Undefined

So a module could be loaded:

   var factory = eval.hermetic(program);

And instantiated:

    let require = Require(baseId);
    let exports = {};
    memo[baseId] = exports;
    factory(require, exports);
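
The load and instantiate steps can be sketched end to end as follows.
Everything named here (makeRequire, sources, requireModule) is
hypothetical, module ids are used as-is with no relative resolution, and
indirect eval again stands in for eval.hermetic without being hermetic.

```javascript
// Hypothetical stand-ins: "sources" maps module ids to text, and indirect
// eval plays the role of eval.hermetic (it is not actually hermetic).
function makeRequire(sources) {
    var factories = {};  // id -> factory; could be shared among sandboxes
    var memo = {};       // id -> exports; one per sandbox

    function load(id) {
        if (!factories.hasOwnProperty(id)) {
            factories[id] = (0, eval)(
                "(function (require, exports) {" + sources[id] + "\n})");
        }
        return factories[id];
    }

    return function requireModule(id) {
        if (!memo.hasOwnProperty(id)) {
            var exports = {};
            memo[id] = exports;  // memoize first so cyclic requires terminate
            load(id)(requireModule, exports);
        }
        return memo[id];
    };
}

var requireModule = makeRequire({
    "math": "exports.add = function (a, b) { return a + b; };",
    "main": "exports.total = require('math').add(2, 3);"
});
var main = requireModule("main");
```

Keeping the factories apart from the memo is what lets a second sandbox
reuse the evaluated factories while getting fresh exports.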

As for the value of "this", I'd hope for "undefined" or the frozen
globals, but that poses a problem for the GPSEE/Helma-style
implementation.  If the top and only scope for a module is
"undefined", or even a frozen object, there's nowhere to put top-level
declarations.

There are a couple of ways to dodge this issue: we could either beget
the top-level object from globals, or we could make the initial scope
chain contain both the frozen globals and an empty function closure
for module locals.  Both of these solutions are a stretch to implement
in present-day *Monkey, Rhino, and presumably all others (which, I
hope we agree, is a red herring).  For the former solution, the
engine would need to provide a program evaluator that distinguishes
the top-level scope from the "this" value passed to anonymous
functions.  For the latter solution, the engine would need to
provide a program evaluator that begins with two scopes, one with
globals, the other for module locals, wherein top-level declarations
(and potentially free assignment, if that can't simply be culled from
modules) are bound to the locals instead of the globals.  I believe
this addresses issues with Mark's idea:

> eval.hermetic() does an indirect eval of the program in the current
> global context, but, as in
> <http://wiki.ecmascript.org/doku.php?id=strawman:lexical_scope>,
> without the global object at the bottom of the scope chain.

Implementations would need to decouple the top of the scope chain and
the global object.
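
The coupling in question is visible in today's engines. In the sketch
below (names are illustrative, and indirect eval is used only because it
evaluates text as a program in the global scope), a top-level declaration
lands on the global object, while the function wrapper keeps the same
kind of declaration local.

```javascript
// Evaluated as a program, a top-level "var" lands on the global object.
(0, eval)("var leakedCounter = 1;");
var leakedAsProgram = "leakedCounter" in globalThis;
delete globalThis.leakedCounter;  // tidy up; eval-created globals are deletable

// Evaluated inside a function wrapper, the same declarations stay local.
var factory = (0, eval)(
    "(function (require, exports) {" +
    "var hiddenCounter = 1;" +
    "function bump() { return hiddenCounter + 1; }" +
    "exports.value = bump();" +
    "\n})");
var moduleExports = {};
factory(null, moduleExports);
var leakedFromWrapper = "hiddenCounter" in globalThis || "bump" in globalThis;
```

The wrapper gives declarations a home but is not a "program"
construction. The evaluator described above would parse the text as a
program and still bind its declarations to a locals scope.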

We could fall back to a consistent value for "this", like a module
scope object or a singleton frozen global scope object (since there's
no need for more than one frozen global scope object).

For the Cajita style, there is no need to expose initGlobalObjects
(that is, "eval.spawn") to the user, since a single deeply-frozen global
scope object can be closed over by "eval.hermetic" and safely shared
among sandboxes.  There's also no need to add bindings to the global
object.

Then, addressing Valija-style:

>    eval.spawn(optBindings, optGlobalSubsetDirective :Opt(string)) :Sandbox

The most closely matching interface for the needs of Valija-style
modules would be:

    eval.hermetic(program:(String|AST), global:Object, local:Object)
    eval.spawn() :Object

I presume that, like Mark's optBindings, both global and local might
not be the exact objects provided in the sandbox, and that we might
deem it necessary to copy the [enumerable] members of those objects
into lexical scope frames to avoid prototype inheritance slip-ups.

So, a module could be run by:

    let require = Require(baseId);
    let exports = memo[baseId] = {};
    let sandbox = eval.spawn();
    evaluate(program, sandbox, {require, exports});

With the addition of an "eval.parse(text):AST", we could recover some
performance lost with this method, by sharing ASTs among sandboxes.

Loading:

   var ast = eval.parse(program);
   programs[baseId] = ast;

Instantiating:

   let require = Require(baseId);
   let exports = memo[baseId] = {};
   let sandbox = eval.spawn();
   evaluate(programs[baseId], sandbox, {require, exports});
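
An existing analogue of "parse once, instantiate per sandbox" can be
sketched with Node's built-in "vm" module. This is an assumption made for
illustration: new vm.Script(...) plays the part of eval.parse and
vm.createContext(...) the part of eval.spawn. Neither is the proposed API,
and the instantiate helper is hypothetical.

```javascript
// Assumption: Node's "vm" module stands in for the proposed primitives.
var vm = require("vm");

// Loading: compile the program once and keep the compiled form.
var programs = {};
programs["greeter"] = new vm.Script(
    "Array.prototype.patched = true;" +  // monkey patch the sandbox's own Array
    "exports.greet = function (name) { return 'hello ' + name; };");

// Instantiating: a fresh context (fresh primordials) per sandbox.
function instantiate(id) {
    var moduleExports = {};
    // "require" and "exports" are placed on the new global for simplicity;
    // the interface above would keep them in a separate local scope.
    var sandbox = vm.createContext({
        require: function () {},
        exports: moduleExports
    });
    programs[id].runInContext(sandbox);
    return { exports: moduleExports, sandbox: sandbox };
}

var first = instantiate("greeter");
var second = instantiate("greeter");
var hostPatched = "patched" in Array.prototype;  // false: the host is untouched
```

Each instantiation pays for a new context, but the program text is
compiled only once.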

It might be desirable for spawn to provide both deeply frozen and
thawed global trees, or for hermetic to use its own deeply frozen
singleton if no scope is provided.

Once:

   var sandbox = eval.spawn(true);

Loading:

   var ast = eval.parse(program);
   programs[baseId] = ast;

Instantiating:

   let require = Require(baseId);
   let exports = memo[baseId] = {};
   evaluate(programs[baseId], sandbox, {require, exports});

>    interface Sandbox {
>      public getGlobal() :GlobalObject;
>      public deactivate(problem :string) :void;
>    }

I would hope that deactivation would be implicit by dropping all
references to the sandbox, apart from its internal references.

My present Sandbox objects are Functions that look like (please
forgive my pseudo-syntax):

interface Sandbox({loader:Loader, modules:Object, system:Object}) {
    public getSystem() :Object;
    public getLoader() :Loader;
    public invoke(mainId) :Object; // returns the exports
}

getGlobal() would be necessary for the Valija-style, and for executing
non-module code snippets in the sandbox.

I find that there's only ever need of one sandbox implementation.  It
works for both client-side, and server-side.  It works for secure and
insecure boxes.  The only thing that ever seems to vary for me is what
type to use for the module exports memo object (a usual Object, or
something more elaborate like a Map).

interface Loader {
    public resolve(id, [baseId]);
    public load(resolvedId);
}

That's the minimum interface for Loader as used by Sandbox.
Implementations generally have:

interface Loader {
    public resolve(id, [baseId]);
    public load(resolvedId); // fetch->evaluate->memoize->return
    public fetch(resolvedId); // storage dependent, attenuated
    public evaluate(text, resolvedId); // hermetic eval
    public reload(resolvedId); // for convenience
    public isLoaded() :Boolean; // convenience
    public canLoad(resolvedId) :Boolean; // for multi-loader searching
}
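
A minimal in-memory sketch of such a loader follows. It is an
illustration under assumptions, not Narwhal's implementation: storage is
a plain object of module texts, identifier resolution handles only "./"
and "../", and indirect eval stands in for a hermetic eval.

```javascript
// Hypothetical in-memory Loader; "sources" maps resolved ids to module text.
function Loader(sources) {
    var factories = {};
    var self = {
        // Resolve a relative id against the id of the requiring module.
        resolve: function (id, baseId) {
            if (id.charAt(0) !== ".") {
                return id;  // top-level ids are already resolved
            }
            var parts = (baseId || "").split("/").slice(0, -1);
            id.split("/").forEach(function (part) {
                if (part === "..") {
                    parts.pop();
                } else if (part !== "." && part !== "") {
                    parts.push(part);
                }
            });
            return parts.join("/");
        },
        canLoad: function (resolvedId) {
            return Object.prototype.hasOwnProperty.call(sources, resolvedId);
        },
        isLoaded: function (resolvedId) {
            return Object.prototype.hasOwnProperty.call(factories, resolvedId);
        },
        fetch: function (resolvedId) {
            if (!self.canLoad(resolvedId)) {
                throw new Error("no such module: " + resolvedId);
            }
            return sources[resolvedId];
        },
        // Stand-in for hermetic eval; indirect eval is not hermetic.
        evaluate: function (text, resolvedId) {
            return (0, eval)("(function (require, exports) {" + text + "\n})");
        },
        // fetch -> evaluate -> memoize -> return
        load: function (resolvedId) {
            if (!self.isLoaded(resolvedId)) {
                factories[resolvedId] =
                    self.evaluate(self.fetch(resolvedId), resolvedId);
            }
            return factories[resolvedId];
        }
    };
    return self;
}

var loader = Loader({
    "lib/util": "exports.double = function (n) { return n * 2; };",
    "lib/main": "exports.result = require('./util').double(21);"
});

// A per-module "require" resolves ids relative to the requiring module.
var memo = {};
function requireFrom(baseId) {
    return function (id) {
        var resolvedId = loader.resolve(id, baseId);
        if (!memo.hasOwnProperty(resolvedId)) {
            memo[resolvedId] = {};
            loader.load(resolvedId)(requireFrom(resolvedId), memo[resolvedId]);
        }
        return memo[resolvedId];
    };
}
var mainExports = requireFrom("")("lib/main");
```

The loader holds only factories. The exports memo lives outside it, so
the same loader could back several sandboxes.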

> * Unlike <http://wiki.ecmascript.org/doku.php?id=strawman:lexical_scope>,
> the 'this' value in scope at the top of the evaluated program is the
> provided thisValue; so a caller without access to their own global
> object cannot grant access they don't have. If the 'thisValue' is
> absent, then the global 'this' of the evaled program is undefined.

This is very important, and can be satisfied many ways.

> eval.spawn() makes a new global context -- a global object and a set
> of primordials. This new global context is subsidiary to the present
> one for (at least) deactivation and language subsetting purposes, so
> spawning forms a tree.

I'm not sure what these details about subsidiarity and deactivation
are meant to address.  I envision that the details would emerge from
the garbage collector.  If there are multiple garbage collectors with
disjointly managed but cross-referenced objects (like the DOM and JS
today)…well, just let me know.  I don't want to make a fuss about a
phantom, but I have thoughts about how that would be a problem and how
we could address it.

> * If optGlobalSubsetDirective is provided, then all code evaluated in
> this global context is constrained as if by an outer lexical use
> subset directive. Subset constraints compose across subsidiary spawns
> -- as if the optGlobalSubsetDirective of outer sandboxes were yet more
> outer lexical use subset directives.

So that sandboxes can become recursively more restrictive, but not
implicitly add objects they did not receive, I presume.  This would
apply to the Valija world, where you might want to pass the same
monkey patching you've done to yourself onto your children.  I doubt
this could be pulled off well, since it would involve not merely
enumerating the shallow properties, but merging the properties of the
entire tree.  I think it would be better if recursive sandboxes were
responsible for monkey patching themselves with the modules they have
access to, even though this would have a performance penalty.

> One then runs code hermetically within a sandbox by
>
>    sandbox.getGlobal().eval.hermetic(...)

Interesting.

> Given good catchalls and weak-key-tables, good membranes should be
> possible. Indeed, this should be a litmus test of catchall proposals.
> Given good membranes, one can easily gain other desired security
> properties by interposing membranes around some of the above objects.

Exciting.  I've definitely felt the need for weak references.  I'm not
sure why they're necessary for the module loader though, since
dropping a sandbox should be as simple (or complicated) as ditching
all references you possess to the sandbox or its contents.  Wes, Ihab,
and I discussed at length what it would mean if the module memo table
were weak, but resolved that it would be of little practical value and
that the motivating problem would be better solved with generational
garbage collection.

In summary, I think that this straw man tackles too much complexity in
the interest of appeasing monkey-patchers, but appears to do so well.
I've got a sneaking suspicion that monkey-patching globals has such
strong support that we must permit it.  If that's the case, I hope we
can provide a switch on the sandbox machinery so the application
programmer can choose whether they want light-weight sandboxes or heavy
ones.

Kris Kowal


More information about the es-discuss mailing list