Composition of Uncoordinated Working Sets of Modules

Kris Kowal kris.kowal at cixar.com
Fri Jun 4 16:03:18 PDT 2010


Simple Modules are, in their present state, one step forward and two
steps back from the previous generation of proposals.  With this
email, I intend to isolate these steps and propose a way to meet one
or two steps forward.

The one step forward comes from handling cyclic dependencies
elegantly.  If I am correct, this is the feature we gain from "second
classness" and from not basing the module system on a better "eval".
By being second class, this module system is able to internalize the
code needed to link the imports and exports of the "working set" of
modules.  The loader proposal reintroduces the idea of a "better
eval", being simply a hermetic evaluator that collects a working set
of modules, links them, and executes them.

Rather than rotating my original proposal to include this feature, I
will identify the features of my original proposal to which I'm
attached and propose how to rotate Simple Modules to accommodate them.


But, taking a few steps back, let's look at some use-cases from prior
art.


Java's packages and the present Simple Modules proposal share a
particular feature.  I call this "autonomous modules": modules that
are "self-named", meaning that they include their fully qualified name
in their source code.  Rhino is a Java package that contains parsers and
interpreters for JavaScript.  Dojo's ShrinkSafe and the YUI Compressor
use the parser components from Rhino, perform transformations on the
token stream, and re-print the resultant token stream to produce a
"minified" version of the original.  For expedience, these projects
forked Rhino instead of refactoring it to accommodate their need.

That was their mistake, but our problem.

Because every file in the Rhino codebase contains fully qualified
names, refactoring Rhino to contain and link against alternate names
is onerous, and the alternative of creating a parallel universe for
the minifier fork is equally onerous, so these things are simply not
done.  As a result, it is not possible (or perhaps merely egregiously
inconvenient) to compose a pure-Rhino system with either YUI
Compressor or Dojo ShrinkSafe, much less all three at once.

Python's module loader solves this problem by reducing the coupling
between modules and their names.  It is possible in Python to express
both relative and top-level module identifiers for the purpose of
linking, and a module's own name is never expressed in code.

Relative module identifiers are used to link within coherent
(internally consistent, designed in coordination) sets of modules,
usually stored in the same hierarchy.  Top-level module identifiers
are used to link across "packages".

Python has a few weaknesses that CommonJS modules address.

1.) Least importantly, it is non-trivial for a module to discover its
own top-level identifier.

2.) Before version 2.6, it was not possible to explicitly distinguish
a relative module identifier from a top-level module identifier.  If
you imported module X, Python would first look for that module
relatively (in the same directory) and then look for it at the
top-level (specifically, in the first of the paths in the module
search path that contained a directory that matched the name of the
first term of the module identifier).  This conflated the relative and
top-level name spaces, such that if you gave a module the same name as
any of the names used at the top-level, you would not be able to
access the module of the same name in the top level from that
directory.  For example, it is not possible to import the top-level
"csv" module from within the "my.formats.csv" module, because its own
module would intercept "csv".  This is a problem we solved in CommonJS
by requiring relative module identifiers and top-level identifiers to
be explicitly distinguished with "." or ".." in their first term.
This is also the reason why we used "/" to delimit terms instead of
dots.  Python 2.6 and 3 introduce a similar notation, from which we
draw our inspiration, with prefix dots, but the solution is crippled
for reasons beyond the scope of this discussion.

3.) The top-level module name space is centrally managed.  In Python,
the "global name space problem" is deferred once by separating each
file into a scope and moving the global name space to top-level
identifiers.  This means that there exist hazards of coordination when
composing packages.  In the context of Python packages, collisions at
this level are reasonably improbable.  CommonJS defers the global name
space problem similarly, moving the global name space out to top-level
module identifiers.  However, the server-side JavaScript community is
every bit as fragmented as the client-side JavaScript community, which
is to say that there are several separate land-grabs in progress for
the best top-level identifiers.  The package mappings proposal [1]
pushes the global name space problem out to URLs where it belongs.
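
To make the distinction in point 2 concrete, here is a minimal sketch
of how an identifier might be resolved when only a leading "." or ".."
term marks it as relative.  The function name "resolve" and its
argument names are assumptions made for illustration; they are not
quoted from the CommonJS specification.

```javascript
// Hypothetical helper, not taken from any specification text.
// Resolves `id` as written in the module whose own top-level id is `baseId`.
function resolve(id, baseId) {
    var terms = id.split("/");
    // Only a leading "." or ".." marks an identifier as relative.
    if (terms[0] !== "." && terms[0] !== "..") {
        return id; // already top-level
    }
    // Start from the "directory" of the importing module.
    var out = baseId.split("/").slice(0, -1);
    terms.forEach(function (term) {
        if (term === "..") {
            if (out.length === 0) {
                throw new Error("Cannot resolve " + id + " above the top level");
            }
            out.pop();
        } else if (term !== ".") {
            out.push(term);
        }
    });
    return out.join("/");
}

// Inside "my/formats/csv", a bare "csv" is unambiguously the top-level
// module, while a sibling must be requested with a "./" prefix.
console.log(resolve("csv", "my/formats/csv"));     // csv
console.log(resolve("./json", "my/formats/csv"));  // my/formats/json
console.log(resolve("../util", "my/formats/csv")); // my/util
```

Because the two forms are syntactically distinct, a local module named
"csv" can never intercept a request for the top-level "csv".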


It's easy to take pot-shots at having a three-layer system, but at
each of these layers, you get to balance brevity and sovereignty, and
I think three layers are what you need.  A module is generally the
size of a chunk of code a single person can keep in their head.  A
package is generally the size of a chunk of code that can be
coordinated by a team.  The web is the size of a chunk of code that
the world can collectively manage.  At the module scope, you use
variables to reference internally, and short module names to reference
externally.  At package scope you use module names to link internally,
and URLs to link externally.  With modules and lexical scoping, you
get sovereignty of the variable names in your scope.  With package
mappings, each package gets sovereignty over its internal module
name space.

The original Simple Modules proposal was only sufficient in the small.
The Loaders proposal addresses the large.  It gives "working sets of
internally linked modules" sovereignty of their module name space,
which is good.  It does not yet enable linking to other working sets
of internally consistent modules, wherein the composition problem
lies.  I propose the following revisions:

A.) Bifurcate the module name space between internal and external
linkage.

    import "foo"; // external
    import "./bar"; // internal

B.) Support hierarchical nesting of internal modules with relative
module identifiers.

    foo.js
    foo/
        bar.js
        baz.js

C.) Separate the name from the module declaration syntax; make a file
a module production, and provide the means of creating anonymous
modules and giving them to loaders.  Make anonymous modules
uninstantiable without the assistance of the module "linker".  Permit
the module linker to process whole files as module bodies with an
externally assigned name.  This would allow us to decouple fetching
and bundling, permitting a variety of patterns there.

    linker.set("foo/bar", module {
        import ./baz/*;
    });

D.) Add something like package mappings to the loader, so a working
set of internally consistent modules can reference an external working
set of internally consistent modules managed by another loader,
recursively.

    var other = Linker();
    other.set("bar", module {
        export a = 10;
    });
    var self = Linker();
    self.set("foo", other);
    self.set("main", module {
        import "foo/bar".{a};
        assert.equal(a, 10);
    });
    self.execute("main");

BONUS.) Allow the user of a module loader to instantiate the working
set of modules with a controlled set of free variables available to
all modules.  This would allow us to contrive environments that smell
like a previous script left them some global variables.  This would
greatly assist migration, and permit new dependency injection forms.

    linker.execute("foo/bar", {
        "assert": assert
    });

BONUS BONUS.) Provide an API on the linker that assists developers in
constructing bundles of the minimal working set of transitive
dependencies from a particular starting module.

    linker.dependencies("foo/bar");

Another feature of Simple Modules is that it preserves the
"equivalence by concatenation" property of existing "script" tags,
while liberating the scripts from being sensitive to the order in
which they are concatenated.  This is in conflict with the goal of
removing autonomous module blocks.

The principal value of being able to concatenate scripts is that it
can reduce the "chattiness" of the interaction between the client and
server over long-latency HTTP connections, which in turn reduces load
times.

CSS can be concatenated.  Images can be sprited.  Scripts can be
concatenated.  All of these solutions for improving performance are
based on an imperfect world where downloads are initiated in the order
that they are discovered, which is itself tied down to the order in
which they appear in the layout.  There are two major solutions to
this problem that would eliminate the need for bundling and
concatenation.  One of them is Alexander Limi's resource package
proposal [2] and the other is Google's SPDY [3].

Alexander Limi proposes that a link tag with a relationship of
"package" could be attached to a subtree of the URL space, permitting
an archive to be downloaded before the resources are mentioned in
source.  This drops a bomb on the concatenation solution and decouples
the load order from the layout order, since archives can be unpacked
in stream, all with a progressive enhancement that would permit
production and debugging to use mostly the same code, and permit older
browsers to do business as usual with individual files.

SPDY allows the server and client to prioritize content intelligently
in a layer between TCP and HTTP.

The technique of concatenation may be an anachronism by the time web
developers are willing to publish Harmony modules to general web
users.

However, it would still be good default behavior for a web page to
construct a "working set" / "loader" / "linker" that is backed by
modules fetched individually over HTTP and executed when it is
possible to link the working set.  Then, using a reflective "Loader"
or "Linker" API, it would be possible to create and use optimized
bundles.  Furthermore, package mappings could be accomplished if
browsers provided a URL Linker/Loader that would automatically fetch
and link modules on a particular URL tree.


In summary, the problems worth solving include:

a.) balancing linkage brevity and uniqueness, with the goal of
    offloading the global name space problem to DNS, providing
    reliable sovereignty over name spaces controlled by:
     * the developer of a single file
     * the developer of a tree of files
     * domain owners
     * IANA
b.) elimination of accidental global variables
c.) the manual explication of transitive dependencies
d.) the manual linearization of execution and linkage
e.) mutual dependency
f.) the elimination of the need for build steps during development and
    debugging.
g.) decoupling the utterance of dependencies from the order and timing
    in which dependencies are transported in production.
h.) isolation of scopes
i.) isolation of internally consistent modules
j.) reliable linkage to independently developed, internally consistent
    working sets of modules

Simple Modules will assist individual designers of coherent groups of
name spaces for the purpose of producing single internally consistent
applications and APIs.

Simple Modules, at present, will not sufficiently assist people
constructing applications and APIs by composing non-coherent groups of
name spaces produced by non-cooperating groups of developers.


In any case, that's my two bucks,
Kris Kowal


[1] http://wiki.commonjs.org/wiki/Packages/Mappings/B
[2] http://limi.net/articles/resource-packages/
[3] http://www.chromium.org/spdy/spdy-whitepaper

