Module-based parallel JS strawman

Isiah Meadows isiahmeadows at gmail.com
Sun Nov 6 05:57:22 UTC 2016


TL;DR: I'm proposing a completely different threading model,
supporting lightweight and cooperative multitasking while allowing
synchronous, thread-safe communication and inspection. Apologies for
the length.

---------------

Admittedly, this is a significant departure from the original
JavaScript concurrency model, web workers, that's implemented in
browsers. But I feel the lower level, more tightly integrated nature
of it will make it far faster and lighter in practice, while still
avoiding some significant footguns and still working with the
traditional single-threaded nature of JavaScript.

So here's my idea: add an opt-in, low-level threading mechanism that
also includes several safety measures for both security and safety, to
leverage the evented nature of JavaScript as well as its module
system, and to remain as light as possible to make the multithreaded
experience still feel native and intuitive, not bolt-on.

For more details, see my gist, which is much more complete than this:
https://gist.github.com/isiahmeadows/a01799b4dc9019c55dfcd809450afd24

---------------

First, add an `import.fork(module)` method-like expression, down the
vein of Domenic's `import(module)` proposal, and to a lesser extent,
the `super()` pseudo-method. This expression returns a promise to a
thread instance. The thread instance will have the following
properties:

- `thread.module` - This is the resulting imported module.
- `thread.global` - The global instance for the thread.
- `thread.close()` - Close the thread. This returns a promise resolved
when it's done.
- `thread.onClose(handler)` - Register a handler for synchronous
cleanup, triggered when the thread is closed (but after any remaining
atomic calls).
- `thread.active` - Whether this thread is currently active (i.e. not closed).
- `thread.closing` - Whether this thread is currently closing, but not
fully closed.
- `thread.id` - A unique integer ID for this thread.

The instantiated module is literally the exported module itself, with
minimal overhead.

The instantiated module is literally the exported module itself, with
minimal overhead. It's loaded inside its own separate realm. In the
child, there is a global `currentThread` object with two methods:

- `currentThread.close()` - Close the currently running thread after
the current tick ends. Or, in other words, make this tick the last.

- `currentThread.onClose(handler)` - Register a handler for
synchronous cleanup, triggered when the thread is closed (but after
any remaining atomic calls).
- `currentThread.active` - Whether the current thread is active (i.e.
the thread's event loop is still alive).
- `currentThread.closing` - Whether the current thread is currently
closing, but ticks are still being processed.

---------------

Now there are several safety mechanisms in place to avoid numerous
race conditions, and to maintain overall safety of the language.

- There's now a concept of atomic objects and thread contexts. If an
atomic object was created in your thread's context, you may modify it
as you wish. If it was created in another thread's context, you can
only read it, and attempting to modify it in any way causes a
TypeError to be thrown. Consider it a special proxy of sorts.

  - Modules and many global objects are atomic objects, including
`Object` and `Object.prototype` themselves, as well as
`SharedArrayBuffer` and its prototype.
  - Atomic objects use a new keyword called `atomic`. Atomic object
literals are created with `atomic {foo: 1}`, atomic functions are
created with `atomic function foo() {}`, etc.

- There are two key restrictions/invariants with atomic objects, and
attempts to violate them will result in a TypeError:

  1. Atomic objects may only reference other atomic objects, including
from their prototype.
  2. Methods defined on atomic classes and shorthand methods on atomic
object literals are themselves implicitly atomic.

- There will be two new methods for dealing with atomic objects:
`Object.isAtomic(object)` to detect atomic objects and
`Object.isFromThread(object)` to detect whether an object is from the
current thread.

- Only atomic objects may escape their thread context, and they
themselves are read-only outside it (modifications result in type
errors). This is a very key safety layer to allow most current
optimizations to persist, and it's also to mostly prevent a whole
class of bugs.

- When a thread dies, its event loop will stop scheduling tasks,
finishing those which were already scheduled. After it eventually
dries up, the event loop is closed, the OS thread exits, and the
associated data is eligible for GC. The thread context survives thread
death, because it's on the heap shared by all threads.

- Atomic function and class declarations are immutable, to enable some
optimizations stated below. Additionally, their instances and
prototypes are both atomic, as sugar to avoid a ton of repetition.

- Calling atomic functions from the same thread context requires no
lock, and is a normal function call.

- Calling atomic functions, atomic getters, or atomic setters from a
different thread context waits for the callee's context to finish its
current *microtask* (not tick) before running the call. This avoids
numerous race conditions with less performance hit by requiring
function-level locks, and it allows atomic functions to wrap
non-atomic things like context variables.

- Constant property accesses on atomic objects (e.g. known
non-configurable non-writable, `const` module exports) are done with
no lock.

- Potentially mutable property accesses on atomic objects (i.e.
everything else) are synchronized in the same way functions are.

- The cross-thread scheduling works much like an event loop
interleaved within another event loop. There's a thread task queue
that is executed with higher priority over the main event loop. The
execution is specified in a way that requires no actual locks (the
threads *wait* for everything to run instead).

- Each thread runs in its own realm, but the ability to expose direct
instances makes sharing far easier to do, with a more intuitive
interface. Modules are cached per-thread, though.

---------------

In the gist, I do address several performance and security concerns as
well. Some of them are easily mitigated (simultaneous variable
read/write is impossible by design) and others some are more difficult
to mitigate (concurrent heap modification), but some are inherent to
parallelism itself (SharedArrayBuffer-based cache timing attacks are
already possible with Web Workers).


More information about the es-discuss mailing list