What is the status of Weak References?

Kevin Gadd kevin.gadd at gmail.com
Fri Feb 1 03:21:32 PST 2013


On Fri, Feb 1, 2013 at 2:06 AM, David Bruant <bruant.d at gmail.com> wrote:
> I don't understand the connection between the lack of weak references and
> emulating a heap in a typed array.

For an algorithm that needs weak references to be correct, the only
way to implement that algorithm in JavaScript is to stop using the JS
garbage collector and write your own collector. This is basically the
model used by Emscripten applications compiled from C++ to JS - you can use a
C++ weak reference type like boost::weak_ptr, but only because the
entire application heap is stored inside of a typed array and not
exposed to the JS garbage collector. This is great from the
perspective of wanting near-native performance, because there are JS
runtimes that can turn this into incredibly fast native assembly, but
the resulting code barely looks like JavaScript and has other
disadvantages, so that is why I bring it up - weakref support in JS
would make it possible to express these algorithms in hand-written,
readable, debuggable JS.
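
To make the "heap in a typed array" model concrete, here is a minimal
sketch. It is not how Emscripten or boost::weak_ptr is implemented; it
swaps in a much simpler generation-counter scheme, and every name in it
(SlotHeap, alloc, free, read) is hypothetical. The point is only that
once the application owns the heap, a weak handle is just an index that
can be checked for staleness, and the JS garbage collector never sees
the objects at all.

```javascript
// Sketch only: a fixed-size slot heap stored in typed arrays.
// All names are hypothetical; this is not Emscripten's actual layout.
class SlotHeap {
  constructor(capacity) {
    this.values = new Float64Array(capacity);     // the "objects" (one number each)
    this.generations = new Uint32Array(capacity); // bumped every time a slot is freed
    this.freeSlots = [];
    for (let i = capacity - 1; i >= 0; i--) this.freeSlots.push(i);
  }

  // Allocate a slot and return a weak handle: (slot index, generation at alloc time).
  alloc(value) {
    if (this.freeSlots.length === 0) throw new Error("heap exhausted");
    const slot = this.freeSlots.pop();
    this.values[slot] = value;
    return { slot, generation: this.generations[slot] };
  }

  // A handle is alive only while its slot has not been freed since allocation.
  isAlive(handle) {
    return this.generations[handle.slot] === handle.generation;
  }

  // Manual free: bumping the generation invalidates every outstanding handle.
  free(handle) {
    if (!this.isAlive(handle)) throw new Error("stale handle");
    this.generations[handle.slot] += 1;
    this.freeSlots.push(handle.slot);
  }

  // Weak dereference: yields undefined instead of touching a recycled slot.
  read(handle) {
    return this.isAlive(handle) ? this.values[handle.slot] : undefined;
  }
}
```

The cost is visible even at this size: the program decides when to call
free(), and nothing in the code looks like ordinary JS objects anymore.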

> Garbage collectors have evolved and cycles aren't an issue any longer, weak
> references or not.

Cycles are absolutely an issue, specifically because JS applications
can interact with systems that are not wholly managed by the garbage
collector. The problem in this case is a cycle being broken *too
early* because the application author has to manually break cycles. To
present a couple of simple examples:

I have a top-level application object that manages lower-level 'mode'
objects representing screens in the application. The screens, when
constructed, attach event listeners to the application object. Because
the application manages modes, it needs to have a list of all the
active modes.
* The event handler closures can accidentally (or intentionally)
capture the mode object, creating a real cycle involving a dead mode
that will not be collected by even the most sophisticated GC, because
the live application object still reaches it through the handler list.
* If I am not extremely cautious, when a mode is destroyed I might
forget (or fail) to remove its associated event handlers from the
event handler list, causing the event handler lists to grow over time
and eventually degrade the performance of the entire application.
* I have to explicitly decide when a mode has become dead and manually
break cycles between the mode and the application, while also cleaning
up any running code (or callbacks on pending operations) that rely on
the mode.
In this scenario, weak references are less essential but still
tremendously valuable: An event handler list containing weak
references would never form a cycle, and would continue to work
correctly as long as the mode is alive. It is also trivial to prune
'dead' event handlers from a list of weak event handlers. The need to
explicitly tag a mode as dead and break cycles (potentially breaking
ongoing async operations like an XHR) also goes away: any ongoing
async operation keeps the mode itself alive even after it has been
removed from the mode list, and the GC collects it once it can prove
that doing so is safe.
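
A minimal sketch of such a weak event handler list follows. It assumes
a weak reference primitive with a deref() method that returns undefined
once the target has been collected (the shape of the WeakRef object in
engines that provide one); WeakListenerList, add, and dispatch are
hypothetical names. The weak-reference factory is injectable so the
pruning logic can be exercised without depending on GC timing.

```javascript
// Sketch only: assumes a weak reference primitive exposing deref().
// WeakListenerList and its methods are hypothetical names.
class WeakListenerList {
  // makeWeak defaults to WeakRef but is only invoked inside add().
  constructor(makeWeak = (target) => new WeakRef(target)) {
    this.makeWeak = makeWeak;
    this.entries = [];
  }

  // Register a method of `owner` by name. Storing a closure here instead
  // would capture the owner strongly and defeat the purpose.
  add(owner, methodName) {
    this.entries.push({ ref: this.makeWeak(owner), methodName });
  }

  // Call every listener whose owner is still alive, then prune dead entries.
  // Returns the number of listeners that were actually called.
  dispatch(...args) {
    let delivered = 0;
    for (const entry of [...this.entries]) {
      const owner = entry.ref.deref();
      if (owner !== undefined) {
        owner[entry.methodName](...args);
        delivered += 1;
      }
    }
    this.entries = this.entries.filter((e) => e.ref.deref() !== undefined);
    return delivered;
  }
}
```

With this shape the application never has to unregister a mode's
handlers: once nothing else references the mode, its entries simply
drop out on a later dispatch. When that happens is up to the collector,
which is exactly the non-determinism under discussion.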

I decide to build a simple pool allocator for some frequently used JS
objects, because JS object construction is slow. This is what
optimization guides recommend. I pull an object instance out of the
pool and use it for a while, and return it to the pool.
* If I forget to return an object to the pool when I'm done with it,
it gets collected and eventually the pool becomes empty.
* If I mistakenly return an object to the pool when it actually
escaped into a global variable, object attribute, or closure, now the
state of the object may get trampled over if it leaves the pool again
while it's still in use.
* If I mess up my pool management code I might return the same object
to the pool twice.
In this scenario, weak references would allow you to make the pool
implementation wholly automatic (though that would require the ability
to resurrect collected objects - I'm not necessarily arguing for that
feature). I should point out that this scenario is complicated by JS's
lack of an equivalent to RAII lifetime management in C++ and the
'using' block in C# (you can vaguely approximate it with try/finally
but doing so has many serious downsides) - given RAII or a 'using'
equivalent, you could manually ref-count pool entries instead of using
weakrefs. But I hope you can see the general gist here: I am manually
solving a problem the GC should be solving.
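
For reference, the manual version of this pool, including the
try/finally approximation of a 'using' block, might look like the
sketch below. ObjectPool and withPooled are hypothetical names, not
taken from any particular library.

```javascript
// Sketch only: a manually managed pool. All names are hypothetical.
class ObjectPool {
  // create() builds a fresh object; reset(obj) clears one before reuse.
  constructor(create, reset) {
    this.create = create;
    this.reset = reset;
    this.free = [];
    this.inUse = new Set(); // tracks checked-out objects to catch double release
  }

  // Take an object out of the pool, constructing one if the pool is empty.
  acquire() {
    const obj = this.free.length > 0 ? this.free.pop() : this.create();
    this.inUse.add(obj);
    return obj;
  }

  // Return an object to the pool. Releasing twice, or releasing a foreign
  // object, throws instead of silently corrupting the free list.
  release(obj) {
    if (!this.inUse.delete(obj)) {
      throw new Error("object not acquired from this pool, or already released");
    }
    this.reset(obj);
    this.free.push(obj);
  }
}

// try/finally stand-in for RAII or 'using': the object goes back to the
// pool even if fn throws.
function withPooled(pool, fn) {
  const obj = pool.acquire();
  try {
    return fn(obj);
  } finally {
    pool.release(obj);
  }
}
```

This catches the double-return bug, but nothing more. The inUse set
holds checked-out objects strongly, so an object that is never released
is leaked rather than collected, and nothing stops fn from stashing obj
in a closure or global where it will be trampled after release. Those
are the cases only the GC has enough information to get right.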

These examples are simplified but are both based on real world
applications I've personally worked on where the listed issues caused
us real grief - crashes and leaks from buggy manual lifetime
management, inferior performance, etc.

> I'm not part of TC39, but I'm largely opposed to anything that makes GC
> observable. It introduces a source of non-determinism; that is the kind of
> things that brings bugs that you observe in production, but unfortunately
> didn't notice and can't reproduce in development environment. Or if you
> observe them when running the program, you don't observe it in debugging
> mode.

My argument here is not that non-determinism is good. My argument is
that an application that runs non-deterministically in every web
browser (because it's a JavaScript application) is superior to an
application that deterministically doesn't run in any web browser
because the application cannot be expressed accurately in JS. The set
of such applications may be small, but it certainly seems considerable
to me because I encounter these problems on a regular basis. The
developers that I speak to who are
building these applications are being forced to choose Native Client
or Emscripten because their applications are not expressible in JS.

I'm personally developing a compiler that targets JS and the lack of
weak references (or RAII/'using') dramatically limits the set of
programs I can actually convert to JS because there are lots of
applications out there that simply need this functionality. If this is
something that can't be done in JS, or isn't possible until ES7/ES8, I
understand, but I would be very disappointed if the only reasons for
it are the hypothetical dread spectres of non-determinism and
information leaks.

Thanks,
-kg


More information about the es-discuss mailing list