What is the status of Weak References?

David Bruant bruant.d at gmail.com
Fri Feb 1 05:43:27 PST 2013


On 01/02/2013 12:21, Kevin Gadd wrote:
> On Fri, Feb 1, 2013 at 2:06 AM, David Bruant <bruant.d at gmail.com> wrote:
>> I don't understand the connection between the lack of weak references and
>> emulating a heap in a typed array.
> For an algorithm that needs weak references to be correct, the only
> way to implement that algorithm in JavaScript is to stop using the JS
> garbage collector and write your own collector. This is basically the
> model used by Emscripten applications compiled from C++ to JS - you can use a
> C++ weak reference type like boost::weak_ptr, but only because the
> entire application heap is stored inside of a typed array and not
> exposed to the JS garbage collector. This is great from the
> perspective of wanting near-native performance, because there are JS
> runtimes that can turn this into incredibly fast native assembly, but
> the resulting code barely looks like JavaScript and has other
> disadvantages, so that is why I bring it up - weakref support in JS
> would make it possible to express these algorithms in hand-written,
> readable, debuggable JS.
Sorry for repeating myself, but I still don't see the connection between 
the lack of weak references and emulating a heap in a typed array. 
Phrased as a question:
Would it be possible to compile a C++ program to JS with weakrefs 
without emulating a heap in a typed array? Because of pointer 
arithmetic, I doubt it, but I'm curious to learn if that's the case.

>> Garbage collectors have evolved and cycles aren't an issue any longer, weak
>> references or not.
> Cycles are absolutely an issue, specifically because JS applications
> can interact with systems that are not wholly managed by the garbage
> collector. The problem in this case is a cycle being broken *too
> early* because the application author has to manually break cycles. To
> present a couple simple examples:
>
> I have a top-level application object that manages lower-level 'mode'
> objects representing screens in the application. The screens, when
> constructed, attach event listeners to the application object. Because
> the application manages modes, it needs to have a list of all the
> active modes.
> * The event handler closures can accidentally (or intentionally)
Last I heard, it's very difficult to accidentally capture a reference 
in a closure, because modern engines check which variables are actually 
used (by looking at the names that appear in the closure's body), so 
for an object to be captured by a closure, it has to be used. So 
"intentionally".

> capture the mode object, creating a real cycle involving a dead mode
> that will not be collected by even the most sophisticated GC.
The problem is not about cycles. It's about holding on to references 
to objects longer than necessary.
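To make the failure concrete, a minimal sketch with invented names: the 
listener closes over `mode`, so the application's listener list keeps 
the mode reachable, whether or not a cycle is involved.

  function Application() {
    this.listeners = [];
  }
  Application.prototype.on = function (listener) {
    this.listeners.push(listener);
  };

  function Mode(app) {
    var mode = this;
    this.onTick = function () {
      mode.update();        // uses `mode`, so the closure captures it
    };
    app.on(this.onTick);    // the listener list now keeps `mode` alive,
  }                         // even after the application drops the mode
  Mode.prototype.update = function () {};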

> * If I am not extremely cautious, when a mode is destroyed I might
> forget (or fail) to remove its associated event handlers from the
> event handler list, causing the event handler lists to grow over time
> and eventually degrade the performance of the entire application.
> * I have to explicitly decide when a mode has become dead
Yes. I would say "understand" rather than "decide", but yes. And that's 
a very important point that most developers ignore or forget. GC is an 
undecidable problem, meaning there will always be cases where a human 
being needs to figure out at which point in an object's lifecycle it is 
no longer needed, and then either free it, in languages where that's 
possible, or make it collectable in languages with a GC. Such cases 
will exist even in languages that have weak references.
Nowadays, making an object collectable means cutting all the references 
that the mark-and-sweep algorithm would traverse (as far as I know, all 
modern engines use this algorithm), even if the object is not involved 
in a cycle!
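Concretely, for the mode sketched above, making it collectable means 
manually severing every such path, at a moment only the author can 
identify:

  Application.prototype.off = function (listener) {
    var i = this.listeners.indexOf(listener);
    if (i !== -1) this.listeners.splice(i, 1);
  };

  var app = new Application();
  var mode = new Mode(app);

  // Later, once the author has determined the mode is dead:
  app.off(mode.onTick);   // cut the path through the listener list
  mode = null;            // drop the remaining reference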


> In this scenario, weak references are less essential but still
> tremendously valuable: An event handler list containing weak
> references would never form a cycle, and would continue to work
> correctly as long as the mode is alive. It is also trivial to prune
> 'dead' event handlers from a list of weak event handlers.
When does the GC decide to prune dead event handlers? Randomly? Or 
perhaps when you've performed some action that means the corresponding 
mode is dead?
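For reference, the shape Kevin describes, assuming a hypothetical 
WeakRef-like primitive whose deref() method returns undefined once the 
target has been collected (nothing of the sort exists in ES today):

  Application.prototype.onWeak = function (listener) {
    this.listeners.push(new WeakRef(listener));  // hypothetical WeakRef
  };
  Application.prototype.fire = function () {
    // Dispatch, pruning entries whose target has been collected.
    this.listeners = this.listeners.filter(function (ref) {
      var listener = ref.deref();
      if (listener === undefined) return false;  // collected: prune
      listener();
      return true;
    });
  };

Note that in this sketch pruning happens lazily at dispatch time, not 
at collection time, which is exactly the ambiguity the question above 
is pointing at.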

> The need to
> explicitly tag a mode as dead and break cycles (potentially breaking
> ongoing async operations like an XHR) goes away because any ongoing
> async operations will keep the object itself alive (even if it has
> been removed from the mode list), allowing it to be eventually
> collected when it is safe (because the GC can prove that it is safe).
>
> I decide to build a simple pool allocator for some frequently used JS
> objects, because JS object construction is slow. This is what
> optimization guides recommend.
Are these guides aware of bump allocators? Or of the fact that keeping 
objects alive longer than necessary puts pressure on generational 
garbage collectors?

> I pull an object instance out of the
> pool and use it for a while, and return it to the pool.
> * If I forget to return an object to the pool when I'm done with it,
> it gets collected and eventually the pool becomes empty.
> * If I mistakenly return an object to the pool when it actually
> escaped into a global variable, object attribute, or closure, now the
> state of the object may get trampled over if it leaves the pool again
> while it's still in use.
> * If I mess up my pool management code I might return the same object
> to the pool twice.
I'm sorry, but all your examples are of the form "if I forget, if I 
make a mistake...". I don't think the bugs we might write are a good 
justification for adding new features to a language. If you really care 
about memory, get your algorithms right, and spend the necessary time 
to understand the lifecycle of your own objects so you know when to 
release them.
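For the record, the kind of pool under discussion is roughly this 
(illustrative only); every hazard in Kevin's list lives in the manual 
acquire/release pairing:

  function Pool(create) {
    this.create = create;
    this.free = [];
  }
  Pool.prototype.acquire = function () {
    return this.free.length > 0 ? this.free.pop() : this.create();
  };
  Pool.prototype.release = function (obj) {
    // Nothing here can detect a forgotten release, a double release,
    // or a release of an object still reachable from somewhere else.
    this.free.push(obj);
  };

  var vectors = new Pool(function () { return { x: 0, y: 0 }; });
  var v = vectors.acquire();
  // ... use v ...
  vectors.release(v);   // forget this line and the pool slowly drains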

> In this scenario, weak references would allow you to make the pool
> implementation wholly automatic (though that would require the ability
> to resurrect collected objects - I'm not necessarily arguing for that
> feature). I should point out that this scenario is complicated by JS's
> lack of an equivalent to RAII lifetime management in C++ and the
> 'using' block in C# (you can vaguely approximate it with try/finally
> but doing so has many serious downsides) - given RAII or a 'using'
> equivalent, you could manually ref-count pool entries instead of using
> weakrefs. But I hope you can see the general gist here of solving a
> problem the GC should be solving?
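The try/finally approximation of 'using' that Kevin mentions looks 
roughly like this, reusing the pool sketched above; it guarantees the 
release, but only if every caller remembers to go through the wrapper:

  function using(pool, body) {
    var obj = pool.acquire();
    try {
      return body(obj);
    } finally {
      pool.release(obj);   // runs even if body throws
    }
  }

  using(vectors, function (v) {
    v.x = 1; v.y = 2;
    // `v` must not escape this function, or the pool may hand it
    // out again while it is still in use.
  });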
>
> These examples are simplified but are both based on real world
> applications I've personally worked on where the listed issues caused
> us real grief - crashes and leaks from buggy manual lifetime
> management, inferior performance, etc.
>
>> I'm not part of TC39, but I'm largely opposed to anything that makes GC
>> observable. It introduces a source of non-determinism; that is the kind of
>> things that brings bugs that you observe in production, but unfortunately
>> didn't notice and can't reproduce in development environment. Or if you
>> observe them when running the program, you don't observe it in debugging
>> mode.
> My argument here is not that non-determinism is good. My argument is
> that an application that runs non-deterministically in every web
> browser (because it's a JavaScript application) is superior to an
> application that deterministically doesn't run in any web browser
> because the application cannot be expressed accurately in JS.
:-) Interesting argument.

> It is
> possible that the set of these applications is a small set, but it
> certainly seems of considerable size to me because I encounter these
> problems on a regular basis. The developers that I speak to who are
> building these applications are being forced to choose Native Client
> or Emscripten because their applications are not expressible in JS.
I don't know enough languages to tell, but I wonder to what point JS 
should import other languages' features for the sake of porting programs.
Where is the JS equivalent of Scala actors? There are probably some 
very interesting Scala programs to port to the web, too.

> I'm personally developing a compiler that targets JS and the lack of
> weak references (or RAII/'using') dramatically limits the set of
> programs I can actually convert to JS because there are lots of
> applications out there that simply need this functionality.
ES6 introduces revocable proxies [1], which could be used to implement 
"explicit weakrefs" (you have to say explicitly when you no longer want 
to use an object).
One idea would be to add source annotations indicating, at a coarse 
level, when some object is guaranteed not to be needed anymore; these 
would compile down to revoking the proxy.
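A minimal sketch of that idea with the proposed Proxy.revocable API 
(the handler is empty, so the proxy is transparent until revoked; 
`registerMode` is an invented stand-in for whatever consumes the object):

  var target = { update: function () { /* ... */ } };
  var r = Proxy.revocable(target, {});   // yields { proxy, revoke }

  registerMode(r.proxy);   // consumers only ever see the proxy

  // Later, at the explicitly annotated point in the source:
  r.revoke();     // any further use of the proxy throws a TypeError,
  target = null;  // and the proxy no longer retains `target`, so it
                  // can be collected even if consumers keep the proxy

Revocation is explicit and deterministic, which is the point: no GC 
behavior is observed.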

> If this is
> something that can't be done in JS, or isn't possible until ES7/ES8, I
> understand, but I would be very disappointed if the only reasons for
> it are the hypothetical dread spectres of non-determinism and
> information leaks.
Each of these reasons seems to be valid to me.

David

[1] http://wiki.ecmascript.org/doku.php?id=strawman:revokable_proxies

