WeakMap.prototype.clear performance (was: Questioning WeakMap.prototype.clear)
Mark S. Miller
erights at google.com
Tue Jan 22 13:53:38 PST 2013
On Tue, Jan 22, 2013 at 1:46 PM, Jason Orendorff
<jason.orendorff at gmail.com> wrote:
> On Tue, Jan 22, 2013 at 1:55 PM, David Bruant <bruant.d at gmail.com> wrote:
>> For reference, quote from Allen:
>> generational collectors can have large latencies between the time the last
>> reference to an object is destroyed and the when the GC actually notices.
>> Many GC cycles may occur during that period and if a populated but unneeded
>> large WeakMap is one of these zombie object, then it can have perf impacts.
>> Maybe even if we just have two generations. (To some extent, long-lived
>> ordinary Maps and Arrays also do this in a generational GC; but WeakMaps
>> have much, much worse worst-case GC performance.)
>> Indeed, the long-lived object being the only root of a big graph is a
>> problem unrelated to WeakMaps. If that was the only reason to add a clear
>> method on weakmaps, an Object.clear should be considered too.
> Not quite unrelated to WeakMaps, since this issue combines with the O(n^2)
> WeakMap GC algorithm to cause the performance problem. But otherwise, yes.
>> I don't understand the point about the worst-case GC performance. It may
>> be related to Allen's point about ephemeron algorithms which I know not
>> enough about.
>> I would be interested in knowing more if that's relevant. I'm not entirely
>> sure it's relevant since the difference between .clear and dropping a
>> weakmap is about the delta during which the storage is considered
> OK, deep dive follows...
> Think of our GC as a simple stop-the-world mark-and-sweep algorithm. (It's
> incremental, to reduce pause times, but that doesn't come into this
> During marking, when a WeakMap is marked, the keys and values are not
> marked. Instead the WeakMap is added to a list of marked (i.e.
> already-known-reachable) WeakMaps.
> 10. Once ordinary marking is done, we do a linear pass through all marked
> WeakMaps. For each WeakMap, for each entry, if the key is marked, we mark
> the value. If this pass does not find any previously-unmarked objects, we're
> done. But if we did, we must then resume ordinary marking in order to mark
> everything (transitively) reachable from those values. (Of course this step
> may mark additional WeakMaps that were not marked before.) Now goto 10.
> The maximum number of times the above loop may execute is the number of
> entries in WeakMaps. So it is an O(n^2) algorithm. The worst case occurs
> when WeakMap values lead to other WeakMaps or WeakMap keys, which lead to
> still more... I hope it's clear how this worst-case behavior can be
> triggered when, instead of clearing, an older WeakMap is dropped, to be
> collected, but possibly pointing to existing objects from which the newer
> WeakMap is reachable. If this happens N times, you could get a chain of N
> WeakMaps. Chaining is the problem. Note that in a generational GC, you would
> expect the *oldest* WeakMaps in the chain to be treated as roots.
> Easiest possible hack: detect this worst-case behavior, and flip a bit so
> that the system does only full GCs. It essentially stops acting like a
> generational GC. This is not a great hack, though. GC is still O(N^2); but N
> is bounded by the number of garbage WeakMaps you can end up with between GC
> I think the algorithm's asymptotic efficiency could be improved at the cost
> of adding a branch in the marking code (or, more likely—since branches in
> tight loops are expensive—compiling two versions of all the marking code,
> one for pre-WeakMap marking and one for post-WeakMap marking). That is a
> bigger implementation investment.
Yes. With this extra branch, you can use Felix's algorithm at
>> Having said all that, I bet we could hack around the worst-case GC
>> performance. It'll be a pain, but GC is like that sometimes.
>> What you said above about the current GC setup that yields equivalence
>> performance to .clear is interesting. In a nutshell, moving to a (naive?)
>> generational GC means that you're losing something you had before. I feel
>> there is a middleground to be found.
> What you're losing when you switch to a generational GC is precision. The
> tradeoff is: you do a lot less GC work, but you collect objects that survive
> a generation much less aggressively.
>> What about the following:
>> WeakMaps are allocated in their own area which is manually GC'ed with
>> today's algorithm (which is probably implemented for the last generation?).
>> This way, you'll know as soon as possible (next GC) if one is dead.
> I don't understand. How do you know if one is dead, short of marking the
> entire heap?
>> Given WeakMaps special properties about memory management, it doesn't
>> sound that crazy to special case how they're being used in GC algorithms.
>> Maybe the ideas I suggested above in 5 minutes are not perfect, but I feel
>> special-casing weakmaps is a direction to explore regardless of the debate
>> we're having about .clear actually since developers won't necessarily always
>> use it and the GC needs to be fast in these cases too.
> As I said earlier, I basically agree with that.
>> Just to be sure I understand generational GC: old generations considered
>> as roots when doing most GC traversing, right? That's why it may take time
>> to realize they're actually dead?
> Uh… I am going to screw this up, but I think basically if you're collecting
> generation k, all objects in generations ≤k that are referenced from
> generations >k are treated as roots.
Which way are you numbering your generations? Are generations >k
younger or older than k? (I ask because, if your answer is flipped
from what I expect, you are correctly describing "remembered sets".)
> So, yeah, basically.
> es-discuss mailing list
> es-discuss at mozilla.org
More information about the es-discuss