WeakMap.prototype.clear performance (was: Questioning WeakMap.prototype.clear)

Jason Orendorff jason.orendorff at gmail.com
Tue Jan 22 13:46:12 PST 2013


On Tue, Jan 22, 2013 at 1:55 PM, David Bruant <bruant.d at gmail.com> wrote:

> For reference, quote from Allen:
>
> generational collectors can have large latencies between the time the last
> reference to an object is destroyed and when the GC actually notices. Many
> GC cycles may occur during that period, and if a populated but unneeded
> large WeakMap is one of these zombie objects, then it can have perf impacts.
>
> Jason:
>
>   Maybe even if we just have two generations. (To some extent, long-lived
> ordinary Maps and Arrays also do this in a generational GC; but WeakMaps
> have much, much worse worst-case GC performance.)
>
> Indeed, the long-lived object being the only root of a big graph is a
> problem unrelated to WeakMaps. If that were the only reason to add a clear
> method on WeakMaps, an Object.clear should be considered too.
>

Not quite unrelated to WeakMaps, since this issue combines with the O(n^2)
WeakMap GC algorithm to cause the performance problem. But otherwise, yes.

> I don't understand the point about the worst-case GC performance. It may be
> related to Allen's point about ephemeron algorithms, which I don't know
> enough about.
> I would be interested in knowing more if that's relevant. I'm not entirely
> sure it's relevant, since the difference between .clear and dropping a
> WeakMap is about the delta during which the storage is considered
> collectable.
>

OK, deep dive follows...

Think of our GC as a simple stop-the-world mark-and-sweep algorithm. (It's
incremental, to reduce pause times, but that doesn't come into this
conversation.)

During marking, when a WeakMap is marked, the keys and values are not
marked. Instead the WeakMap is added to a list of marked (i.e.
already-known-reachable) WeakMaps.

10. Once ordinary marking is done, we do a linear pass through all marked
WeakMaps. For each WeakMap, for each entry, if the key is marked, we mark
the value. If this pass does not find any previously-unmarked objects,
we're done. But if it does, we must then resume ordinary marking in order to
mark everything (transitively) reachable from those values. (Of course this
step may mark additional WeakMaps that were not marked before.) Now goto 10.

The maximum number of times the above loop may execute is the number of
entries in WeakMaps. So it is an O(n^2) algorithm. The worst case occurs
when WeakMap values lead to other WeakMaps or WeakMap keys, which lead to
still more... I hope it's clear how this worst-case behavior can be
triggered when, instead of being cleared, an older WeakMap is dropped to be
collected while still pointing to existing objects from which a newer
WeakMap is reachable. If this happens N times, you could get a chain of N
WeakMaps. Chaining is the problem. Note that in a generational GC, you
would expect the *oldest* WeakMaps in the chain to be treated as roots.
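Here's a hypothetical usage pattern (purely for illustration) that builds such
a chain: each round drops its old WeakMap without clearing it, but the old
map's value still leads to the new map, so in the worst case each fixpoint
pass can only discover one more link.

    // Hypothetical pattern producing a chain of garbage WeakMaps.
    // 'key' stays live, so every map's entry is eligible for marking, and
    // each map's value leads to the next map. If old (garbage) maps are
    // treated as roots, marking walks the chain one link per pass.
    const key = { id: 0 };
    let current: WeakMap<object, object> = new WeakMap();
    for (let i = 0; i < 1000; i++) {
      const next = new WeakMap<object, object>();
      current.set(key, { next });  // old map's value reaches the new map
      current = next;              // drop the old map instead of clearing it
    }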

Easiest possible hack: detect this worst-case behavior, and flip a bit so
that the system does only full GCs. It essentially stops acting like a
generational GC. This is not a great hack, though: GC is still O(N^2), but
N is bounded by the number of garbage WeakMaps you can end up with between
GC cycles.
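A sketch of what that bit-flip might look like, in the same toy-model spirit
as above (the pass counter, threshold, and flag are all invented):

    // Hypothetical bail-out: if the WeakMap fixpoint needs too many passes,
    // flip a flag so that subsequent collections are full GCs only.
    const WEAKMAP_PASS_LIMIT = 16;    // invented threshold
    let fullCollectionsOnly = false;  // "the bit"

    function noteWeakMapPasses(passCount: number): void {
      if (passCount > WEAKMAP_PASS_LIMIT) {
        fullCollectionsOnly = true;   // stop treating old garbage WeakMaps as roots
      }
    }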

I think the algorithm's asymptotic efficiency could be improved at the cost
of adding a branch in the marking code (or, more likely—since branches in
tight loops are expensive—compiling two versions of all the marking code,
one for pre-WeakMap marking and one for post-WeakMap marking). That is a
bigger implementation investment.


>    Having said all that, I bet we could hack around the worst-case GC
> performance. It'll be a pain, but GC is like that sometimes.
>
> What you said above about the current GC setup that yields equivalent
> performance to .clear is interesting. In a nutshell, moving to a (naive?)
> generational GC means that you're losing something you had before. I feel
> there is a middle ground to be found.
>

What you're losing when you switch to a generational GC is precision. The
tradeoff is: you do a lot less GC work, but you collect objects that
survive a generation much less aggressively.


> What about the following:
> WeakMaps are allocated in their own area which is manually GC'ed with
> today's algorithm (which is probably implemented for the last generation?).
> This way, you'll know as soon as possible (next GC) if one is dead.
>

I don't understand. How do you know if one is dead, short of marking the
entire heap?

> Given WeakMaps' special properties regarding memory management, it doesn't
> sound that crazy to special-case how they're handled in GC algorithms.
> Maybe the ideas I suggested above in 5 minutes are not perfect, but I feel
> special-casing WeakMaps is a direction to explore regardless of the debate
> we're having about .clear, since developers won't necessarily always use it
> and the GC needs to be fast in those cases too.
>

As I said earlier, I basically agree with that.

> Just to be sure I understand generational GC: old generations are considered
> as roots when doing most GC traversals, right? That's why it may take time
> to realize they're actually dead?
>

Uh… I am going to screw this up, but I think basically if you're collecting
generation k, all objects in generations ≤k that are referenced from
generations >k are treated as roots.

So, yeah, basically.
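If a toy version helps, here's roughly that rule in TypeScript. The generation
numbers and the whole-heap scan are simplifications assumed for the sketch;
real engines track these cross-generation edges with write barriers and
remembered sets rather than scanning.

    // Toy model: collecting generations <= k treats every object in those
    // generations that is referenced from a generation > k as a root.
    interface GCObject {
      generation: number;
      refs: GCObject[];
    }

    function rootsForMinorCollection(heap: GCObject[], k: number): GCObject[] {
      const roots = new Set<GCObject>();
      for (const obj of heap) {
        if (obj.generation <= k) continue;        // only older objects contribute roots here
        for (const ref of obj.refs) {
          if (ref.generation <= k) roots.add(ref); // young object held alive by an older one
        }
      }
      return [...roots];
    }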

-j