WeakMap GC performance (was: WeakMap.prototype.clear performance)

David Bruant bruant.d at gmail.com
Wed Jan 23 01:49:17 PST 2013


[reordering]
Allen wrote:
> We can understand the value of providing a clear method without talking about GC at all.
I don't doubt there are cases where clearing a data structure is useful, 
but those cases can be served with clear-less weakmaps. What I'm trying 
to find is a differentiating factor. I agree that:
* clearable and clear-less weakmaps both have uses. Which one is dominant 
for developers has yet to be determined; only tastes and feelings have 
been offered so far (including by myself).
* clearable and clear-less weakmaps can be implemented on top of one 
another symmetrically and at close to no cost (see the sketch below).

Until evidence (from other languages?) is provided that one case matters 
more, I personally call this a tie. That's where my thinking currently stands.
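
To make the symmetry concrete, here is a minimal sketch (the wrapper's 
shape is mine, not code taken from the thread) of a clearable weakmap 
built on top of a clear-less one; clear() simply swaps in a fresh 
WeakMap and lets the old one become garbage:

    function WeakMapWithClear() {
      var wm = new WeakMap();
      this.get = function (key) { return wm.get(key); };
      this.set = function (key, value) { wm.set(key, value); return this; };
      this.has = function (key) { return wm.has(key); };
      this.delete = function (key) { return wm.delete(key); };
      // Dropping the old WeakMap is the whole trick: its entries become
      // unreachable and are reclaimed whenever the GC gets to them.
      this.clear = function () { wm = new WeakMap(); };
    }

The other direction is even simpler: wrap a clearable weakmap and just 
don't expose its clear method.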

I think a major remaining point is performance. If clear-less weakmaps 
induce an incompressible, significant GC cost, then that is a valid 
justification for a native .clear.
Now, implementors will have to deal with programs where some long-lived 
weakmaps aren't manually cleared; the interesting question here is: how 
far can they go to reduce the GC cost (without requiring a major 
breakthrough in GC research, of course ;-) )?
If the cost can be reduced to a marginal difference from manual .clear, 
I call the performance argument a tie too (which would leave only a 
taste/feeling debate).


Le 23/01/2013 00:36, Allen Wirfs-Brock a écrit :
> On Jan 22, 2013, at 2:35 PM, David Bruant wrote:
>> So, to find out whether a weakmap is dead, the information has to come from another source than the mark-and-sweep algorithm (since it has lost its precision)...
>> Given the additional prohibitive cost weakmaps seem to have on the GC, maybe things that would otherwise be considered too costly could make sense when applied specifically to WeakMaps. For instance, would the cost of reference-counting only weakmaps be worth the benefit of knowing early that a weakmap is dead? (I have no idea how much each costs, so it's hard for me to compare them.)
>> For WeakMapWithClear, reference counting would declare the weakmap dead as soon as the new weakmap is assigned to the private property, so that's good. It wouldn't work if some weakmaps are part of a cycle, of course... but maybe that's such an edge case that it's acceptable to ask users doing that to break their weakmap cycles manually if they don't want the GC to be too mad at them.
>>
> You know, as much as Jason and I enjoy talking about garbage collectors, this probably isn't the place to revisit the last 40 years of a highly developed area of specialized CS technology.
Even if there is a .clear method, it doesn't mean people will use it, so 
the costs weakmaps impose on the GC will have to be dealt with even when 
people don't manually clear their weakmaps [forking the thread for this 
reason]. JS engine implementors will have to solve this problem 
regardless of whether a .clear method is introduced. Since JS engines 
are starting to combine generational GCs with WeakMaps, I feel here and 
now might be a very good place and time to revisit those 40 years. Each 
implementor will have to do this revisiting anyway.
If anything, this thread may become a good resource for developers to 
understand why some of their programs using WeakMaps have bad GC 
characteristics, whether circumstantially (in today's implementations) 
or inherently.

Of all the points in this thread, the one that stuck in my head is when 
Jason said: "In our current implementation, creating a new WeakMap and 
dropping the old one is very nearly equivalent in performance to clear()."
What this means is that something is lost regarding WeakMaps when moving 
to a naive generational GC. What is lost is the knowledge of when 
exactly a weakmap is dead, and that loss translates into WeakMap-related 
GC cost. Although Mark showed a linear algorithm, one can still wonder 
whether in practice this algorithm induces a significant cost (the 
worst-case complexity doesn't say much about the cost of an algorithm in 
the most frequent cases).
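
For concreteness, here are the two patterns being compared (a sketch of 
my own, not Jason's benchmark code; it assumes a clear() method exists, 
as it does in some current implementations):

    var wm = new WeakMap();
    // ... populate and use wm ...
    wm.clear();           // entries are known dead immediately

    var wm2 = new WeakMap();
    // ... populate and use wm2 ...
    wm2 = new WeakMap();  // the old map's entries are only known dead
                          // once the GC discovers that the old map
                          // itself is unreachable

The second form is the one clear-less code has to fall back on, which is 
why knowing as early as possible that the old map is dead matters.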

What I'm trying to find out is whether there is a low-cost 
weakmap-specific tracking system that could tell the GC that a weakmap 
is dead as soon as possible. First and foremost, what did those 40 years 
of research find on this specific question?
Did they prove that any tracking system doing what I describe would cost 
more than it saves? If so, I'll be happy to read the paper(s) and give 
up on the topic. For the rest of this message, I'll assume they didn't.
Ideally, the tracking system would have the following properties:
* it costs nothing (or a small startup constant) if there are no weakmaps;
* in normal cases, its overall cost is significantly less than the cost 
of a weakmap being falsely assumed alive.
I say "in normal cases" because that's what modern GCs are already in 
the business of. Generational GC is an optimistic optimization based on 
the *observation* that in most real-life programs, most objects are 
short-lived. It's possible to craft a pathological program where the 
performance of the generational GC will be worse than the performance of 
a non-generational GC (keep all objects alive and then, the generational 
GC will spend most its time marking and moving objects from generation 
to generation)
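
Something along these lines (an illustrative sketch, not a real 
benchmark) would defeat the "most objects die young" assumption:

    // Every allocated object stays reachable, so nothing dies young:
    // each minor collection finds only survivors to mark and promote.
    var live = [];
    for (var i = 0; i < 10000000; i++) {
      live.push({ index: i });
    }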

I've suggested weakmap-only reference counting; maybe it fulfills these 
properties, maybe not. Other ideas are welcome, whether from the 40 
years of previous research or from creative people reading this message.

David

