Weak Reference proposal

Joris van der Wel joris at jorisvanderwel.com
Wed Feb 17 13:30:10 UTC 2016


Resending because the mailing list rejected my previous email:




Here is an example of using a NodeIterator:


```
const jsdom = require("jsdom");
const document = jsdom.jsdom(`<a></a><b></b><c></c>`);

let it = document.createNodeIterator(document.body);
console.log(it.nextNode().nodeName); // BODY
console.log(it.nextNode().nodeName); // A
console.log(it.nextNode().nodeName); // B
console.log(it.nextNode().nodeName); // C
console.log(it.nextNode()); // null

it = document.createNodeIterator(document.body);
console.log(it.nextNode().nodeName); // BODY
// This remove operation updates the internal state of the NodeIterator
document.body.removeChild(document.body.firstChild);
console.log(it.nextNode().nodeName); // B
console.log(it.nextNode().nodeName); // C
console.log(it.nextNode()); // null
it = null;
```

In the case of NodeIterator, there are currently (read: in ES6) two
spec-compliant (WHATWG DOM) implementations possible:

1. Keep a history of all changes a Document has gone through, forever.
2. Keep a list of all NodeIterators which have been created for a
Document, forever.

jsdom uses solution #2. This not only leaks memory, but also makes remove
operations slower as more and more NodeIterators are created.
(However, as Domenic described earlier, we limit this list to 10 entries
by default.)
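
To make the cost of solution #2 concrete, here is a minimal sketch of a
document that tracks its iterators strongly. `TrackingDocument`,
`createIterator`, `removeNode` and `removalsSeen` are hypothetical names for
illustration, not jsdom's API, and evicting the oldest entry once the cap is
reached is an assumption about how such a limit could work:

```javascript
// Hypothetical stand-in for a document that tracks its iterators strongly.
class TrackingDocument {
  constructor(maxIterators = 10) {
    this.maxIterators = maxIterators; // cap on the number of tracked iterators
    this.iterators = [];              // strong references, never released by GC
  }

  createIterator() {
    const iterator = { removalsSeen: 0 };
    this.iterators.push(iterator);
    // Assumption: once the cap is exceeded, the oldest iterator is dropped
    // and stops receiving updates, even if outside code still uses it.
    if (this.iterators.length > this.maxIterators) {
      this.iterators.shift();
    }
    return iterator;
  }

  removeNode() {
    // Every removal walks the whole list: O(number of tracked iterators).
    for (const iterator of this.iterators) {
      iterator.removalsSeen++; // stands in for the pre-removing steps
    }
  }
}

const doc = new TrackingDocument(2);
const first = doc.createIterator();
const second = doc.createIterator();
const third = doc.createIterator(); // evicts `first`
doc.removeNode();
console.log(first.removalsSeen, second.removalsSeen, third.removalsSeen); // 0 1 1
```

The cap bounds both the memory use and the per-removal work, at the price of
iterators that can go stale while still being referenced.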

The conflict between the DOM spec and ES6 is that we cannot detect whether
a NodeIterator is still in use by code outside of jsdom:

```
it = document.createNodeIterator(document.body);
console.log(it.nextNode().nodeName); // BODY
// ... wait an hour ...
console.log(it.nextNode().nodeName); // A
it = null; // and only now we can stop updating the NodeIterator state
```

(There used to be an it.detach() method for this purpose, but its
functionality has been removed from the spec.)

Being able to keep a list of NodeIterators weakly would be the only
solution if we want to avoid leaking resources.
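
A minimal sketch of what such a weak list could look like, assuming a weak
reference primitive shaped like the `WeakRef` constructor that ECMAScript
standardized later (it is not available in ES6). `WeakIteratorRegistry`,
`notifyRemoval` and `onNodeRemoved` are hypothetical names, and the
`makeRef` parameter exists only so the pruning path can be exercised
without depending on when the garbage collector runs:

```javascript
class WeakIteratorRegistry {
  // `makeRef` is injectable for testing; by default it wraps the target in a WeakRef.
  constructor(makeRef = (target) => new WeakRef(target)) {
    this.makeRef = makeRef;
    this.refs = [];
  }

  add(iterator) {
    this.refs.push(this.makeRef(iterator));
  }

  notifyRemoval(node) {
    const alive = [];
    for (const ref of this.refs) {
      const target = ref.deref(); // undefined once the iterator was collected
      if (target !== undefined) {
        target.onNodeRemoved(node); // stands in for the pre-removing steps
        alive.push(ref);
      }
    }
    this.refs = alive; // dead entries are pruned, so the list cannot grow forever
  }
}

const registry = new WeakIteratorRegistry();
const iterator = { seen: [], onNodeRemoved(node) { this.seen.push(node); } };
registry.add(iterator);
registry.notifyRemoval("A");
console.log(iterator.seen); // [ 'A' ]
```

As long as outside code holds the iterator it keeps receiving updates; once
it becomes unreachable, the entry is dropped on a later removal.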

Weak references might also be required for MutationObserver, although
I've not yet looked at this feature extensively, so I could be wrong.
Other features which you could implement using a weak reference (such
as the live collections) could be implemented using ES6 Proxy instead.
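
As a rough illustration of the Proxy approach, the sketch below recomputes
the collection on every property access, so nothing has to be tracked or
invalidated. `createLiveCollection` and `tagNames` are hypothetical names;
a real implementation would query the document tree and would likely cache
results:

```javascript
function createLiveCollection(compute) {
  return new Proxy({}, {
    get(_target, key) {
      const items = compute(); // recomputed on every property access
      if (key === "length") return items.length;
      // Only plain array indices are forwarded; symbols and other keys are ignored.
      if (typeof key === "string" && /^\d+$/.test(key)) return items[Number(key)];
      return undefined;
    },
  });
}

const tagNames = ["div", "span", "div"];
const divs = createLiveCollection(() => tagNames.filter((name) => name === "div"));
console.log(divs.length); // 2
tagNames.push("div");
console.log(divs.length); // 3
```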

XMLHttpRequest, fetch, WebSocket, etc. would even require something
similar to a phantom reference (as in Java) so that we can close the
connection when the object is no longer strongly or weakly referenced.
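
A sketch of that cleanup pattern, assuming the `FinalizationRegistry` API
that ECMAScript standardized later (not available in ES6).
`ManagedConnection` and `fakeSocket` are hypothetical; the callback runs at
some unspecified time after collection, if at all, so it can only be a
safety net next to an explicit close():

```javascript
// The callback receives the held value (the socket), never the collected wrapper.
const finalizers = new FinalizationRegistry((socket) => {
  socket.close();
});

class ManagedConnection {
  constructor(socket) {
    this.socket = socket; // any object with a close() method
    // The held value must not reference the wrapper, otherwise the wrapper
    // would stay reachable and never be collected.
    finalizers.register(this, socket, this);
  }

  close() {
    finalizers.unregister(this); // explicit close: cancel the pending finalizer
    this.socket.close();
  }
}

const fakeSocket = { closed: 0, close() { this.closed++; } };
const connection = new ManagedConnection(fakeSocket);
connection.close();
console.log(fakeSocket.closed); // 1
```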

I would also really like to use weak references for more than just jsdom;
there are some use cases where they can simplify my code.

Gr. Joris

On Wed, Feb 17, 2016 at 9:41 AM, Jonas Sicking <jonas at sicking.cc> wrote:
> On Tue, Feb 16, 2016 at 11:02 PM, Domenic Denicola <d at domenic.me> wrote:
>>> For each NodeIterator object iterator whose root’s node document is node’s node document, run the NodeIterator pre-removing steps given node and iterator.
>>
>> Rephrased: every time you remove a Node from a document, you must go through all of the document's NodeIterators and run some cleanup steps (which have the effect of changing observable properties and behavior of the NodeIterator).
>
> Could you implement all of this using MutationObservers? I.e. have the
> NodeIterators observe the relevant nodes using MutationObservers?
>
> The only case that I can think of where the DOM could use weak
> references is for the getElementsByTagName(x) function. This function
> will either return a new NodeList object, or an existing one. The
> reason it sometimes returns an existing one is for performance
> reasons. We saw a lot of code doing:
>
> var i;
> for (i = 0; i < document.getElementsByTagName("div").length; i++) {
>   var elem = document.getElementsByTagName("div")[i];
>   doStuffWith(elem);
> }
>
> This generated a ton of NodeList objects, which are expensive to
> allocate. Hence browsers started caching these objects and returned an
> existing object "sometimes".
>
> The gecko implementation of "sometimes" uses a hash map keyed on
> tagname containing weak references to the returned NodeList. This is
> observable by for example doing:
>
> document.getElementsByTagName("div").foopy = "foopy";
> if (document.getElementsByTagName("div").foopy != "foopy") {
>   // GC ran between the getElementsByTagName calls.
> }
>
> However this exact behavior is not defined by spec. But I believe that
> all major browsers do do something similar for performance reasons.
> (This API is as old as it is crummy. And it is no surprise that it is
> poorly used).
>
> But it likely would be possible to write an implementation of
> "sometimes" which doesn't use weak references, at the cost of higher
> memory usage.
>
> / Jonas



-- 
github.com/Joris-van-der-Wel


More information about the es-discuss mailing list