Iterators, generators, finally, and scarce resources (Was: April 10 2014 Meeting Notes)

David Herman dherman at mozilla.com
Tue Apr 29 11:35:48 PDT 2014


On Apr 29, 2014, at 12:40 AM, Andy Wingo <wingo at igalia.com> wrote:

> I'm a bit grumpy that this is being brought up again, and
> this late, and in multiple forums, but as it seems that people want to
> talk about it again, that talking about it again is the thing to do...

Sorry about that. :( But the fact is Jafar joined TC39 very late and his technical points have merit, so we do have to grapple with them. (As for multiple forums, I wish I had a solution.)

> If I may summarize Jafar's argument, it's that the iterator in a for-of
> may hold a scarce resource, like a file descriptor, and because of that,
> for-of should be able to release this scarce resource on an early exit
> via "break".  The provisional consensus elaborates a method to do this.
> 
> Is this a fair summary?

I don't quite agree with that summary. These are IMO the most important points:

- Iterators might include "on cleanup" logic, although this is admittedly rare in synchronous code.

- Nevertheless, generators are the common implementation strategy for iterators, and try/finally is part of the language, so "on cleanup" logic is more likely to arise. We can never guarantee that code holding a first-class iterator object will run it to completion (just as you can't guarantee that a try block won't loop forever), but it's a bad smell if code that creates an iterator in a for-of loop head, loops over it, and *never even touches the iterator directly* doesn't shut down the iterator (see the sketch after this list).

- Iterators are intended to be short-lived (in particular, long-lived iterators over mutable data sources are invalidation hazards). So the common consumer of iterators, for-of, should properly dispose of them.

- The uncommon case of using an iterator partially and in several loops can be easily implemented with combinators (or while loops), but we should be safe by default.

- The problems of "on cleanup" logic will be greatly exacerbated with future asynchronous iteration features, which are IMO an extremely high priority for the future. The overwhelming majority of JS programs that operate on sequences of data are doing so asynchronously. The moment we start going down the road of designing asynchronous iteration features (such as `for (await ... of ...)`), which in fact Jafar and I have been starting work on, the try/finally hazards will show up much more often. If we don't do proper disposal of synchronous iterators, we'll create an asymmetry between synchronous and asynchronous iteration, which would not only be a nasty wart but also a refactoring hazard.
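
Here's a rough sketch of what I mean (openFile, readLine, and close are a made-up synchronous file API):

  // A sketch; the file API is hypothetical.
  function* lines(path) {
    let file = openFile(path);
    try {
      let line;
      while ((line = file.readLine()) !== null) {
        yield line;
      }
    } finally {
      file.close();      // cleanup logic the loop below never touches directly
    }
  }

  // The consumer never sees the descriptor. Under the provisional consensus,
  // `break` invokes the iterator's return(), which resumes the generator as if
  // by a return statement and runs the finally; without it, the descriptor leaks.
  for (let line of lines("access.log")) {
    if (line.startsWith("ERROR")) break;
  }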

> Indeed I expect that in
> practice most iterators in an ES6 program will be map, set, and array
> iterators, which in practice will not be implemented with generators.

I strongly disagree with this. Generators will by far be the most convenient and common way to implement iterators, regardless of their data source.
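
For example, a sketch of what I expect to become the idiomatic pattern, with a generator method supplying the iterator no matter what the underlying data source is:

  class Range {
    constructor(lo, hi) { this.lo = lo; this.hi = hi; }
    *[Symbol.iterator]() {                    // a generator as the iterator implementation
      for (let i = this.lo; i < this.hi; i++) yield i;
    }
  }

  for (let n of new Range(0, 3)) console.log(n);   // 0, 1, 2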

> Incidentally I think that if TC39 decided to re-add this method, it
> should be called close() instead, because it doesn't make sense to
> "return" from a non-generator iterator.

I thought so at first too, until I remembered that iterators have a return value. So I still think return is the right name.
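
To illustrate what I mean by iterators having a return value (the .return() behavior shown here is the provisional consensus, not shipping semantics):

  function* g() {
    yield 1;
    return "finished";    // the generator's return value
  }

  let it = g();
  it.next();              // { value: 1, done: false }
  it.next();              // { value: "finished", done: true }

  // return() slots into the same protocol: it produces the final result directly.
  g().return("early");    // { value: "early", done: true }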

> == Calling return() on iterators is rarely appropriate

You're arguing "rarely necessary," not "rarely appropriate," which is a weaker claim. But I dispute that too because of asynchronous iteration, as I explained above.

> However in this case it is possible to arrange to close the iterator,
> with a different interface:

This is a *dramatic* weakening of the power of iterators, in that you force *all* iteration abstractions to expose their external resources to consumers. Again, it may not seem like a big deal now but it'll be completely unacceptable for the asynchronous case.
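
To make that concrete, reusing the made-up file API from the sketch above (linesOf is equally hypothetical): with .return() the resource can stay encapsulated behind the iterable; without it, every consumer has to hold the descriptor itself and remember to close it.

  // Encapsulated: the consumer only ever sees an iterable.
  for (let line of lines("access.log")) {
    if (line.startsWith("ERROR")) break;
  }

  // Resource exposed: each consumer threads the descriptor through and closes it.
  let file = openFile("access.log");
  try {
    for (let line of linesOf(file)) {         // linesOf iterates an already-open file
      if (line.startsWith("ERROR")) break;
    }
  } finally {
    file.close();
  }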

> The other case is when you have an iterator consumer which is decoupled
> from the code that created the iterator, as in:
> 
>  function (iterable) {
>    ...
>    for (var x of iterable) {
>      if (foo(x)) break;
>    }
>    ...
>  }
> 
> But it is precisely in this case when you would *not* want to close the
> iterator, because you don't know its lifetime.

As I said above, we should not optimize the design for this case. It's easy enough to create combinators for it:

  function (iterable) {
    ...
    for (var x of keepalive(iterable)) {
      if (foo(x)) break;
    }
    ...
  }

But this will not be the common case. The majority of the time when you're working with sequences you use the data you need and then you're done with it.
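
A keepalive combinator of this sort is only a few lines; roughly (just a sketch), it forwards next() but swallows return() so that the underlying iterator survives an early exit:

  function keepalive(iterable) {
    let iter = iterable[Symbol.iterator]();
    return {
      [Symbol.iterator]() { return this; },
      next(v) { return iter.next(v); },
      return(v) {
        // Deliberately do not forward to iter.return(): the caller wants the
        // underlying iterator to remain usable after this loop.
        return { value: v, done: true };
      }
    };
  }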

> == return() in generators is semantically weird

It's not as weird as you make it out to be. Return is not much different from throw, first of all. Note also that ES7 do-expressions allow expressions to return.
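
Concretely, under the provisional consensus resuming a suspended generator with return() behaves like a return statement at the yield, just as throw() behaves like a throw there, and both unwind through enclosing finally blocks:

  function* g() {
    try {
      yield 1;
    } finally {
      console.log("cleanup");
    }
  }

  let a = g(); a.next(); a.return(42);              // logs "cleanup", returns { value: 42, done: true }
  let b = g(); b.next();
  try { b.throw(new Error("boom")); } catch (e) {}  // logs "cleanup", then the error propagates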

> Also, the insistence on a return() that doesn't run catch blocks seems
> to me to be ill-placed.  I think it's telling that the counter-examples
> are from Python, which has a different semantic model, as it has
> finalization.  Implementing abstractions over scarce resources in JS is
> going to necessarily involve different design patterns than those used
> by Python.  For the given use-case, throw() is entirely sufficient.  If
> you don't trust your generators to do the right thing on an exception,
> you shouldn't be acquiring scarce resources!

It's just more painful with exceptions. It requires us to create some special kind of IteratorAbort exception, and it requires every catch block in a generator to add if-tests.
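
Sketched out (IteratorAbort, parse, reportBadRecord, and the source API are all made-up names), the exception-based protocol forces this kind of boilerplate into any generator that catches exceptions for its own purposes:

  function* records(source) {
    try {
      for (let raw of source) {
        try {
          yield parse(raw);
        } catch (e) {
          if (e instanceof IteratorAbort) throw e;   // boilerplate in every catch block
          reportBadRecord(e);                        // the handling we actually wanted
        }
      }
    } finally {
      source.close();                                // hypothetical cleanup
    }
  }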

> Finally, the given use-case is incompletely specified; a loop can exit
> prematurely through exceptions as well as through "break".

Yes, and a return statement inside the for-of body also exits the loop prematurely. Break is just a representative example; the spec would have to invoke .return() on all kinds of abrupt completions.

> == Calling return() on early exit from for-of is expensive
> 
> Wrapping a try/finally around each for-of is going to be really
> expensive in all engines right now.  I'm skeptical about our ability to
> optimize this one away.  Avoiding try/catch around for-of was one reason
> to move away from StopIteration, and it would be a pity to re-impose
> this cost on every for-of because of what is, in the end, an uncommon
> use case.

This glosses over a critical difference: the StopIteration semantics required catching exceptions on every iteration of the loop. This semantics only requires a check on loop exit.
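
Roughly modeled as a helper function (forOf is hypothetical, not spec text), the difference looks like this:

  // StopIteration-style required exception machinery on *every* call to next():
  //   try { step = iter.next(); } catch (e) { if (e === StopIteration) { break; } throw e; }
  //
  // Under the provisional consensus, normal iterations pay nothing extra; only an
  // abrupt exit (a throw from the body here; break/return in real for-of) reaches
  // return().
  function forOf(iterable, body) {
    let iter = iterable[Symbol.iterator]();
    let completedNormally = false;
    try {
      for (let step = iter.next(); !step.done; step = iter.next()) {
        body(step.value);                      // stand-in for the loop body
      }
      completedNormally = true;
    } finally {
      if (!completedNormally && iter.return !== undefined) {
        iter.return();
      }
    }
  }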

> I think the expected result of doing this would be
> performance lore to recommend using other iteration syntaxen instead of
> for-of.

This argument seems fishy to me. There is no other syntax in JS comparable to for-of and generators, so I think the alternative would be higher-order methods (a la .forEach). I find it hard to believe that a single check outside the loop will make or break for-of's ability to compete with those methods.

> There's no perfect answer when it comes to abstractions over scarce
> resources.  Given the constraints of what JS is, its finalization model,
> its deployment in the browser, and its engines, for me the status quo is
> the best we can do.  I know that for people that open file descriptors,
> that's somewhat unsatisfying, but perhaps such a cross is what goes with
> the crown of being a true Unix hacker ;)

This isn't about Unix hackers. We have to be thinking ahead to where custom iteration will really shine in JS, and that is over asynchronous streams of data. Then *everyone* will be doing this.

Dave


