Questions/issues regarding generators

Dmitry Lomov dslomov at google.com
Mon Mar 11 06:57:44 PDT 2013


Coming back to the original example

 let rangeAsArray = [1, 2, 3, 4]
 let dup = zip(rangeAsArray, rangeAsArray)  // [[1,1], [2,2], [3,3], [4,4]]

vs.

 function* enum(from, to) { for (let i = from; i <= to; ++i) yield i }

 let rangeAsGenerator = enum(1, 4)
 let dup = zip(rangeAsGenerator, rangeAsGenerator)  // Oops!

Clearly, as Allen stated, under the current proposal the second call is a
user error. Under the current proposal, iterables are actually iterators and
have internal state. However, the consequence of this design is that user
code should never treat iterator() as a factory method, only as a coercion
method, and should generally assume that it can call iterator() on an
object only once.

This semantics is sound and consistent, but there is a problem: by that
logic, the first call 'zip(rangeAsArray, rangeAsArray)' also has all
the appearances of a user error! It requires careful analysis and thinking
to convince oneself that it is indeed correct. Well, maybe not in the simple
case where the initializer of rangeAsArray is an array literal, but as soon
as the initializer is more complicated - say, a call to an external
function - you can never be sure.

If we assume this semantics, user code generally cannot safely iterate a
collection, such as an array, more than once, because it cannot tell
whether it holds a collection or a one-shot iterator. Note that the failure
that occurs when the user switches from a function returning an array to a
generator is really hard to notice - the zip function does not break
immediately or throw an exception, it just produces nonsensical results.
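
The silent failure can be reproduced with a small sketch. It rests on a few
assumptions: today's Symbol.iterator stands in for the iterator() method
discussed in this thread, the zip below is a hypothetical minimal version
(not any library's), and the generator is named range because 'enum' is a
reserved word in JavaScript.

```javascript
// Hypothetical minimal zip: asks each argument for an iterator, then
// pulls one value from each in turn until either side is exhausted.
function zip(a, b) {
  const ia = a[Symbol.iterator]();
  const ib = b[Symbol.iterator]();
  const out = [];
  while (true) {
    const x = ia.next();
    if (x.done) break;
    const y = ib.next();
    if (y.done) break;
    out.push([x.value, y.value]);
  }
  return out;
}

function* range(from, to) { for (let i = from; i <= to; ++i) yield i; }

// An array hands out a fresh iterator on every request.
const rangeAsArray = [1, 2, 3, 4];
const fromArray = zip(rangeAsArray, rangeAsArray);
// [[1,1], [2,2], [3,3], [4,4]]

// A generator object returns itself, so both sides share one state.
const rangeAsGenerator = range(1, 4);
const fromGenerator = zip(rangeAsGenerator, rangeAsGenerator);
// [[1,2], [3,4]] - no exception, just a different answer
```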

If we change the semantics so that iterator() returns a fresh iterator
every time it is called, then these problems are avoided. But what
happens with, say, the 'open' example? The way to do it would be for 'open'
to return an iterable that only actually opens the file when its iterator()
method is called. Therefore, for example, the zip operation would work on
files, too:

   var f = open(filename, 'r')
   var zippedFile = zip(f, f)

Calls to f.iterator() inside zip would open the file twice and iterate the
contents, whereas:
  for(l in open(filename, 'r')) ...

will continue to work. The read position will be inherent to the iterator
(as returned by the iterator() method), not to the iterable that 'open'
returns. That iterator can only be consumed once, but the iterable can be
reused time and again by calling the iterator() method on it - just like
an array.

With this approach, user code that handles in-memory collections and user
code that handles generated sequences unify very nicely.
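
A rough sketch of such an 'open', under stated assumptions: Symbol.iterator
stands in for iterator(), and a hypothetical in-memory table (fakeFiles)
replaces real file I/O so the example is self-contained. The openCount
counter is likewise only there to make the "opens" visible.

```javascript
// Hypothetical stand-in for a file system: file name -> contents.
const fakeFiles = { 'notes.txt': 'a\nb\nc' };
let openCount = 0;

// Returns an iterable, not an iterator. Nothing is "opened" until
// [Symbol.iterator]() is called, and every call opens afresh.
function open(filename) {
  return {
    [Symbol.iterator]() {
      openCount += 1;
      const lines = fakeFiles[filename].split('\n');
      let pos = 0; // the read position belongs to this iterator only
      return {
        next() {
          return pos < lines.length
            ? { value: lines[pos++], done: false }
            : { value: undefined, done: true };
        },
      };
    },
  };
}

const f = open('notes.txt');
const firstPass = [...f];  // ['a', 'b', 'c']
const secondPass = [...f]; // ['a', 'b', 'c'] again, from a second "open"
```

Each pass gets its own read position, so reusing f behaves like reusing an
array.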

As another example, consider the 'tee()' operator that Tab proposes. In the
iterator-only world, it is unclear what tee returns. Since it returns an
iterator, and an iterator can only be iterated once (since in the
iterator-only world the user generally has to assume that iterator() is a
coercion method), the whole notion of 'caching' does not make sense. In the
iterable-and-iterator world, tee would take an iterator (which it would
then iterate to the end) and produce an iterable, and that iterable would
iterate the cached values from the iterator over and over again.
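
One way to picture this tee is the sketch below. It assumes the eager
reading of the paragraph above (drain the iterator up front, then replay)
and uses Symbol.iterator in place of iterator(); the name range is a
hypothetical helper for the demonstration.

```javascript
// Sketch of tee: consumes a one-shot iterator to the end, caches the
// values, and returns an iterable that replays them on every request.
function tee(iterator) {
  const cache = [];
  for (let step = iterator.next(); !step.done; step = iterator.next()) {
    cache.push(step.value);
  }
  return {
    // Each call yields a fresh iterator over the cached values.
    [Symbol.iterator]() { return cache[Symbol.iterator](); },
  };
}

function* range(from, to) { for (let i = from; i <= to; ++i) yield i; }

const replayable = tee(range(1, 3));
const once = [...replayable];  // [1, 2, 3]
const again = [...replayable]; // [1, 2, 3]
```

Draining eagerly keeps the sketch short; it would not terminate on an
infinite iterator.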

To summarize, while treating 'iterator()' as a coercion method is a
consistent choice, it makes operations over collections unnecessarily
distinct from operations over generators. Implementing iterator() as a
factory method will unify those operations while still supporting all
other scenarios for iterators.

Kind regards,
Dmitry

P.S. One nice way to unify iterables and iterators from the user's
perspective is Andreas' proposal to make the iterator() function return
a "clone" of the iterator that is started from the beginning.
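
A very loose sketch of what such a "clone from the beginning" could look
like follows. This is only one possible reading, not Andreas' actual
design: the restartable wrapper and the range helper are hypothetical
names, and Symbol.iterator stands in for iterator().

```javascript
// Hypothetical wrapper: the object is itself an iterator (it has next()),
// but asking it for an iterator returns a fresh copy started from the
// beginning instead of returning the object itself.
function restartable(genFn, ...args) {
  const inner = genFn(...args);
  return {
    next() { return inner.next(); },
    [Symbol.iterator]() { return restartable(genFn, ...args); },
  };
}

function* range(from, to) { for (let i = from; i <= to; ++i) yield i; }

const r = restartable(range, 1, 3);
r.next();                        // consumes 1 from r itself
const fromStart = [...r];        // [1, 2, 3]: the clone restarts
const nextOnR = r.next().value;  // 2: r's own position is untouched
```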

On Sat, Mar 9, 2013 at 12:31 AM, Tab Atkins Jr. <jackalmage at gmail.com> wrote:

> On Fri, Mar 8, 2013 at 9:23 AM, Jason Orendorff
> <jason.orendorff at gmail.com> wrote:
> > On Thu, Mar 7, 2013 at 1:05 PM, Andreas Rossberg <rossberg at google.com>
> > wrote:
> >> On 7 March 2013 18:30, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
> >> > Zip's informal contract should state that if iterators are passed as
> >> > arguments they need to be distinct objects. If you want to implement it
> >> > defensively, you can add a check for that pre-condition.
> >>
> >> I have to disagree here. That is just evading the question what the
> >> contract for .iterator is. Either it is supposed to create new state
> >> or it isn't. It's not a very useful contract to say that it can be
> >> both, because then you cannot reliably program against it.
> >
> > In Python, the contract definitely says that it can be both. It's the only
> > practical choice. For collections, you want new state. But you also want
> > things such as generators, database cursors, and file descriptors to be
> > iterable:
> >
> >     with open(filename, 'r') as f:
> >         for line in f:
> >             handle_input(line)
> >
> > and you definitely don't want new state here, because what would that even
> > mean? A read position is kind of inherent to a file descriptor, right?
> >
> > When you call zip() in Python, you expect that each argument will be
> > iterated. I mean, it could hardly be otherwise. So if you've got an argument
> > that can only be consumed once (either something like a file, or an
> > arbitrary iterable you don't know about), you don't pass it twice; and you
> > expect each such argument to become useless afterwards, just as if you had
> > used it in a for-loop. That's clear enough to code to reliably in practice.
> > It's not all that different from Unix pipes.
>
> And in Python, the iterator algebra has .tee(), which uses caching to
> produce multiple copies of a stateful iterator.
>
> ~TJ