Questions/issues regarding generators

Kevin Gadd kevin.gadd at gmail.com
Mon Mar 11 12:48:49 PDT 2013


Another option for scenarios like open(), where it is not cheap to
create multiple distinct iterators (each starting from the beginning),
or for scenarios where it's impossible (like a network stream), would
be to expose only an iterator in those instances, not an iterable.
Exposing an iterator would make it clear that your only available
operations are those valid on an iterator: moving forward and getting
values. I guess in scenarios like a network stream you could also just
throw when a consumer asks for a second iterator; painful, but
probably not the end of the world if you document it.
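
A minimal sketch of the "throw on the second request" variant. The
helper name singleUse is hypothetical, and the sketch assumes that an
iterator is any object whose next() returns { value, done } and that
the method is spelled iterator() as in this thread; neither is meant
as a statement of what the proposal specifies.

```javascript
// Hypothetical helper, not part of any proposal: wraps a one-shot source
// (standing in for a network stream) so only the first iterator() call works.
function singleUse(chunks) {
  let position = 0;     // the read position belongs to the underlying source
  let claimed = false;  // has a consumer already taken the iterator?
  return {
    iterator() {
      if (claimed) {
        throw new Error("single-use source: iterator() already called");
      }
      claimed = true;
      return {
        // Assumed iterator shape: next() returns { value, done }.
        next() {
          return position < chunks.length
            ? { value: chunks[position++], done: false }
            : { value: undefined, done: true };
        }
      };
    }
  };
}

const stream = singleUse(["a", "b"]);
const it = stream.iterator();
const first = it.next().value;   // "a"

let secondRequestFailed = false;
try {
  stream.iterator();             // a second consumer asks for an iterator
} catch (e) {
  secondRequestFailed = true;    // the failure is loud instead of silent
}
```

The point of throwing is that misuse fails immediately at the call
site rather than quietly producing interleaved values.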

I should note that the C# compiler's transform for functions that
produce iterables can also produce an iterator directly if you ask it
to; a feature like that would make this pattern easier, but it'd
probably also complicate syntax and usage significantly. The iterables
that it produces close over the argument values, and the iterators
contain the actual enumeration state. So with the way the C# compiler
implements iterable functions, your example (zip over the result of
rangeAsGenerator twice) would work fine. I can understand if this
functionality is viewed as an undesirable complication, but it does
feel illogical to me for the iterable produced by a function* to
actually just be a single-use iterator under the covers. If that's
what it is, it should return an iterator.
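
To make the difference concrete, here is a rough sketch of the two
shapes in plain JavaScript. It is hand-written rather than the output
of any compiler transform. The names range, rangeOnce and zip are
hypothetical, and the sketch again assumes an iterator() method and a
next() that returns { value, done }.

```javascript
// C#-style shape: the iterable closes over the arguments and holds no
// position; every iterator() call creates fresh enumeration state.
function range(from, to) {
  return {
    iterator() {
      let i = from;  // enumeration state lives in the iterator
      return {
        next() {
          return i <= to
            ? { value: i++, done: false }
            : { value: undefined, done: true };
        }
      };
    }
  };
}

// Single-use shape: the "iterable" is its own iterator, so every
// iterator() call hands back the same stateful object.
function rangeOnce(from, to) {
  let i = from;
  const self = {
    iterator() { return self; },
    next() {
      return i <= to
        ? { value: i++, done: false }
        : { value: undefined, done: true };
    }
  };
  return self;
}

// Hypothetical zip: asks each argument for an iterator, then pairs values
// until either side runs out.
function zip(a, b) {
  const ia = a.iterator();
  const ib = b.iterator();
  const pairs = [];
  for (;;) {
    const x = ia.next();
    if (x.done) break;
    const y = ib.next();
    if (y.done) break;
    pairs.push([x.value, y.value]);
  }
  return pairs;
}

const r = range(1, 4);
const dup = zip(r, r);    // [[1, 1], [2, 2], [3, 3], [4, 4]]

const g = rangeOnce(1, 4);
const oops = zip(g, g);   // [[1, 2], [3, 4]]: both sides share one cursor
```

With the first shape, zip needs no special contract about distinct
arguments; with the second, the same call silently interleaves values.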

-kg

On Mon, Mar 11, 2013 at 6:57 AM, Dmitry Lomov <dslomov at google.com> wrote:
> Coming back to the original example
>
>  let rangeAsArray = [1, 2, 3, 4]
>  let dup = zip(rangeAsArray, rangeAsArray)  // [[1,1], [2,2], [3,3], [4,4]]
>
> vs.
>
>  function* enum(from, to) { for (let i = from; i <= to; ++i) yield i }
>
>  let rangeAsGenerator = enum(1, 4)
>  let dup = zip(rangeAsGenerator, rangeAsGenerator)  // Oops!
>
> Clearly, as Allen stated, under the current proposal the second call is a
> user error. Under the current proposal, iterables are actually iterators and
> have internal state. The consequence of this design, however, is that user
> code should never treat iterator() as a factory method, only as a coercion
> method, and should generally assume that it can call iterator() on an object
> only once.
>
> This semantics is sound and consistent, but there is a problem: by that
> logic, the first call 'zip(rangeAsArray, rangeAsArray)' also has all the
> appearances of a user error! It requires careful analysis and thinking to
> convince oneself that it is indeed correct. Well, maybe not in a simple case
> when the initializer of rangeAsArray is an array literal, but as soon as the
> initializer is more complicated - say an external function, you can never be
> sure.
>
> If we assume this semantics, we generally cannot iterate collections,
> such as arrays, more than once. Note that the failure that occurs when the
> user switches from an array-returning function to a generator is really hard
> to notice: the zip function does not break immediately or throw an
> exception, it just produces nonsensical results.
>
> If we change the semantics so that iterator() returns a fresh iterator
> every time it is called, then these problems will be avoided. But what
> happens with, say, the 'open' example? The way to do it would be for 'open'
> to return an iterable that only opens the file when its iterator() method
> is called. Therefore, for example, the zip operation would work on files,
> too:
>
>    var f = open(filename, 'r')
>    var zippedFile = zip(f, f)
>
> Calls to f.iterator() inside zip would open the file twice and iterate the
> contents, whereas:
>   for(l in open(filename, 'r')) ...
>
> will continue to work. The read position will be inherent to the iterator
> (as returned by the iterator() method), not to the iterable that 'open'
> returns. That iterator can only be consumed once, but the iterable can be
> reused time and again by calling the iterator() method on it, just like an
> array. By adopting this approach, user code that treats in-memory
> collections and other generated sequences unifies very nicely.
>
> As another example, consider the 'tee()' operator that Tab proposes. In the
> iterator-only world, it is unclear what tee returns. Since it returns an
> iterator, and an iterator can only be iterated once (since in the
> iterator-only world the user generally has to assume that iterator() is a
> coercion method), the whole notion of 'caching' does not make sense. In the
> iterable-and-iterator world, tee would take an iterator (which it would then
> iterate to the end) and produce an iterable, and that iterable would iterate
> the cached values from the iterator over and over again.
>
> To summarize, while treating 'iterator()' as a coercion method is a
> consistent choice, it makes operations over collections unnecessarily
> distinct from operations over generators. Implementing iterator() as a
> factory method will unify those operations while still supporting all
> other scenarios for iterators.
>
> Kind regards,
> Dmitry
>
> P.S. One nice way to unify iterables and iterators from the user perspective
> is Andreas' proposal to make the iterator() function return a "clone" of
> the iterator that is started from the beginning.
>
> On Sat, Mar 9, 2013 at 12:31 AM, Tab Atkins Jr. <jackalmage at gmail.com>
> wrote:
>>
>> On Fri, Mar 8, 2013 at 9:23 AM, Jason Orendorff
>> <jason.orendorff at gmail.com> wrote:
>> > On Thu, Mar 7, 2013 at 1:05 PM, Andreas Rossberg <rossberg at google.com>
>> > wrote:
>> >> On 7 March 2013 18:30, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
>> >> > Zip's informal contract should state that if iterators are passed as
>> >> > arguments they need to be distinct objects. If you want to implement
>> >> > it defensively, you can add a check for that pre-condition.
>> >>
>> >> I have to disagree here. That is just evading the question what the
>> >> contract for .iterator is. Either it is supposed to create new state
>> >> or it isn't. It's not a very useful contract to say that it can be
>> >> both, because then you cannot reliably program against it.
>> >
>> > In Python, the contract definitely says that it can be both. It's the
>> > only practical choice. For collections, you want new state. But you also
>> > want things such as generators, database cursors, and file descriptors
>> > to be iterable:
>> >
>> >     with open(filename, 'r') as f:
>> >         for line in f:
>> >             handle_input(line)
>> >
>> > and you definitely don't want new state here, because what would that
>> > even mean? A read position is kind of inherent to a file descriptor,
>> > right?
>> >
>> > When you call zip() in Python, you expect that each argument will be
>> > iterated. I mean, it could hardly be otherwise. So if you've got an
>> > argument that can only be consumed once (either something like a file,
>> > or an arbitrary iterable you don't know about), you don't pass it twice;
>> > and you expect each such argument to become useless afterwards, just as
>> > if you had used it in a for-loop. That's clear enough to code to reliably
>> > in practice. It's not all that different from Unix pipes.
>>
>> And in Python, the iterator algebra has .tee(), which uses caching to
>> produce multiple copies of a stateful iterator.
>>
>> ~TJ
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org
>> https://mail.mozilla.org/listinfo/es-discuss




