Dataflow concurrency instead of generators (was: How would shallow generators compose with lambda?)

David-Sarah Hopwood david-sarah at jacaranda.org
Thu May 14 16:34:14 PDT 2009


Jason Orendorff wrote:
> On Thu, May 14, 2009 at 12:25 PM, Mark S. Miller <erights at google.com> wrote:
>> Given both shallow generators and lambda, I don't understand how they
>> would interact.
> 
> This is a good question.

So, why do we need generators anyway?

I know generators are quite popular in languages that support them.
However, there are other language features that can be used to
provide similar (or greater) functionality, and that would not
introduce the same problematic control flow interactions.

For instance, suppose that you have dataflow concurrency, as supported
by Oz and by recent dataflow extensions for Ruby, Python, and Scala:

<http://www.mozart-oz.org/documentation/tutorial/node8.html>
<http://github.com/larrytheliquid/dataflow/tree/master>
<http://pypi.python.org/pypi/dataflow/0.1.0>
<http://github.com/jboner/scala-dataflow/tree/master>

Then the functionality of a generator can be implemented using a
process/thread that extends a list or queue constructed from
dataflow variables.
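
As a rough illustration (not from the post itself): Python has no dataflow variables in its standard library, but a producer thread feeding a `queue.Queue` approximates the stream-of-dataflow-variables idea. The names `as_stream`, `_DONE`, and `squares` are hypothetical, invented for this sketch.

```python
# Sketch: a generator-like stream implemented as a producer thread
# extending a queue, instead of as a coroutine.
import queue
import threading

_DONE = object()  # hypothetical end-of-stream sentinel


class _StreamIter:
    """Iterate over a queue's elements, stopping at the _DONE sentinel."""
    def __init__(self, q):
        self._q = q

    def __iter__(self):
        return self

    def __next__(self):
        item = self._q.get()
        if item is _DONE:
            raise StopIteration
        return item


def as_stream(producer):
    """Run `producer(emit)` in its own thread; return an iterator over
    the values it emits, in emission order."""
    q = queue.Queue()

    def run():
        try:
            producer(q.put)
        finally:
            q.put(_DONE)

    threading.Thread(target=run, daemon=True).start()
    return _StreamIter(q)


# The "generator" is now an ordinary function over an emit callback,
# with no special control-flow interaction with its caller:
def squares(emit):
    for n in range(5):
        emit(n * n)

print(list(as_stream(squares)))  # [0, 1, 4, 9, 16]
```

Unlike a real generator, `squares` cannot suspend its caller or be resumed from arbitrary points; it simply runs ahead, which is exactly the decoupling the post describes.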

This approach avoids any problems due to a generator being able
to interfere with the control flow of its callers. It also allows
the producer process to run truly in parallel with the consumer(s),
possibly taking advantage of multiple CPU cores. The programming
model is no more complicated for the cases that correspond to
correct use of generators: strict use of dataflow variables for
communication between processes (with no other mutable data
structures shared between them) is declarative, i.e. it gives
the same results as a sequential generator-based implementation
would.

Although it gives the same computational results, the dataflow-
concurrent approach allows more flexibility in the flow control
between producer and consumer: for example, the producer process
can be allowed to run freely ahead of the consumer process, or
constrained to generate only a bounded number of unconsumed
elements. The special case where the producer process only
generates the next unconsumed element and only starts to generate
it when needed -- effectively sequentializing the producer and
consumer -- corresponds to a sequential generator or coroutine.
(This requires by-need dataflow variables, as supported by Oz and
at least the Ruby library mentioned above.)
However, I suspect that the bounded queue is likely to be more
efficient and more often what is really wanted.
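
To illustrate the bounded case (again a sketch, not from the post): with `queue.Queue(maxsize=bound)`, the producer's `put()` blocks whenever `bound` elements are unconsumed, so bounding the queue is the only change needed to constrain how far the producer runs ahead.

```python
# Sketch: flow control via a bounded queue. The producer may run at
# most `maxsize` elements ahead of the consumer; put() blocks until
# the consumer catches up.
import queue
import threading


def produce(q, n):
    for i in range(n):
        q.put(i)   # blocks when the queue already holds maxsize items
    q.put(None)    # end-of-stream sentinel (hypothetical convention)


q = queue.Queue(maxsize=2)  # at most 2 unconsumed elements
threading.Thread(target=produce, args=(q, 10), daemon=True).start()

out = []
while True:
    item = q.get()
    if item is None:
        break
    out.append(item)

print(out)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Setting `maxsize=1` approximates the near-sequential case, though a true by-need variable would go further and delay producing the next element until it is actually demanded.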

This flow-control flexibility can be exercised by passing different
kinds of dataflow list/queue implementation into the producer process,
without changing the producer's code. It is possible to construct
more general dataflow networks that can split or merge streams,
if needed. Dataflow concurrency can also be extended to more
expressive concurrency models that introduce nondeterminism,
and the dataflow features only gain in usefulness in that case.

If asked to pick two language features from
{TC-respecting lambdas, generators, dataflow concurrency},
I would always pick lambdas and dataflow concurrency, and drop
generators as a primitive feature. It is still possible to mimic
a generator-like API, for programmers who are used to that, in
terms of the dataflow concurrency features. The semantics will
be slightly different because a generator will not be able to
directly access shared mutable data structures (it could still
access structures containing immutable and dataflow variables),
but this limitation is IMHO more than outweighed by the greater
generality and potential parallelism.

-- 
David-Sarah Hopwood ⚥



More information about the es5-discuss mailing list