[rust-dev] Faster communication between tasks

Simon Ruggier simon80 at gmail.com
Tue Jan 28 18:05:20 PST 2014

A small update: I've gotten a resizable version of my disruptor
implementation working, and the performance looks pretty good so far. I
still have a few loose ends to tie up before I push out the changes, but
the updated code should be on GitHub within a couple of weeks, depending
on how much time I find to work on it.

On Sat, Nov 9, 2013 at 2:13 PM, Simon Ruggier <simon80 at gmail.com> wrote:

> Hi all, I've tentatively come up with a design that would allow the sender
> to reallocate the buffer as necessary, with very little added performance
> cost. The sending side would bear the cost of reallocation, and there would
> be an extra test that receivers would have to make every time they process
> an item (no extra atomic operations needed). However, it may be a few weeks
> or more before I have a working implementation to demonstrate, so I figured
> it might be worthwhile to mention now that I'll be working on this.
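The receiver-side test described above can be sketched roughly as follows. This is a hypothetical illustration in modern Rust, not the actual rust-disruptor code; the names (`SharedBuffer`, `Receiver`, `grow`) and the use of an `RwLock` for the slow path are my own assumptions. The idea is that the sender bumps a generation counter after installing a larger buffer, so the only per-item cost to receivers is one load and a branch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

// Hypothetical sketch; names and structure are assumptions, not the
// actual rust-disruptor code. The sender bumps `generation` after
// installing a larger buffer; receivers compare it against a cached copy
// on every item and take the slow path only when it changed.
struct SharedBuffer {
    generation: AtomicUsize,
    buffer: RwLock<Vec<u64>>, // touched only on the slow path
}

struct Receiver {
    shared: Arc<SharedBuffer>,
    cached_gen: usize,
    cached_len: usize,
}

impl Receiver {
    // Map a sequence number to a slot, refreshing cached metadata only
    // when the generation has changed since the last call.
    fn slot_index(&mut self, sequence: u64) -> usize {
        let gen = self.shared.generation.load(Ordering::Acquire);
        if gen != self.cached_gen {
            // Slow path: the sender reallocated; re-read the buffer size.
            self.cached_gen = gen;
            self.cached_len = self.shared.buffer.read().unwrap().len();
        }
        (sequence as usize) & (self.cached_len - 1) // len is a power of two
    }
}

// Sender side: double the buffer, then publish the new generation.
// (A real implementation would also re-place unconsumed items so that
// `seq & mask` still finds them under the new mask.)
fn grow(shared: &SharedBuffer) {
    let mut buf = shared.buffer.write().unwrap();
    let new_len = buf.len() * 2;
    buf.resize(new_len, 0);
    shared.generation.fetch_add(1, Ordering::Release);
}
```

On the fast path this adds no atomic read-modify-write operations for receivers, matching the "extra test" cost described above.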
> Also, I think it would be interesting to investigate doing something like
> the Linux kernel's deadlock detection[1], but generalized to apply to
> bounded queues, and implemented as a static check. I know little about
> this, but even so, I can see how it would be an enormous amount of work. On
> the other hand, I would have thought the same thing about the memory safety
> rules that Rust enforces. I'm hopeful that this will eventually be possible
> as well.
> [1] https://www.kernel.org/doc/Documentation/lockdep-design.txt
> On Wed, Oct 30, 2013 at 12:55 AM, Simon Ruggier <simon80 at gmail.com> wrote:
>> On Tue, Oct 29, 2013 at 3:30 PM, Brian Anderson <banderson at mozilla.com> wrote:
>>>  On 10/28/2013 10:02 PM, Simon Ruggier wrote:
>>> Greetings fellow Rustians!
>>> First of all, thanks for working on such a great language. I really like
>>> the clean syntax, increased safety, separation of data from function
>>> definitions, and freedom from having to declare duplicate method prototypes
>>> in header files.
>>> I've been working on an alternate way to communicate between tasks in
>>> Rust, following the same approach as the LMAX Disruptor.[1] I'm hoping to
>>> eventually offer a superset of the functionality in the pipes API, and
>>> replace them as the default communication mechanism between tasks. Just as
>>> with concurrency in general, my main motivation in implementing this is to
>>> improve performance. For more information about the disruptor approach,
>>> there's a lot of information linked from their home page, in a variety of
>>> formats.
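For readers unfamiliar with the approach, the core of a disruptor is a pre-allocated, power-of-two-sized ring of slots, a cursor the producer advances after writing, and consumers that each track their own sequence. The following is a minimal single-producer/single-consumer sketch in modern Rust, written for illustration only; it stores `u64` payloads in atomic slots to stay free of unsafe code, whereas the real disruptor stores arbitrary entries, and it omits the check that keeps the producer from overwriting slots a consumer has not read yet.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Minimal SPSC sketch of the disruptor idea, for illustration only.
struct RingBuffer {
    slots: Vec<AtomicU64>,
    mask: u64,              // capacity - 1, capacity is a power of two
    cursor: AtomicU64,      // number of published items so far
}

impl RingBuffer {
    fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        RingBuffer {
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            mask: capacity as u64 - 1,
            cursor: AtomicU64::new(0),
        }
    }

    // Producer: write the slot, then publish by advancing the cursor with
    // Release so a consumer's Acquire load also sees the written payload.
    // (A real disruptor also checks the consumers' sequences here to
    // avoid overrunning unread slots.)
    fn publish(&self, value: u64) {
        let seq = self.cursor.load(Ordering::Relaxed);
        self.slots[(seq & self.mask) as usize].store(value, Ordering::Relaxed);
        self.cursor.store(seq + 1, Ordering::Release);
    }

    // Consumer: busy-wait until the producer has passed `seq`, then read.
    fn get(&self, seq: u64) -> u64 {
        while self.cursor.load(Ordering::Acquire) <= seq {
            std::hint::spin_loop();
        }
        self.slots[(seq & self.mask) as usize].load(Ordering::Relaxed)
    }
}
```

Because the ring is pre-allocated and the hot path is a handful of loads and stores with no locks, this structure is what makes the throughput numbers discussed later in the thread possible.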
>>> This is really exciting work. Thanks for pursuing it. I've been
>>> interested in exploring something like Disruptor in Rust. The current
>>> channel types in Rust are indeed slow, and fixing them is the topic of
>>> https://github.com/mozilla/rust/issues/8568.
>> I'll start paying attention to that. The Morrison & Afek 2013 paper looks
>> like something I should read.
>>> This is my first major contribution of new functionality to an
>>> open-source project, so I didn't want to discuss it in advance until I had
>>> a working system to demonstrate. I currently have a very basic proof of
>>> concept that achieves almost two orders of magnitude better performance
>>> than the pipes API. On my hardware[2], I currently see throughput of about
>>> 27 million items per second when synchronizing with a double-checked wait
>>> condition protocol between sender and receivers, 80+ million per second with no
>>> blocking (i.e. busy waiting), and anywhere from 240,000 to 600,000 when
>>> using pipes. The LMAX Disruptor library gets up to 110 million items per
>>> second on the same hardware (using busy waiting and yielding), so there's
>>> definitely still room for significant improvement.
>>> Those are awesome results!
>> Thanks! When I first brought it up, it was getting about 14 million with
>> the busy waiting. Minimizing the number of atomic operations (even with
>> relaxed memory ordering) makes a big difference in performance. The 2/3
>> drop in performance with the blocking wait strategy comes from merely doing
>> a read-modify-write operation on every send (it currently uses atomic swap,
>> I haven't experimented with others yet). To be fair, the only result I can
>> take credit for is the blocking algorithm. The other ideas are straight
>> from the original disruptor.
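A double-checked wait of the kind described above might look like the following sketch. This is a hypothetical reconstruction, not the rust-disruptor code: the names (`Cursor`, `publish`, `wait_for`) are my own, and the exact memory orderings are assumptions. The sender pays one read-modify-write (a swap) per publish, matching the cost discussed above; a receiver re-checks the cursor after announcing its intent to sleep, and checks a final time under the mutex, which is what actually closes the lost-wakeup race.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Condvar, Mutex};

// Hypothetical sketch of a blocking wait between a sender and receivers;
// names and orderings are assumptions, not the rust-disruptor code.
struct Cursor {
    sequence: AtomicU64,
    waiting: AtomicBool, // set by a receiver before it blocks
    lock: Mutex<()>,
    cond: Condvar,
}

impl Cursor {
    fn new() -> Cursor {
        Cursor {
            sequence: AtomicU64::new(0),
            waiting: AtomicBool::new(false),
            lock: Mutex::new(()),
            cond: Condvar::new(),
        }
    }

    // Sender: publish, then wake sleepers only if a receiver announced
    // itself. The swap is the per-send read-modify-write.
    fn publish(&self, seq: u64) {
        self.sequence.store(seq, Ordering::Release);
        if self.waiting.swap(false, Ordering::AcqRel) {
            let _guard = self.lock.lock().unwrap();
            self.cond.notify_all();
        }
    }

    // Receiver: the first check is the lock-free fast path. Only after
    // setting `waiting` and re-checking does it block on the condvar; the
    // final check under the mutex makes the wait safe even if the
    // lock-free second check races with the sender.
    fn wait_for(&self, expected: u64) -> u64 {
        loop {
            let seq = self.sequence.load(Ordering::Acquire);
            if seq >= expected {
                return seq; // fast path: no blocking, no RMW
            }
            self.waiting.store(true, Ordering::SeqCst);
            // Second check: the sender may have published in between.
            if self.sequence.load(Ordering::SeqCst) >= expected {
                continue;
            }
            let guard = self.lock.lock().unwrap();
            if self.sequence.load(Ordering::Acquire) < expected {
                drop(self.cond.wait(guard).unwrap());
            }
        }
    }
}
```

On the uncontended path a receiver never touches the mutex, and the sender's only extra cost over busy waiting is the swap on `waiting`, which is consistent with the roughly 2/3 throughput drop described above.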
>>> I've put the code up on GitHub (I'm using rustc from master).[3]
>>> Currently, single and multi-stage pipelines of receivers are supported,
>>> while many features are missing, like multiple concurrent senders, multiple
>>> concurrent receivers, or mutation of the items as they pass through the
>>> pipeline. However, given what I have so far, now is probably the right time
>>> to start soliciting feedback and advice. I'm looking for review,
>>> suggestions/constructive criticism, and guidance about contributing this to
>>> the Rust codebase.
>>> I'm not deeply familiar with Disruptor, but I believe that it uses
>>> bounded queues. My general feeling thus far is that, as the general 'go-to'
>>> channel type, people should not be using bounded queues that block the
>>> sender when full because of the potential for unexpected deadlocks. I could
>>> be convinced otherwise though if it's just not possible to have reasonably
>>> fast unbounded channels. Note that I don't think it's critical for the
>>> general-purpose channel to be as fast as possible - it's more important to
>>> be convenient.
>> Yes, it does. I'm divided on this, because unbounded queues can also lead
>> to memory exhaustion and added latency, but I suspect that for many use
>> cases, you're right. For performance critical code, I think there's
>> probably no argument: if a queue is too large, it starts causing latency
>> problems (like with bufferbloat). A queue that accepts an unlimited number
>> of items is like an API that doesn't let the caller know about errors. The
>> caller needs to know that there's a large queue, and adjust its behaviour.
>> Because of this, I doubt any performance-critical application would find it
>> truly optimal to use unbounded queues. My opinion on this is strongly
>> influenced by this post:
>> http://mechanical-sympathy.blogspot.co.uk/2012/05/apply-back-pressure-when-overloaded.html
>> For general usage, though, I need to do more research. Any application
>> where latency is relevant really should be designed to deal with
>> back-pressure from queues, but there may be some batch job style use cases
>> where, as you say, it isn't worth the extra effort. On the other hand, it's
>> relevant to think about how deadlocks occur, and decide whether or not it's
>> reasonable for developers to expect to be able to do those things. I'll
>> look into this and see what I come up with.
>> If there were some general way to mitigate the deadlock issue within the
>> runtime, it would also solve this problem.
>> As a last resort, I suspect that I could probably figure out a way to
>> have the sender resize the buffer when it fills, copy the elements over,
>> and then switch the consumers over to the larger buffer. I don't know if I
>> could do it without affecting the fast path on the receiver side.
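The copy step of that last-resort scheme could work as in the following sketch, which is purely illustrative (the function name and signature are my own). When the ring fills, the sender allocates a buffer twice the size and re-places each unconsumed item at the slot its sequence maps to under the new mask, so receivers can keep computing `seq & mask` unchanged once they switch over.

```rust
// Hypothetical sketch of the resize-and-copy step; not actual
// rust-disruptor code. `first_unread` is the oldest unconsumed sequence,
// `next_seq` the next sequence the sender would write. Both buffers have
// power-of-two lengths, so `seq & mask` picks the slot.
fn grow_ring(old: &[u64], first_unread: u64, next_seq: u64) -> Vec<u64> {
    let old_mask = old.len() as u64 - 1;
    let mut new_buf = vec![0u64; old.len() * 2];
    let new_mask = new_buf.len() as u64 - 1;
    // Re-place each pending item at the slot its sequence maps to under
    // the new, wider mask.
    for seq in first_unread..next_seq {
        new_buf[(seq & new_mask) as usize] = old[(seq & old_mask) as usize];
    }
    new_buf
}
```

The copy itself is paid only by the sender and only when the buffer fills; the open question raised above is how cheaply the receivers can detect the switch, which is what the later resizable implementation addresses.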
>> Please keep working on this. I'm excited to see your results.
>> I appreciate the encouragement :)
>>> Thanks,
>>> Simon
>>> [1] http://lmax-exchange.github.io/disruptor/
>>> [2] A 2.66GHz Intel P8800 CPU running in a Thinkpad T500 on Linux x86_64
>>> [3] https://github.com/sruggier/rust-disruptor
>>> _______________________________________________
>>> Rust-dev mailing list
>>> Rust-dev at mozilla.org
>>> https://mail.mozilla.org/listinfo/rust-dev
