[rust-dev] Idea: Memcpyed stacks for green tasks

Vadim vadimcn at gmail.com
Sat Jan 25 12:26:19 PST 2014


Hi Bill,
If I understand this right, what you are proposing is the approach of
Python's greenlets library, which I already tried to pitch
here<http://mail.mozilla.org/pipermail/rust-dev/2013-November/006544.html>.


I think that the biggest problem with this approach, in its' raw form, is
that greenlets are bound to a specific OS thread, which would cause all
sorts of headache.  For example:
- It would be difficult to balance greenlet tasks between several OS
threads,
- If async I/O requests for two greenlets, bound to the same OS thread,
were to complete simultaneously, you would not be able to resume them in
parallel,
- If a thread were to become blocked for a significant amount of time, all
the greenlets bound to it become un-resumable,
and so on...

Greelets would be much more palatable if their stacks were
position-independent.  This might be possible to achieve with a suitable
LLVM transform<http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-January/069662.html>,
though I am not yet sure how portable this would be.


cheers,
Vadim



On Sat, Jan 25, 2014 at 8:58 AM, Bill Myers <bill_myers at outlook.com> wrote:

> Stack management for green tasks has been based in the past first on
> segmented stacks and then on standard large stacks.
>
> However, I just realized that there is a third alternative which might
> well be better than both of those.
>
> The idea is very simple: a green task would run on a large stack like now,
> but when it is descheduled, a memory buffer of the size of its used stack
> space is allocated, and its stack data is copied there; when it is
> rescheduled, the stack data is copied back to the original address.
>
> The copying can be performed either by memcpy (perhaps with a specialized
> memcpy that assumes alignment and always uses SIMD instruction to copy with
> no checks), or perhaps by using a compression algorithm designed for
> maximum speed, such as Snappy or maybe a new ad-hoc design.
>
> This allows to have the best possible memory efficiency and thus the
> maximum possible number of tasks, even better than segmented stacks due to
> precise sizing and optional compression, while not adding any overhead to
> code that does not block, either in terms of code generation complexity or
> runtime cost.
>
> There is of course an additional runtime cost when blocking, but blocking
> already often involves both system calls and context switching, and
> workloads with lots of blocking green tasks are probably I/O bound anyway;
> furthermore, stack data is probably in the CPU cache, so there shouldn't be
> much penalty.
>
> The only significant drawback is that two green tasks running from the
> same "large" stack (note that stacks cannot be easily switched, as that
> would invalidate borrowed pointers on the stack to the stack) cannot
> ever run concurrently; however, by making the number of large stacks a
> large enough multiple of the number of CPU cores, the probability of
> reduced parallelism due to this issue can be made as small as desired.
>
> In general, one would use an adaptive approach, where this kind of copying
> would start to kick in only once the number of green tasks becomes large;
> when not copying, one would just optionally use madvise(MADV_DONTNEED) to
> trim unused whole pages of the stack.
>
> Also, if the data to be copied is large (several OS pages), one may use
> mremap() or similar facilities to move the address space mappings instead
> of the data itself.
>
> Some unsafe code that assumes that tasks can access each other's stacks
> will break (probably only the new Mutex implementation), but that can be
> fixed by putting the data in the task structure instead of the stack.
>
> Overall, this should allow writing something like a parallel network
> server or web crawler in Rust in a natural blocking style, while being
> fully confident in its scalability, which is something that is not really
> possible at the moment.
>
> Once this is in place, the best practices for Rust programs would become:
> 1. Use native tasks for tasks whose number is bounded at compile time and
> small
> 2. Use green tasks for tasks whose number is unbounded or large
>
> One could even automate this to some extent by spawning by default a
> native task the first C times a given proc() code address is used to start
> a task (where C = number of CPU cores), and green tasks afterwards.
>
> What do you think?
> _______________________________________________
> Rust-dev mailing list
> Rust-dev at mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/rust-dev/attachments/20140125/19f676f4/attachment.html>


More information about the Rust-dev mailing list