[rust-dev] a vision for vectors

Niko Matsakis niko at alum.mit.edu
Wed Jun 13 16:15:33 PDT 2012


Hello,

I wanted to check something. We are working on the Great Change to a 
more flexible vector system and I want to outline the design that's in 
my head. This has some implications for How Efficient Rust Code Is 
Written, so I wanted to make sure we were all on the same page.

*Implications for writing efficient Rust*

I figured I'd just start with the implications for writing Rust. 
Currently, to build up a vector, we rely upon an idiom like:

let mut v = [];
for some loop { v += [elt]; }

Now, often, such loops can (and should) be written using a higher-order 
function (e.g., map, filter). But sometimes not. In such cases, under 
the new regime, the recommended idiom would be:

let dv = dvec();
for some loop { v.push(elt); }
let v = dvec::unwrap(dv); // if necessary, convert to a vector

Actually the name `dvec()` (dynamic vector) will probably change—perhaps 
to vecbuf? mvec? suggestions welcome—but you get the idea. The same 
would eventually apply to building up strings.

Basically, the idea is that we have "builder" types for vectors and 
strings. These builder types will overallocate and use dirty tricks to 
achieve reasonable performance. Using convenience operators like `+` 
will not do such things.

*Details*

The actual implementation strategy is that the representation of vectors 
will stay mostly the same as it is now. However, when the compiler 
allocates vectors, it will always do so for precisely the size they need 
to be (fill == alloc, in our vector rep). There will be internal 
functions (vec::alloc_empty_with_capacity() or something) that allocate 
an empty vector but with a large capacity and unsafe functions that can 
be used to set the length. These can be used by dvec-like classes but 
also by routines like `vec::map()`. Most of this exists today. The only 
real thing that changes is that we take *away* the tricks the compiler 
does for `+=`.

*Motivation*

Part of the motivation for this change is that when you have task-local 
vectors, the tricks we play now where we treat vectors both as values 
and as things that can be updated in place don't work so well (this is 
precisely why the move was made to unique vectors in the first place, as 
I understand it). However, task-local vectors are good for a number of 
reasons (cheaper copies; easier to ensure memory safety), so I expect 
we'll wind up using them a fair amount: to obtain reliably good 
performance, then, a builder like `dvec` can be used that encapsulates 
the task-local vector pointer until construction is complete, making it 
safe to append to it in place.

Another motivation is that it is part of a general trend to push 
intelligence out of the compiler and into libraries where possible. We 
can build vector append using overloaded operators. This also ensures 
that end-users will be able to design efficient libraries and so forth. 
Moving things like `vector +` into libraries also simplifies the type 
checker, as we can draw on impls to handle all the various cases (@ vs ~ 
vectors, imm vs mut vectors, and so forth).


Niko


More information about the Rust-dev mailing list