[rust-dev] In favor of types of unknown size

Sebastian Sylvan sebastian.sylvan at gmail.com
Sat Apr 28 15:21:05 PDT 2012


On Fri, Apr 27, 2012 at 3:15 PM, Niko Matsakis <niko at alum.mit.edu> wrote:
> The types `[]T` and `str` would represent vectors and strings,
> respectively.  These types have the C representation `rust_vec<T>` and
> `rust_vec<char>`.  They are of *dynamic size*, meaning that their size
> depends on their length.  The literal form for vectors and strings are
> `[a, b, c]` and `"foo"`, just as normal.

Back when I was entertaining the idea of writing my own rust-like
language (before I was aware of rust's existence), I had the idea that
all records/objects cold have dynamic size if any of their members had
dynamic size (and the root cause of dynamic size would be fixed-size
arrays - fixed at the time of construction, not a static constant
size).

This is only slightly related, but it's too close that I can't resist
presenting gist of it (it's not completely worked out), in case anyone
else wants to figure it out and see if it makes sense :-)

Basically the idea spawned from the attempt of trying to avoid
pointers as much as possible. Keep things packed, with "chunky"
objects, reduce the complexity for GC/RC, reduce memory fragmentation,
etc.. Aside from actual honest-to-goodness graphs (which fairly rare,
and most are small, and unavoidable anyway). The conjecture is that
the main source of pointers are arrays.

Okay, so basically the idea is that arrays are length-prefixed blocks
of elements. They're statically sized (can't be expanded), but you can
pass in a dynamic, non-constant value when you construct them. Unlike
C/C++ though these arrays can still live *inside* an object. There's
some fiddlyness here.. e.g.. do you put all arrays (except ones which
true const sizes?) at the end of the object so other members have a
constant offset? If you have more than a small number of arrays in an
object it probably makes to have a few pointers indicating the start
of each instead of having to add up the sizes of preceeding arrays
each time an access is made to one of the "later" arrays.

So, during Construction of an object, you'd have to proceed in two
phases. First is the constructor logic where you compute values, and
the second is the allocation of the object and moving the values into
it. You need to hold off on allocation because you don't know the size
of any member objects until you've constructed them. Moving an array
is now expensive, since it requires a copy, not just a pointer move.
So ideally the compiler would try to move the allocation to happen as
early as possible so most of the values can be written directly to its
final location instead of having to be constructed on the stack (or
heap) and then moved. There are of course cases where this couldn't be
done. E.g. if the size of an array X, depends on some computation done
on array Y in the same object - you have to create Y on the stack, or
heap, to run the computation before you can know the total size of the
object, and only then can you allocate the final object and copy the
arrays into it.

I'm not 100% sold on the idea, since it does make things a bit more
complex, but it is pretty appealing to me that you can allocate
dynamic-but-fixed sized arrays on the stack, inside other objects
etc.. For a language that emphasizes immutable data structures I'd
imagine the opportunity to use these fixed arrays "in-place" would be
extremely frequent.

Seb

-- 
Sebastian Sylvan


More information about the Rust-dev mailing list