[rust-dev] In favor of types of unknown size

Matthieu Monrocq matthieu.monrocq at gmail.com
Sun Apr 29 04:53:45 PDT 2012


Hi Sebastian,

I have a few comments.

On Sun, Apr 29, 2012 at 12:21 AM, Sebastian Sylvan <
sebastian.sylvan at gmail.com> wrote:

> On Fri, Apr 27, 2012 at 3:15 PM, Niko Matsakis <niko at alum.mit.edu> wrote:
> > The types `[]T` and `str` would represent vectors and strings,
> > respectively.  These types have the C representation `rust_vec<T>` and
> > `rust_vec<char>`.  They are of *dynamic size*, meaning that their size
> > depends on their length.  The literal form for vectors and strings are
> > `[a, b, c]` and `"foo"`, just as normal.
>
> Back when I was entertaining the idea of writing my own rust-like
> language (before I was aware of rust's existence), I had the idea that
> all records/objects cold have dynamic size if any of their members had
> dynamic size (and the root cause of dynamic size would be fixed-size
> arrays - fixed at the time of construction, not a static constant
> size).
>
> This is only slightly related, but it's too close that I can't resist
> presenting gist of it (it's not completely worked out), in case anyone
> else wants to figure it out and see if it makes sense :-)
>
> Basically the idea spawned from the attempt of trying to avoid
> pointers as much as possible. Keep things packed, with "chunky"
> objects, reduce the complexity for GC/RC, reduce memory fragmentation,
> etc.. Aside from actual honest-to-goodness graphs (which fairly rare,
> and most are small, and unavoidable anyway). The conjecture is that
> the main source of pointers are arrays.
>
> Okay, so basically the idea is that arrays are length-prefixed blocks
> of elements. They're statically sized (can't be expanded), but you can
> pass in a dynamic, non-constant value when you construct them. Unlike
> C/C++ though these arrays can still live *inside* an object. There's
> some fiddlyness here.. e.g.. do you put all arrays (except ones which
> true const sizes?) at the end of the object so other members have a
> constant offset? If you have more than a small number of arrays in an
> object it probably makes to have a few pointers indicating the start
> of each instead of having to add up the sizes of preceeding arrays
> each time an access is made to one of the "later" arrays.
>
> Small reactions on "pointers": I think it's a good idea to pack the
variable length structures at the end of the current object. However I
would use cumulative offsets rather than pointers, because of size (on
64-bits architecture, which are becoming the de-facto standard for PCs and
servers).

The idea would be in C-style:

struct Object {
    int scalar1;
    int scalar2;
    unsigned __offset0;
    unsigned __offset1;
    unsigned __offset2;
    SomeObject __obj0;
    Table __obj1[X];
};

Where __offset0 indicates the offset from the start of Object to the start
of __obj0, __offset1 the offset from the start of Object to the start of
__obj1 and __offset2 the offset from the start of Object to the start of
__obj2. This means you have direct access to any attribute with a simple
addition to pointer, and you can know the size with a simple substraction
(the size of __obj0 is __offset1 - __offset2).



> So, during Construction of an object, you'd have to proceed in two
> phases. First is the constructor logic where you compute values, and
> the second is the allocation of the object and moving the values into
> it. You need to hold off on allocation because you don't know the size
> of any member objects until you've constructed them. Moving an array
> is now expensive, since it requires a copy, not just a pointer move.
> So ideally the compiler would try to move the allocation to happen as
> early as possible so most of the values can be written directly to its
> final location instead of having to be constructed on the stack (or
> heap) and then moved. There are of course cases where this couldn't be
> done. E.g. if the size of an array X, depends on some computation done
> on array Y in the same object - you have to create Y on the stack, or
> heap, to run the computation before you can know the total size of the
> object, and only then can you allocate the final object and copy the
> arrays into it.
>
> Yes, this is getting quite difficult at this stage. It's good once the
size is settled but the construction can be expensive.


> I'm not 100% sold on the idea, since it does make things a bit more
> complex, but it is pretty appealing to me that you can allocate
> dynamic-but-fixed sized arrays on the stack, inside other objects
> etc.. For a language that emphasizes immutable data structures I'd
> imagine the opportunity to use these fixed arrays "in-place" would be
> extremely frequent.
>
Seb
>
> --
> Sebastian Sylvan
>


There is a subtle issue that I had not remarked earlier. This mechanism
works great for fixed-size arrays, but is not amenable to extensible
arrays: vectors and strings *grow*.

So it would work if the field/attribute is runtime-fixed-size, either
because the type imposes it or because it's declared immutable, however it
will not work in the general case.

This is important because it means that in general, we need the vector or
string to be allocated on the heap because we want it growable. Having
realized that, I wonder if it's worth considering types of unknown size to
start with. The idea of a pointer to a `vec<T>` that is of
runtime-fixed-size is pleasant enough, but it means that the vector itself
is not modified, instead a new vector is built and the pointer is reseated.
This in turns means that I need to pass types such as  `&@vec<T>` to my
functions...

Frankly, this is not nice to the user, and since we are talking about very
common types here, I believe it would be worth sugar coating a bit,
syntax-wise. Having specific types for those runtime-fixed-length
structures (a fixed_array<T> ?) could be worth it, but I strongly believe
that `vec<T>` and `str` should be manipulable as-is rather than always
prefixed by `~` or `@` to be useful, even more because most of the routine
would then have to be duplicated to handle both types of ownership:
*shiver*.

-- Matthieu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/rust-dev/attachments/20120429/26c5b441/attachment.html>


More information about the Rust-dev mailing list