# [rust-dev] Arrays, vectors...

Graydon Hoare graydon at mozilla.com
Fri Mar 23 13:42:36 PDT 2012

I've been thinking and sketching a lot about vectors & arrays lately.
There are a _lot_ of design pressures on this part of the language, but
our current approach is clearly unsatisfactory:

- Existing vecs are always unique. Sometimes you want shared, but
boxing them as @[] causes double-indirection, feels awkward.

- Existing vecs are always in the heap. So:
- There's no way to alloca one (in stack)
- There's no way to allocate one in constant memory
- There's no way to allocate one inside another structure

- The latter is particularly egregious when it comes to C interop.
C has the idea of a dense block of fixed-size, int[10] is
10-ints-long. We have no way to model this in rust right now.

I think we need to deal with this stew of disappointments in our vectors
soon, probably in the 0.2 - 0.3 timeframe. I think with the appearance
of region pointers we have a reasonable leg to stand on in terms of
formulating lifecycles. Here's my proposal:

- [int] means (&int,len), it's a slice
- str means ((&u8,len) : is_utf8(*)), a textual slice

- the region system is used to relate the visibility of [t] and str
to their region-of-allocation, the same way it relates normal &t
values. The additional 'length' integer on a slice is used only
by unsafe code such as that implementing operator[] and iteration.

- [int/10] and str/10 mean "fixed-size, 10-elements long"
and are _interior values_. The number is part of the type.
The type's size is N * sizeof(T). We define const-expressions and
integer-const-expressions as part of this work, so that number
can in some cases be symbolic.

- [int/~] is a heap-resident unique vector, like [int] today
- [int/@] is a heap-resident shared vector, like old [int]
- str/~ and str/@ are the same for strings
- The compiler open-codes all of this

- [int] <: [int/10]
[int] <: [int/~]
[int] <: [int/@]
str <: str/10
str <: str/~
str <: str/@

- In other words, you can cast anything down to the "slice" form,
the same way you can cast any of the closures to fn() type

- Literals are subtle; assume p,q,r are dynamic vals:

- [1,2,3,4,5]    -- constant memory, type [int]
- [1,2,3,4,5]/5  -- constant memory, type [int/5]
- [1,2,3,4,5]/_  -- same, type [int/5], type-size inferred
- "hello"        -- constant memory, type str
- "hello"/5      -- constant memory, type str/5
- "hello"/_      -- same, type str/5, type-size inferred
- [p,q,r]        -- alloca'ed memory, type [int]
- [p,q,r]/3      -- alloca'ed memory, type [int/3]
- [p,q,r]/_      -- same, type [int/3], type-size inferred
- [p,q,r]/@      -- heap-resident shared, type [int/@]
- [p,q,r]/~      -- heap-resident unique, type [int/~]

- The compiler has to know about all these types. It's a bit of a
bestiary but it means all the use-cases are satisfied and I think
it composes well and is congruent with the rest of the language.

- Things like foo[i], +, += and such are pushed down to library code,
literals, allocation, casting, passing and destruction.

Positives of this approach:

- The common representation is cheapest, is a subtype of all, and is
the default type of the easiest-to-write literal forms. Most people
will write 'let x = "hello";' and get x:str inferred. And it'll be
cheap to pass, and constant-memory. The next-most-common (vectors of
dynamic values) are alloca'ed, but also passed as slices, cheap.

- It's possible to denote the aggregate, not just a pointer-to-it,
iff you fix its size. If you can't commit to a size, you can't
denote the aggregate, can only get a pointer-to-it.

- In particular, you can now translate C array types directly to a
rust type. { x: int, y: [int/15], z: int } is a dense structure.

- Pointers-to-fixed-sized-types are still possible: &[int/1024] is
a pointer. As is *[int/1024]. Orthogonality is maintained.

- Prefix box operators ~ and @ still mean what they always mean; there
is no weirdness that arises by trying to make ~[] special and then
confront the inconsistency of ~T != ~[] when T == []. Like with
fn closures, we make the "how you manage your associated storage" a
syntactically distinct component of the tycon.

- Heap-resident versions exist; like in any data structure, you have
to discuss uniquieness or shared-ness to some extent. If you want a
record with 3 strings you can send over a channel, it has to be of
type {x:str/~, y:str/~, z:str/~}.

So, thoughts, likes, dislikes?

Are the semantics over-complex? I think we really have a large number of
use-cases and there's no way we can hit them all with a single
abstraction. Shared-heap != unique-heap != constant != alloca !=
borrowed region, and fixed-size != variable-size.

Is the trailing slash absolutely hideous? There are a few other unused
ASCII symbols in the type grammar but this was the nicest-looking I
could see (that didn't collide with something else). It's also possible
to write them as str/10, str@ and str~, say, and similarly [int/10],
[int]~ and [int]@. That's fewer slashes but a bit more visual ambiguity
if you have a leading ~ or @ as well.

-Graydon