[rust-dev] Syntax of vectors, slices, etc

Niko Matsakis niko at alum.mit.edu
Thu Apr 19 07:22:47 PDT 2012


In general I love Graydon's proposal for strings and arrays, but I am 
not crazy about the notation.  In particular I think []/@ and []/~ are 
not a good syntax for shared/unique vectors.  It's not the slash, it's 
that I find it inconsistent.  Generally speaking, a @ or ~ after the 
main type is a bound, and one before it indicates the kind of pointer.  
But here, the trailing @ or ~ indicates the kind of pointer.  And []/3 
is not a pointer at all.

In Graydon's proposal, there are three kinds of vector-like things:

- Fixed-length arrays ([T]/3, T[3] at runtime)
- Vectors ([T]/@, [T]/~, boxed<rust_vec<T>>* or rust_vec<T>* at runtime)
- Slices ([T] or [T]/&, pair of T* and length)
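
To make the three runtime shapes in the list above concrete, here is a 
rough sketch in present-day Rust rather than in the notation under 
discussion.  The types `[u32; 3]`, `&[u32]` and `Box<Vec<u32>>` are 
stand-ins chosen for illustration (they are not the rust_vec of the 
runtime described here); only their sizes are meant to mirror the 
by-value array, the pointer/length pair, and the single heap pointer.

```rust
use std::mem::size_of;

/// Size in bytes of a by-value array of three u32s
/// (stand-in for the fixed-length array, `T[3]` at runtime).
fn fixed_array_size() -> usize {
    size_of::<[u32; 3]>()
}

/// Size of a slice reference: a data pointer plus a length.
fn slice_size() -> usize {
    size_of::<&[u32]>()
}

/// Size of one thin pointer to a heap-allocated vector
/// (stand-in for `rust_vec<T>*`; an assumption for illustration).
fn vector_pointer_size() -> usize {
    size_of::<Box<Vec<u32>>>()
}

fn main() {
    println!("fixed array: {} bytes", fixed_array_size());
    println!("slice:       {} bytes", slice_size());
    println!("vec pointer: {} bytes", vector_pointer_size());
}
```

The array's size depends on its length, the slice is always two words, 
and the vector pointer is always one word, which is the distinction the 
three notations have to express.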

Of these, the notation for slices seems exactly right: it is short and 
the "/" suffix indicates a bound.  In fact, I think maybe we should 
change fn@() to fn/@() and so forth, and just have "/" be a trailing 
bound indicator.  That leaves fixed-length arrays and vectors to 
represent somehow.  And let's not forget strings, which just complicate 
everything.

So here is my overall proposal (best viewed in fixed width).  The 
comparison is between my proposal, Graydon's proposal, and an 
English-language description.  In some cases (such as ifaces), I have 
also integrated work on the type system I would like to do in the future.

     New type      Old type     Descr.
     --------      --------     ------
     fn(S) -> T    fn(S)
     fn/@(S) -> T  fn@(S) -> T
     fn/~(S) -> T  fn~(S) -> T

     :N [T]        [T]/N        fixed-length array
     [N]T          [T]/N        fixed-length array

     :[T]          N/A          (see below)
     @:[T]         [T]/@        boxed vec
     ~:[T]         [T]/~        unique vec

     [T]           [T]          slice
     [T]/&r        [T]/&r       slice with expl. region

     Id            Id           enum/class/resource/iface
     Id/&r         Id&r         ...with expl. region bound

     Id/@          Id@          iface with @ bound
     Id/~          Id~          iface with ~ bound

     str           str          slice
     str/&r        str/&r       slice with expl. region

     :N str        str/N        fixed-length str

     :str          N/A          (see below)
     @:str         str/@        boxed str
     ~:str         str/~        unique str

Explanation and rationale:

- A trailing slash always indicates a bound, meaning that it limits the 
types contained "within" the affected type.  Normally, the bound is a 
region.  In the case of opaque types (like fn and ifaces), this bound 
can also be @ or ~.

- The types `:N [T]` and `:N str` correspond to `T[N]` and `u8[N+1]`, 
respectively.  That is, each is a "by-value" array.  If we want to allow 
N to be an arbitrary (const) expression, we may need to write `:(expr) 
[T]`, since `str` is no longer a keyword.

- Now everything which is in fact a pointer into the task/exchange heaps 
is prefixed with a @/~.

- The pseudo-type `:[T]` is supposed to look like "an array with an 
unspecified length".  It refers to a rust_vec<T> (by-value).   I say 
that it is a pseudo-type because you cannot write `:[T]` on its own.  In 
fact, it is not even a type.  You can only write `@:[T]` or `~:[T]`---we 
just use a bit of look-ahead.

The reason to keep `:[T]` from being a type is that it has unknown 
size.  To support this safely with generic types, we'd need to add 
kinds. I would like to do this eventually so that we can declare records 
with an inline vector at the end, but it's not necessary now.
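
As a point of comparison only, present-day Rust can express a record 
with an inline, unknown-length tail.  The sketch below is an 
illustration under that assumption, not part of this proposal: the 
`Record` type and `make_record` helper are hypothetical names, and 
`?Sized` plays roughly the role that "kinds" would play above.

```rust
use std::mem::size_of;

/// A record whose last field may have a size unknown at compile time.
/// `?Sized` opts the parameter out of the usual "known size" requirement.
struct Record<T: ?Sized> {
    id: u32,
    tail: T,
}

/// Builds a record with a three-byte tail, then erases the length from
/// the type.  The result can only be handled behind a pointer.
fn make_record() -> Box<Record<[u8]>> {
    let sized: Box<Record<[u8; 3]>> = Box::new(Record { id: 7, tail: [1, 2, 3] });
    sized // unsizing coercion: [u8; 3] becomes [u8] at the return site
}

fn main() {
    let r = make_record();
    println!("id = {}, tail = {:?}", r.id, &r.tail);
    // The pointer carries the tail length, so it is two words wide.
    println!("pointer size = {} bytes", size_of::<Box<Record<[u8]>>>());
}
```

The unsized form cannot be stored by value or passed to ordinary generic 
code, which is the same restriction that keeps `:[T]` from standing on 
its own here.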

I am not at all crazy about the `:` prefix, I just couldn't come up with a 
better character.  I wanted `#` for number, but (a) it's in use by 
macros and (b) it's kind of heavy.  `*` (think: repeat) is used for 
unsafe ptrs.  `^` is random. `+` (again, repeat) looks like an infix 
operator, not a prefix operator.
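
Returning to the `:N str` bullet above, a small worked example of the 
`u8[N+1]` size may help.  This is a sketch in present-day Rust with a 
hypothetical helper, `fixed_str_3`, for N = 3.  The post only gives the 
size; that the extra byte is a trailing NUL is an assumption made here.

```rust
/// Hypothetical model of a fixed-length string of N = 3 visible bytes,
/// stored by value in N + 1 = 4 bytes.  The final byte is assumed to be
/// a trailing NUL; the source only states the `u8[N+1]` size.
fn fixed_str_3(text: &[u8; 3]) -> [u8; 4] {
    let mut storage = [0u8; 4]; // N + 1 bytes, zero-filled
    storage[..3].copy_from_slice(text); // first N bytes hold the text
    storage
}

fn main() {
    let s = fixed_str_3(b"abc");
    println!("{:?} occupies {} bytes", s, std::mem::size_of_val(&s));
}
```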

Rejected ideas:

My original plan was "N:[T]" which I think looks way better than ":N 
[T]", but I scrapped it because `N` might eventually be a const 
expression and we need some clue that it's coming in the parser.

Another plan which I liked a lot was to have []T be a slice, [N]T be a 
constant-length array, and [:]T or [.]T be an unknown-length array.  I 
think this looks *great*, but there are two problems: First, I don't 
know how it extends to `str`.  Second, the region bound, if any, is 
ambiguous, so you'd need parentheses to clear it up: []T/&r could be 
[](T/&r) or ([]T)/&r.  But maybe that's ok as I don't expect explicit 
region bounds to appear very often at all.
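
The two readings of []T/&r can be spelled out as distinct syntax trees.  
The following is a minimal sketch in present-day Rust; the `Ty` enum and 
the helper functions are hypothetical names invented for illustration 
and do not correspond to anything in the compiler.

```rust
/// Hypothetical mini-AST for the rejected `[]T` notation.
#[derive(Debug, PartialEq)]
enum Ty {
    Named(&'static str),            // a plain type name such as `T`
    Slice(Box<Ty>),                 // `[]inner`
    Bounded(Box<Ty>, &'static str), // `inner/&region`
}

/// Renders a type, parenthesizing every compound sub-term.
fn show(ty: &Ty) -> String {
    match ty {
        Ty::Named(name) => name.to_string(),
        Ty::Slice(inner) => format!("[]{}", paren(inner)),
        Ty::Bounded(inner, region) => format!("{}/&{}", paren(inner), region),
    }
}

/// Wraps compound types in parentheses; leaves plain names bare.
fn paren(ty: &Ty) -> String {
    match ty {
        Ty::Named(_) => show(ty),
        _ => format!("({})", show(ty)),
    }
}

/// Reading 1: the region bounds the element type.
fn bound_on_element() -> Ty {
    Ty::Slice(Box::new(Ty::Bounded(Box::new(Ty::Named("T")), "r")))
}

/// Reading 2: the region bounds the slice itself.
fn bound_on_slice() -> Ty {
    Ty::Bounded(Box::new(Ty::Slice(Box::new(Ty::Named("T")))), "r")
}

fn main() {
    println!("{}", show(&bound_on_element()));
    println!("{}", show(&bound_on_slice()));
}
```

Both trees flatten to the same token sequence once the parentheses are 
dropped, so a parser would need either a precedence rule or the explicit 
parentheses.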

Thoughts?


Niko

