[rust-dev] Syntax of vectors, slices, etc

Graydon Hoare graydon at mozilla.com
Mon Apr 23 16:12:06 PDT 2012


On 12-04-23 03:21 PM, Rick Richardson wrote:
> Should a str be subject to the same syntax? Because it will have
> different semantics.

I think the semantics are almost identical to vectors, save for the
null issue.

> A UTF-8 string  has differently sized characters, so you can't treat
> it as a vector, there are obvious and currently discussed
> interoperability issues regarding the null terminator.

You certainly can treat it as a (constrained) vector. It's just a byte
vector, not a character vector. A character vector is [char]. Indexing
into a str gives you a byte. You can iterate through it in terms of
bytes or characters (or words, lines, paragraphs, etc.) or convert to
characters or utf-16 code units or any other encoding of unicode.
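
A minimal sketch of these views, written in present-day Rust rather than
the 2012 syntax under discussion (today a str is not indexed directly with
an integer; the byte view goes through as_bytes()). The helper names
byte_at, to_chars and to_utf16 are hypothetical, chosen only for
illustration:

```rust
/// Byte at position `i` of the UTF-8 encoding, or None if `i` is out of range.
fn byte_at(s: &str, i: usize) -> Option<u8> {
    s.as_bytes().get(i).copied()
}

/// The same text as a character vector, one `char` per Unicode scalar value.
fn to_chars(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// The same text re-encoded as UTF-16 code units.
fn to_utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}
```

For a string such as "h\u{e9}llo" the byte view has six elements while
the character view has five, which is the distinction drawn above.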

> It should definitely get a slice syntax, since that will likely be the
> most common operation on a string.
> I would also like to support a notion of static sizing, but with UTF-8
> even that's not always possible.

Yes it is. The static size is a byte count. The compiler knows that size
statically and can complain if you get it wrong (or fill it in if you
leave it as a wildcard, as I expect most will do.)
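
As a rough illustration in present-day Rust (not the syntax proposed in
this thread), a fixed-size byte array carries its length in the type, and
that length is a byte count even when the contents are UTF-8 with
multi-byte characters. Omitting the annotation is used here as a stand-in
for the wildcard case. GREETING, ACCENTED and inferred_len are
hypothetical names:

```rust
/// Fixed-size string data: the length is part of the type, counted in bytes.
/// Annotating this as `[u8; 5]` would be rejected at compile time.
const GREETING: [u8; 6] = *b"hello!";

/// "héllo" spelled out as UTF-8 bytes: five characters, six bytes.
const ACCENTED: [u8; 6] = [b'h', 0xC3, 0xA9, b'l', b'l', b'o'];

/// With no annotation the compiler fills in the length itself.
fn inferred_len() -> usize {
    let bytes = *b"hello!"; // inferred as [u8; 6]
    std::mem::size_of_val(&bytes)
}
```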

> I reckon a string should be an object, and potentially be convertible
> to/from a vector.  But trying to treat it like a vector will just lead
> to surprising semantics for some.  But that's just my opinion.

The set of use-cases to address simultaneously is large and covers much
of the same ground as vectors:

  - Sometimes people want to be able to send strings between tasks.
  - Sometimes people want a shared, refcounted string.
  - Sometimes people want strings of arbitrary length.
  - Sometimes people want an interior string that's part of another
    structure (with necessarily-fixed size), copied by value.
  - String literals exist and ought to turn into something useful,
    something in static memory when possible, dynamic otherwise.
  - Passing strings and substrings should be cheap, cheaper than
    refcount-adjustment even (when possible).
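
To make the list concrete, here is one way each case can be spelled in
present-day Rust. This is a sketch under that assumption, not the design
being discussed in this thread, and threads stand in for tasks. The names
Tagged, BANNER, first_word, send_to_task, shared_count and copy_tagged are
hypothetical:

```rust
use std::rc::Rc;
use std::thread;

/// Interior string: a fixed number of bytes stored inline in a structure.
#[derive(Clone, Copy)]
struct Tagged {
    tag: [u8; 4],
    id: u32,
}

/// String literal placed in static memory.
static BANNER: &str = "hello world";

/// Borrowed substring: passing it copies a pointer and a length,
/// with no refcount adjustment.
fn first_word(s: &str) -> &str {
    s.split(' ').next().unwrap_or("")
}

/// Move an owned, arbitrary-length string to another thread and
/// return its byte length from there.
fn send_to_task(s: String) -> usize {
    thread::spawn(move || s.len()).join().unwrap()
}

/// Share one refcounted string between two handles; returns the strong count.
fn shared_count(s: &str) -> usize {
    let a: Rc<str> = Rc::from(s);
    let _b = Rc::clone(&a);
    Rc::strong_count(&a)
}

/// The interior string travels with its containing structure when copied by value.
fn copy_tagged(t: Tagged) -> (Tagged, Tagged) {
    (t, t)
}
```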

As far as I know, our class system can't really satisfy these
requirements. This is why strings are a built-in type (just like
vectors). Making the class system strong enough to do all those things
would be much more work, and would move us closer to the C++0x model,
which I believe to be over-engineered in pursuit of the "make libraries
able to do anything a built-in type can do" goal.

But reasonable people disagree on this.

-Graydon

