[rust-dev] strings, slices and nulls
graydon at mozilla.com
Wed Apr 18 11:54:25 PDT 2012
Our current strings always have a trailing null. This fact is pretty
much solely for interop with C. It's convenient that when you grab a
pointer to the buffer storing the string, you get something
null-terminated that you can pass to C.
We accomplish this by setting the fill field of a string to 1 longer
than the number of bytes we're given, and writing a null to the final byte.
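To make the layout concrete, here is a rough sketch in present-day Rust syntax (not the 2012 syntax used elsewhere in this message). The NulStr type and its method names are hypothetical, made up for illustration; they are not the runtime's actual representation. The only thing taken from the description above is that the stored size is one more than the number of bytes given, with a null in the extra byte.

```rust
use std::os::raw::c_char;

/// Hypothetical model of a string that always carries a trailing null.
/// `buf.len()` plays the role of the "fill" field described above.
struct NulStr {
    buf: Vec<u8>,
}

impl NulStr {
    /// Copy the given bytes and append the trailing null.
    fn from_bytes(bytes: &[u8]) -> NulStr {
        let mut buf = Vec::with_capacity(bytes.len() + 1);
        buf.extend_from_slice(bytes);
        buf.push(0); // the extra byte that exists only for C interop
        NulStr { buf }
    }

    /// Bytes actually stored, including the trailing null.
    fn fill(&self) -> usize {
        self.buf.len()
    }

    /// Length the user sees, excluding the trailing null.
    fn len(&self) -> usize {
        self.buf.len() - 1
    }

    /// Pointer that can be handed to C as-is, since the buffer ends in null.
    fn as_c_ptr(&self) -> *const c_char {
        self.buf.as_ptr() as *const c_char
    }
}
```

With this model, NulStr::from_bytes(b"hello") reports a len of 5 and a fill of 6, and as_c_ptr can go straight to C without a copy.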
There are a couple places this is visible in the language, and a couple
new places where it'll surface with fixed-size and slice strings:
- the index operator [] lets you index past the str::len
length. That is, x[str::len(x)] == 0 as uint, even though
the same thing fails as a bounds-overrun on a vec.
- If you _cast_ to a vec (rather than asking for a copy) you
get a vec that includes a trailing 0 byte.
- "hello"/5 is a "fixed size" string of str::len 5, but it costs
6 bytes of storage. And the type, say a str/5 in a structure,
will eat 6 bytes of contiguous storage.
- "hello"/& makes a slice, a (*u8,uint) pair. The uint field is
length, but it is also 6u, not 5u. That is, a slice always points
one byte beyond the "part that the user wants". This is to
support the idea of taking a slice from the middle of a string and
passing it to C: the library function that produces a *c_char will
have to look at the len'th byte, check for null, and make a
temporary copy if the slice doesn't "end in null" already.
(You only notice this if you manually unpack a slice to tuple form
and inspect it. Asking for str::slice_len(s) will return 5, as with
any other "non-raw-pointers" view of a string)
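The slice case is the subtle one, so here is a sketch of the "look at the len'th byte, copy only if needed" check in present-day Rust syntax. StrSlice, new, slice_len and to_c are hypothetical names for illustration, and the code assumes the slice was taken from a buffer that itself ends in a null, so the byte just past the user's part always exists. It uses the standard library's CStr and CString to stand in for whatever produces a *c_char.

```rust
use std::borrow::Cow;
use std::ffi::{CStr, CString};

/// Hypothetical slice model: `raw` covers the part the user wants
/// plus the one byte that follows it.
struct StrSlice<'a> {
    raw: &'a [u8],
}

impl<'a> StrSlice<'a> {
    /// Slice user-visible bytes [start, end) out of a null-terminated buffer.
    /// Panics if `end` is not a valid index into `buf`.
    fn new(buf: &'a [u8], start: usize, end: usize) -> StrSlice<'a> {
        StrSlice { raw: &buf[start..end + 1] }
    }

    /// Length as the user sees it (one less than the raw length).
    fn slice_len(&self) -> usize {
        self.raw.len() - 1
    }

    /// Borrow when the slice already "ends in null", otherwise make a
    /// temporary null-terminated copy. Panics on interior null bytes.
    fn to_c(&self) -> Cow<'a, CStr> {
        let n = self.slice_len();
        if self.raw[n] == 0 {
            Cow::Borrowed(CStr::from_bytes_with_nul(self.raw).expect("interior null byte"))
        } else {
            Cow::Owned(CString::new(&self.raw[..n]).expect("interior null byte"))
        }
    }
}
```

A slice that runs to the end of "hello" borrows the original buffer; a slice of "el" from the middle sees an 'l' in the len'th byte and falls back to the copy.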
Here are some possible paths forward:
1. Keep everything as-described here. It's perfect!
2. Fix [] on strings to fault on s[str::len(s)], like vec.
3. Remove the null termination stuff altogether. Make all strings
(fixed-size, slice, unique, shared) work exactly like vecs in terms
of length, and _always_ make temporary copies that we manually null
terminate before passing to C.
My current thinking is #2 here. Fix the indexing operator to relate to
observable "length" the same way vec does, but otherwise try to
"preserve the illusion" that most strings can pass through to C "for
cheap", without making a copy. Only slices-to-the-middle-of-strings need
copies. Which should not be most slices.
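For what #2 would change, a small sketch in present-day Rust syntax. Both function names are hypothetical, and returning None stands in for the runtime failure; the buffer is assumed to be the string's bytes followed by the trailing null.

```rust
/// Current behaviour: indexing is bounded by the fill, so the trailing
/// null is reachable at index len.
fn index_current(buf: &[u8], i: usize) -> Option<u8> {
    buf.get(i).copied()
}

/// Option #2: indexing is bounded by the observable length, like vec,
/// so index len is out of bounds even though the byte is still stored.
fn index_option2(buf: &[u8], i: usize) -> Option<u8> {
    let len = buf.len().saturating_sub(1); // exclude the trailing null
    if i < len {
        Some(buf[i])
    } else {
        None
    }
}
```

On the buffer for "hello", index 5 yields Some(0) under the current rule and None under #2, while the storage and the pass-through-to-C story stay the same.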