[rust-dev] strings, slices and nulls

Graydon Hoare graydon at mozilla.com
Wed Apr 18 11:54:25 PDT 2012


Our current strings always have a trailing null. This fact is pretty
much solely for interop with C. It's convenient that when you grab a
pointer to the buffer storing the string, you get something
null-terminated that you can pass to C.

We accomplish this by setting the fill field of a string to 1 longer
than the number of bytes we're given, and writing a null into that
extra byte, at index fill - 1 (which is also str::len of the string).
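For illustration, here is a minimal sketch of the layout described above, written in present-day Rust rather than the 2012 runtime. NulStr, new, len, fill and as_ptr are hypothetical names chosen for the sketch. The vector's length stands in for the fill field, and the only point it makes is that a buffer kept in this shape can be handed to C without a copy.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical model of the string layout: the buffer always holds one
/// extra byte, a trailing NUL, so `fill == len + 1`.
struct NulStr {
    buf: Vec<u8>, // buf.len() plays the role of the `fill` field
}

impl NulStr {
    fn new(bytes: &[u8]) -> NulStr {
        let mut buf = Vec::with_capacity(bytes.len() + 1);
        buf.extend_from_slice(bytes);
        buf.push(0); // the null lives at index fill - 1, i.e. at len
        NulStr { buf }
    }

    /// Logical length, what `str::len` reports.
    fn len(&self) -> usize {
        self.buf.len() - 1
    }

    /// Storage length, the `fill` field.
    fn fill(&self) -> usize {
        self.buf.len()
    }

    /// Pointer C can consume directly: the terminator is already present.
    fn as_ptr(&self) -> *const c_char {
        self.buf.as_ptr() as *const c_char
    }
}

fn main() {
    let s = NulStr::new(b"hello");
    assert_eq!(s.len(), 5);
    assert_eq!(s.fill(), 6);
    // Reading through the raw pointer the way C would stops at the NUL.
    let c = unsafe { CStr::from_ptr(s.as_ptr()) };
    assert_eq!(c.to_bytes(), b"hello");
}
```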

There are a couple of places where this is visible in the language, and a
couple of new places where it will surface with fixed-size and slice strings:


   - the index operator [] lets you index past the str::len
     length. That is, x[str::len(x)] == 0 as uint, even though
     the same thing fails as a bounds-overrun on a vec.

   - If you _cast_ to a vec (rather than asking for a copy) you
     get a vec that includes a trailing 0 byte.


   - "hello"/5 is a "fixed size" string of str::len 5, but it costs
     6 bytes of storage. And the type, say a str/5 in a structure,
     will eat 6 bytes of contiguous storage.

   - "hello"/& makes a slice, a (*u8,uint) pair. The uint field is the
     length, but it is 6u, not 5u. That is, a slice always extends one
     byte beyond the "part that the user wants". This supports taking
     a slice from the middle of a string and passing it to C: the
     library function that produces a *c_char will have to look at the
     len'th byte, check for null, and make a temporary copy if the
     slice doesn't "end in null" already.

     (You only notice this if you manually unpack a slice to tuple form
      and inspect it. Asking for str::slice_len(s) will return 5, as with
      any other "non-raw-pointer" view of a string.)
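A rough sketch, in present-day Rust, of the check the C-conversion function would have to perform on a slice. StrSlice, slice_len and to_c_str are hypothetical names. For simplicity the sketch stores the user-visible length (5 for "hello") and peeks at the byte just past it, rather than folding the extra byte into the length field as described above; the check itself is the same. CStr and CString are the real standard-library types.

```rust
use std::borrow::Cow;
use std::ffi::{CStr, CString};

/// Hypothetical slice view. `base` is the whole backing buffer of the
/// source string, including its trailing NUL; the part the user wants
/// is `base[start..start + len]`.
struct StrSlice<'a> {
    base: &'a [u8],
    start: usize,
    len: usize,
}

impl<'a> StrSlice<'a> {
    /// User-visible length, as `str::slice_len` would report it.
    fn slice_len(&self) -> usize {
        self.len
    }

    /// Produce something C can read. If the byte just past the visible
    /// part is already a NUL, borrow in place; otherwise make a temporary
    /// NUL-terminated copy. Returns `None` if the visible part contains an
    /// interior NUL, which a C string cannot represent.
    fn to_c_str(&self) -> Option<Cow<'a, CStr>> {
        let base: &'a [u8] = self.base;
        let end = self.start + self.len;
        if base.get(end) == Some(&0) {
            // Fast path: the slice already "ends in null".
            CStr::from_bytes_with_nul(&base[self.start..=end])
                .ok()
                .map(Cow::Borrowed)
        } else {
            // Slow path: copy the visible bytes and append a terminator.
            CString::new(&base[self.start..end]).ok().map(Cow::Owned)
        }
    }
}

fn main() {
    let buf: &[u8] = b"hello world\0";

    // Slice running to the end of the string: the NUL is already there.
    let tail = StrSlice { base: buf, start: 6, len: 5 };
    assert_eq!(tail.slice_len(), 5);
    assert!(matches!(tail.to_c_str(), Some(Cow::Borrowed(_))));

    // Slice from the middle: the next byte is ' ', so a copy is made.
    let head = StrSlice { base: buf, start: 0, len: 5 };
    let c = head.to_c_str().unwrap();
    assert!(matches!(c, Cow::Owned(_)));
    assert_eq!(c.to_bytes(), b"hello");
}
```

Only the slice taken from the middle of the buffer allocates; a slice that runs to the end of its source string borrows the terminator that is already there.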

Here are some possible paths forward:

 1. Keep everything as described here. It's perfect!

 2. Fix [] on strings to fault on s[str::len(s)], like vec.

 3. Remove the null termination stuff altogether. Make all strings
    (fixed-size, slice, unique, shared) work exactly like vecs in terms
    of length, and _always_ make temporary copies that we manually null
    terminate before passing to C.
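Option 3 amounts to "strings are plain byte vectors, and every trip to C copies". A minimal sketch of that path, assuming a hypothetical helper to_c_copy built on the standard library's CString::new, which copies its input and appends the terminator (roughly how present-day Rust's str and CString divide the work):

```rust
use std::ffi::{CString, NulError};

/// Under option 3 a string is just bytes plus a length, like a vec,
/// with no hidden terminator. Every hand-off to C makes a fresh
/// NUL-terminated copy.
fn to_c_copy(bytes: &[u8]) -> Result<CString, NulError> {
    // CString::new copies the bytes and appends the terminator; it
    // fails if the input contains an interior NUL.
    CString::new(bytes)
}

fn main() {
    let s = "hello world";
    // The source string carries no NUL: its length is exactly its contents.
    assert_eq!(s.len(), 11);

    // Whole string and middle slice are handled the same way: both copy.
    let whole = to_c_copy(s.as_bytes()).unwrap();
    let middle = to_c_copy(&s.as_bytes()[0..5]).unwrap();
    assert_eq!(whole.as_bytes_with_nul().len(), s.len() + 1);
    assert_eq!(middle.as_bytes(), b"hello");
}
```

The trade-off is uniformity for cost: there is no special case for slices, but even a whole string that could have been passed through pays for an allocation.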

My current thinking is #2 here: fix the indexing operator to relate to
observable "length" the same way vec does, but otherwise try to
"preserve the illusion" that most strings can pass through to C "for
cheap", without making a copy. Only slices into the middle of strings
need copies, and those should not be most slices.
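A sketch of what option #2 could look like, again in present-day Rust with hypothetical names (NulStr, as_c_str): the storage still ends in a null, so a whole string reaches C without a copy, but the Index implementation bounds-checks against the logical length, so s[s.len()] panics the way an out-of-range vec index does.

```rust
use std::ffi::CStr;
use std::ops::Index;
use std::panic;

/// Hypothetical string under option #2: storage still ends in a NUL,
/// but indexing only admits positions below the logical length.
struct NulStr {
    buf: Vec<u8>, // invariant: buf.len() == len + 1 and the last byte is 0
}

impl NulStr {
    fn new(bytes: &[u8]) -> NulStr {
        let mut buf = bytes.to_vec();
        buf.push(0);
        NulStr { buf }
    }

    fn len(&self) -> usize {
        self.buf.len() - 1
    }

    /// Whole strings still reach C without copying. Returns `None` if the
    /// contents contain an interior NUL.
    fn as_c_str(&self) -> Option<&CStr> {
        CStr::from_bytes_with_nul(&self.buf).ok()
    }
}

impl Index<usize> for NulStr {
    type Output = u8;

    /// Index through the logical contents only, so `s[s.len()]` panics
    /// with a bounds error just as it would on a vec.
    fn index(&self, i: usize) -> &u8 {
        &self.buf[..self.len()][i]
    }
}

fn main() {
    let s = NulStr::new(b"hello");
    assert_eq!(s[4], b'o');
    // The terminator is still in storage, but it is no longer indexable.
    assert!(panic::catch_unwind(|| s[s.len()]).is_err());
    assert_eq!(s.as_c_str().unwrap().to_bytes(), b"hello");
}
```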

Other opinions?

