[rust-dev] strings, slices and nulls

Brian Anderson banderson at mozilla.com
Wed Apr 18 12:17:53 PDT 2012


On 04/18/2012 11:54 AM, Graydon Hoare wrote:
> Hi,
>
> Our current strings always have a trailing null. This fact is pretty
> much solely for interop with C. It's convenient that when you grab a
> pointer to the buffer storing the string, you get something
> null-terminated that you can pass to C.
>
> We accomplish this by setting the fill field of a string to 1 longer
> than the number of bytes we're given, and writing a null to the
> str[fill] index.
>
> There are a couple places this is visible in the language, and a couple
> new places where it'll surface with fixed-size and slice strings:
>
>    existing:
>
>     - the index operator [] lets you index past the str::len
>       length. That is, x[str::len(x)] == 0 as uint, even though
>       the same thing fails as a bounds-overrun on a vec.

This sounds like a bug. I've never encountered this before.

>
>     - If you _cast_ to a vec (rather than asking for a copy) you
>       get a vec that includes a trailing 0 byte.

This we rely on extensively, but hopefully most places that make use of 
this fact do it via str::as_bytes.
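
To make the "with or without the trailing 0" distinction concrete, here is a small sketch in present-day Rust. It uses std::ffi::CString, which is not the str API under discussion in this thread; the helper name byte_views is made up for illustration.

```rust
use std::ffi::CString;

/// Hypothetical helper: returns (bytes without terminator, bytes with
/// terminator) for `s`. Panics if `s` contains an interior null byte.
fn byte_views(s: &str) -> (Vec<u8>, Vec<u8>) {
    // CString stores the text followed by a single trailing 0 byte.
    let c = CString::new(s).expect("no interior null bytes");
    // as_bytes hides the terminator; as_bytes_with_nul exposes it.
    (c.as_bytes().to_vec(), c.as_bytes_with_nul().to_vec())
}
```

For "hello" the first view has 5 bytes and the second has 6, which is the difference a cast-to-vec would surface.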

>
>    new:
>
>     - "hello"/5 is a "fixed size" string of str::len 5, but it costs
>       6 bytes of storage. And the type, say a str/5 in a structure,
>       will eat 6 bytes of contiguous storage.
>
>     - "hello"/&  makes a slice, a (*u8,uint) pair. The uint field is
>       length, but it is also 6u, not 5u. That is, a slice always points
>       one byte beyond the "part that the user wants". This is to
>       support the idea of taking a slice from the middle of a string and
>       passing it to C: the library function that produces a *c_char will
>       have to look at the len'th byte, check for null, and make a
>       temporary copy if the slice doesn't "end in null" already.
>
>       (You only notice this if you manually unpack a slice to tuple form
>        and inspect it. asking for str::slice_len(s) will return 5 as with
>        any other "non-raw-pointers" view of a string)
>
> Here are some possible paths forward:
>
>   1. Keep everything as-described here. It's perfect!
>
>   2. Fix [] on strings to fault on s[str::len(s)], like vec.

Yes please.
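
A rough sketch of what I mean, in present-day Rust rather than the current syntax. NulStr and its methods are hypothetical and only loosely modelled on the layout described above: the terminator is stored, but lookups are checked against the visible length.

```rust
/// Hypothetical string buffer that keeps a hidden trailing null.
struct NulStr {
    // Stored bytes, always ending in a single 0 byte.
    bytes: Vec<u8>,
}

impl NulStr {
    fn new(s: &str) -> NulStr {
        let mut bytes = s.as_bytes().to_vec();
        bytes.push(0); // storage is one byte longer than the visible length
        NulStr { bytes }
    }

    /// Visible length, excluding the terminator.
    fn len(&self) -> usize {
        self.bytes.len() - 1
    }

    /// Storage actually used, including the terminator.
    fn fill(&self) -> usize {
        self.bytes.len()
    }

    /// Checked against `len`, not `fill`, so the null is never reachable.
    fn get(&self, i: usize) -> Option<u8> {
        if i < self.len() {
            Some(self.bytes[i])
        } else {
            None
        }
    }
}
```

An indexing operator built on this would fail in exactly the cases where get returns None, including i == len, the same as a vec.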

>
>   3. Remove the null termination stuff altogether. Make all strings
>      (fixed-size, slice, unique, shared) work exactly like vecs in terms
>      of length, and _always_ make temporary copies that we manually null
>      terminate before passing to C.
>
> My current thinking is #2 here. Fix the indexing operator to relate to
> observable "length" the same way vec does, but otherwise try to
> "preserve the illusion" that most strings can pass through to C "for
> cheap", without making a copy. Only slices-to-the-middle-of-strings need
> copies. Which should not be most slices.

I agree with this. If we change str::as_bytes to copy as needed, then 
most code should not be affected. The documentation for 
as_bytes/buf/c_str should say so explicitly, though, because the copy 
is easy to miss.
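
The "copy as needed" behaviour could look something like the sketch below. It is written in present-day Rust with std::ffi and std::borrow::Cow, not the current library; as_c_str is a hypothetical name, and treating interior nulls as an error is my assumption, not something decided in this thread.

```rust
use std::borrow::Cow;
use std::ffi::{CStr, CString};

/// Hypothetical helper: view `buf[start..start + len]` as a C string.
/// Borrows when the byte just past the slice is already a null,
/// otherwise makes a null-terminated temporary copy.
fn as_c_str(buf: &[u8], start: usize, len: usize) -> Option<Cow<'_, CStr>> {
    let end = start.checked_add(len)?;
    let body = buf.get(start..end)?;
    // Assumption: interior nulls are rejected, since C would truncate there.
    if body.contains(&0) {
        return None;
    }
    // Look at the len'th byte: if it is a null, the slice "ends in null"
    // and can be handed to C without copying.
    if buf.get(end) == Some(&0) {
        let with_nul = &buf[start..=end];
        return CStr::from_bytes_with_nul(with_nul).ok().map(Cow::Borrowed);
    }
    // Slice into the middle of a string: copy and terminate.
    CString::new(body).ok().map(Cow::Owned)
}
```

With a buffer holding "hello" plus its terminator, the whole string borrows, while the middle slice "ell" takes the copying path. That hidden copy is the part the docs need to call out.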

I would prefer that Rust strings not expose the fact that they are 
null-terminated (or simply not be null-terminated), but it seems 
unavoidable.




