[rust-dev] strings, slices and nulls
graydon at mozilla.com
Thu Apr 19 12:59:07 PDT 2012
On 12-04-19 07:25 AM, Jesse Ruderman wrote:
> My preference is to remove null termination:
> * I'm guessing most strings aren't passed to C. (What are the most
> common C string calls in rustc?)
All the filesystem access stuff, at this point. In the future it's
harder to say.
> * C functions that scan for null are inefficient, so they're even more
> likely to be replaced with Rust equivalents than other C functions.
Hm, I think this is not a reasonable stance:
  $ find /usr/include/ -name \*.h \
      | xargs cat \
      | grep -c 'char\( *const\)\? *\*'
There are a lot of C APIs that take strings. "Rewrite the world in Rust"
is going to take a long time.
> * Null termination is not sufficient for interop with C. You also have
> to ensure the strings don't contain null characters. (This is a common
> the network can contain null characters.) And if null characters are
> present, what do you do?
I can see some cases where that might be a bug, but in general I think
an embedded null just ... makes a string shorter, from C's perspective.
It's the same as passing a short string. Of course if the C code
requires some other kind of well-formedness condition in the prefix,
you'd need to enforce that, but that condition presumably holds over
shorter and longer strings alike. Most C APIs aren't written to take
strings of a fixed size.
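
A minimal sketch of the point above, written against present-day Rust's standard library rather than the 2012 API under discussion. The helper names `c_view` and `to_c_string_strict` are hypothetical. The first shows what a NUL-scanning C function would see of a buffer; the second shows the stricter alternative of refusing strings with an embedded NUL, which is what `std::ffi::CString::new` does.

```rust
use std::ffi::CString;

/// Hypothetical helper: the bytes a NUL-scanning C function (strlen-style)
/// would treat as "the string". Returns None if there is no terminator at
/// all, in which case real C code would read past the end of the buffer.
fn c_view(buf: &[u8]) -> Option<&[u8]> {
    buf.iter().position(|&b| b == 0).map(|end| &buf[..end])
}

/// Hypothetical helper: strict conversion that rejects embedded NULs
/// instead of letting C silently see a shorter string.
fn to_c_string_strict(s: &str) -> Option<CString> {
    // CString::new returns Err if the input contains a 0 byte.
    CString::new(s).ok()
}
```

With these, `c_view(b"abc\0def\0")` yields just the `abc` prefix, while `to_c_string_strict("abc\0def")` yields `None`. Which behaviour is right depends on whether silent truncation is acceptable for the API being called.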
> * Each C function has its own expectations about character encoding
> and allowed characters, so calls to C involve extra state-tracking or
> checks anyway.
For APIs that take UTF-16, such as the win32 APIs, we already do the
conversion before calling, yes. But for APIs that take "char *" they
tend to be set up so they can accept UTF-8 input: they're either
agnostic to the differences between ASCII and UTF-8 (as UTF-8 was
designed to exploit) or else they can operate in UTF-8 mode via LC_CTYPE
or such. Sure, you need to enforce that and/or re-encode when it's
not true, but again, this is about opportunistic recoding-avoidance by
careful choice of defaults, rather than a guarantee that we never need
Sometimes users want an array of UCS4 as well, but it's not our default
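
To make the two cases above concrete, here is a rough sketch in present-day Rust, not the 2012 library. Both helper names are hypothetical. `to_wide` re-encodes UTF-8 as NUL-terminated UTF-16, the kind of conversion needed before calling a wide-character API. `split_on_slash` scans for an ASCII delimiter byte by byte, the way an encoding-agnostic "char *" routine would. That is safe on UTF-8 because every byte of a multi-byte sequence is 0x80 or above, so it can never be mistaken for an ASCII character.

```rust
/// Hypothetical helper: UTF-8 -> NUL-terminated UTF-16, for APIs that
/// take wide strings.
fn to_wide(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

/// Hypothetical helper: split on the ASCII byte b'/' without decoding,
/// as a byte-oriented C routine would. Each piece is still valid UTF-8
/// because b'/' never occurs inside a multi-byte sequence.
fn split_on_slash(path: &str) -> Vec<&[u8]> {
    path.as_bytes().split(|&b| b == b'/').collect()
}
```

Splitting a non-ASCII path such as "café/日本語" this way gives two pieces that each decode cleanly with `std::str::from_utf8`, so no re-encoding is needed on that path.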