[rust-dev] Syntax of vectors, slices, etc

Joe Groff arcata at gmail.com
Tue Apr 24 11:49:56 PDT 2012


On Tue, Apr 24, 2012 at 11:30 AM, Matthieu Monrocq
<matthieu.monrocq at gmail.com> wrote:
> However, this is on the condition that strings are treated as lists of
> code points, not lists of bytes. Lists of bytes are useful in encoding and
> decoding operations, but to manipulate Arabic or Korean, they fall short:
> having users manipulate strings byte-wise instead of code-point-wise is a
> recipe for disaster outside of English and Latin-1 representable languages.
>
> I understand that this may seem contradictory to Rust's original direction
> of UTF-8 encoded strings, but having worked with UTF-8 strings using C++
> `std::string` I can assure you that apart from blindly passing them around,
> one cannot do much. All modifying operations require the use of
> Unicode-aware libraries... even `substr`.

Well, that's why you should use ICU instead of builtin language
facilities for Unicode-aware processing. But there's a lot of code
that really does just need to blindly pass around pre-composed
strings, and an ICU or equivalent dependency (and in many cases even
UTF encoding/decoding) would be overkill for those applications. In
previous discussions about text processing on the list, IIRC it's been
decided that the builtin string facilities should remain low-level,
and bindings to ICU used for real text processing.
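
To make the byte vs. code point distinction concrete, here is a minimal
sketch. It assumes present-day Rust syntax and standard library methods
(`str::get`, `str::chars`), not the language as it stood when this thread
was written. The helper names `byte_prefix` and `codepoint_prefix` are
hypothetical and exist only for illustration.

```rust
/// Byte-wise "substr": returns None when `n` would cut a UTF-8 sequence in half.
fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(0..n)
}

/// Code-point-wise "substr": takes the first `n` code points, however many
/// bytes each one occupies.
fn codepoint_prefix(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

fn main() {
    // 'n' and 'e' are 1 byte each; U+00E9 (e with acute) is 2 bytes in UTF-8.
    let s = "n\u{e9}e";
    assert_eq!(s.len(), 4); // length in bytes
    assert_eq!(s.chars().count(), 3); // length in code points

    // A byte offset of 2 lands inside the 2-byte sequence, so no valid slice exists.
    assert_eq!(byte_prefix(s, 2), None);
    assert_eq!(byte_prefix(s, 3), Some("n\u{e9}"));

    // Counting by code points gives the expected prefix.
    assert_eq!(codepoint_prefix(s, 2), "n\u{e9}");

    // Code points are still not user-perceived characters: 'e' followed by
    // U+0301 (combining acute) renders as one glyph but is two code points.
    let decomposed = "e\u{301}";
    assert_eq!(decomposed.chars().count(), 2);
    assert_eq!(codepoint_prefix(decomposed, 1), "e"); // the accent is dropped
}
```

The last two assertions are the case where a code point view also falls
short, which is where a library such as ICU comes in.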

-Joe

