[rust-dev] Unicode vs hex escapes in Rust
graydon at mozilla.com
Wed Jul 4 11:53:57 PDT 2012
On 12-07-04 6:55 AM, Behdad Esfahbod wrote:
> I started learning Rust today, and while it's building, I'm going through the
> tutorial. I understand that it's quite outdated (points to 0.1 still), but
> here's a couple things I spotted:
> Under 3.6.2 Other literals:
> * This: "A newline (Unicode character 32)" should say 10 instead of 32!
Oops! Fixed. Sharp eyes.
> * Here: "\xHH, \uHHHH, \UHHHHHHHH Unicode escapes", I strongly suggest that
> \xHH be modified to allow inputting direct UTF-8 bytes. For ASCII it doesn't
> make any difference. For Latin1, it gives the impression that strings are
> stored in Latin1, which is not the case. It would also make C / Python
> escaped strings directly usable in Rust. I.e., '\xE2\x98\xBA' would be a single
> character equivalent to '\u263a', not three Latin1 characters.
Heh. This is interesting! I hadn't noticed this before, but you're not
_entirely_ giving the whole story.
- \xNN means a UTF-8 byte: python2, python3 'bytes' literals,
perl, go, C, C++, C++-0x u8 literals, and ruby
- \xNN means a unicode codepoint: python3 'string' literals,
get it randomly wrong by implementation), and current rust.
- \xNN illegal, but the octal version means a unicode codepoint:
So, my inclination is to follow your suggestion and actually go with the
C and C++ style. But it's not exactly universal!
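To make the two readings of \xNN concrete, here is a minimal sketch in present-day Rust syntax, which postdates this thread (byte-string literals and the \u{...} escape are not the 2012 syntax under discussion). The helper names utf8_reading and codepoint_reading are hypothetical, chosen only for this illustration. It takes the three bytes from the example above and interprets them first as UTF-8 bytes (the C / C++ style) and then as one codepoint per \xNN (the Latin1-looking style).

```rust
/// Hypothetical helper: treat each \xNN as a raw UTF-8 byte (C / C++ style)
/// and decode the whole byte sequence as UTF-8.
fn utf8_reading(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec()).expect("input should be valid UTF-8")
}

/// Hypothetical helper: treat each \xNN as a Unicode codepoint in the range
/// U+0000..U+00FF, which is what makes strings look like Latin1.
fn codepoint_reading(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    // The three bytes from the example: the UTF-8 encoding of U+263A.
    let bytes = b"\xE2\x98\xBA";

    let as_utf8 = utf8_reading(bytes);
    let as_codepoints = codepoint_reading(bytes);

    // Byte reading: a single character, equal to U+263A.
    println!("utf8 reading:      {} char(s)", as_utf8.chars().count());
    // Codepoint reading: three separate characters (U+00E2, U+0098, U+00BA).
    println!("codepoint reading: {} char(s)", as_codepoints.chars().count());

    assert_eq!(as_utf8, "\u{263a}");
    assert_eq!(as_codepoints, "\u{e2}\u{98}\u{ba}");
}
```

Under the byte reading the escaped string is one character; under the codepoint reading it is three, which is the mismatch with C / Python escaped strings described above.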
Filed as bug #2800. It's quick, I might pick it off today.
> * This: "loop is the preferred way of writing while true" doesn't match the
> example that follows it, which still uses "while true".
Already fixed in the tutorial online:
Thanks for the sharp feedback, keep it coming :)