[rust-dev] Unicode vs hex escapes in Rust

Graydon Hoare graydon at mozilla.com
Wed Jul 4 11:53:57 PDT 2012


On 12-07-04 6:55 AM, Behdad Esfahbod wrote:

> I started learning Rust today, and while it's building, I'm going through the
> tutorial.  I understand that it's quite outdated (points to 0.1 still), but
> here's a couple things I spotted:
>
> Under 3.6.2 Other literals:
>
>    * This: "A newline (Unicode character 32)" should say 10 instead of 32!

Oops! Fixed. Sharp eyes.

>    * Here: "\xHH, \uHHHH, \UHHHHHHHH Unicode escapes", I strongly suggest that
> \xHH be modified to allow inputting direct UTF-8 bytes.  For ASCII it doesn't
> make any difference.  For Latin1, it gives the impression that strings are
> stored in Latin1, which is not the case.  It would also make C / Python
> escaped strings directly usable in Rust.  I.e. '\xE2\x98\xBA' would be a single
> character equivalent to '\u263a', not three Latin1 characters.

Heh. This is interesting! I hadn't noticed this before, but you're not
_entirely_ giving the whole story.

   - \xNN means a UTF-8 byte: python2, python3 'bytes' literals,
     perl, go, C, C++, C++0x u8 literals, and ruby

   - \xNN means a Unicode codepoint: python3 'string' literals,
     javascript, scheme (at least racket follows spec; others
     get it randomly wrong by implementation), and current rust.

   - \xNN illegal, but the octal version means a Unicode codepoint:
     java.
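
To make the first two readings concrete for the '\xE2\x98\xBA' example, here
is a minimal sketch. It is written in present-day Rust syntax, which postdates
this message, and the function names as_utf8_bytes and as_code_points are
made up for illustration; neither is how the compiler's lexer is actually
implemented.

```rust
// Reading 1: each \xNN is a raw byte, and the byte sequence is decoded as UTF-8.
// (Hypothetical helper; invalid sequences are replaced rather than rejected.)
fn as_utf8_bytes(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

// Reading 2: each \xNN is a Unicode code point in the range U+0000..=U+00FF.
// (Hypothetical helper; this is the Latin1-looking interpretation.)
fn as_code_points(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    // The three values written as \xE2\x98\xBA in the example above.
    let raw = [0xE2u8, 0x98, 0xBA];

    let one = as_utf8_bytes(&raw); // the single character U+263A
    let three = as_code_points(&raw); // U+00E2, U+0098, U+00BA

    println!(
        "{} char(s) vs {} char(s)",
        one.chars().count(),
        three.chars().count()
    );
}
```

Under the byte reading the three escapes collapse into one character; under
the codepoint reading they stay three separate characters, which is the
mismatch with C / Python strings described above.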

So, my inclination is to follow your suggestion and actually go with the 
C and C++ style. But it's not exactly universal!

Filed as bug #2800. It's quick, I might pick it off today.

>    * This: "loop is the preferred way of writing while true" doesn't match the
> example that follows it, which still uses "while true".

Already fixed in the tutorial online:

http://dl.rust-lang.org/doc/tutorial.html#loops

Thanks for the sharp feedback, keep it coming :)

-Graydon

