[rust-dev] First thoughts on Rust

Patrick Walton pwalton at mozilla.com
Mon Jan 23 09:11:19 PST 2012

On 01/22/2012 03:11 AM, Masklinn wrote:
> * The first one is the apparent (community) usage of "blocks" for
>    Rust's boxed closures[0]. My issue with this is that languages where
>    blocks are first-class objects (Smalltalk, Ruby) default to
>    non-local returns from these blocks. Rust does not — as far as I can
>    tell — have — let alone use — non-local returns.
>    Using "block" for boxed closures does everybody a disservice as it
>    makes transition much harder and *will* disappoint people used to
>    actual smalltalk-type blocks. The tutorial does not have this issue,
>    which is good, but the community should be careful. Talking about
>    lambdas or sugared lambdas would probably be a good idea (unless
>    Rust is modified to handle and default to non-local returns from
>    stack closures)

Nobody could find a way to implement non-local returns in a performant 
way, so they're missing. There's been discussion of a magic "loopctl" 
enum type as a return type for blocks, which would allow non-local 
returns with cooperation between the loop function and the block.
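
As a rough illustration of the "loopctl" idea, here is a minimal sketch written in present-day Rust syntax, which differs from the syntax in use when this thread was written. The names `LoopCtl`, `each`, and `first_even` are hypothetical and chosen only for this example; nothing here is the design that was actually discussed. The block returns a value telling the loop function whether to keep going, and the loop function cooperates by checking it.

```rust
// Hypothetical names throughout: `LoopCtl`, `each`, and `first_even` are
// illustrative only and are not taken from any real Rust library.
#[derive(Debug, PartialEq)]
enum LoopCtl {
    Continue,
    Break,
}

// The loop function inspects the closure's return value after every element
// and stops early when the closure asks for it.
fn each<T>(items: &[T], mut body: impl FnMut(&T) -> LoopCtl) -> LoopCtl {
    for item in items {
        if body(item) == LoopCtl::Break {
            return LoopCtl::Break;
        }
    }
    LoopCtl::Continue
}

// Finds the first even number, ending the iteration as soon as one is seen.
fn first_even(items: &[i32]) -> Option<i32> {
    let mut found = None;
    each(items, |&x| {
        if x % 2 == 0 {
            found = Some(x);
            LoopCtl::Break
        } else {
            LoopCtl::Continue
        }
    });
    found
}

fn main() {
    // prints "Some(4)"
    println!("{:?}", first_even(&[1, 3, 4, 5, 6]));
}
```

Note that this only emulates an early exit from the loop; the result still has to be carried out through a captured variable, which is the cost of not having true non-local returns.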

> * The second issue is both trivial and extremely serious: after
>    having written a few trivial pieces of code (can't call them
>    programs), it feels like Rust's handling of semicolons combines
>    the simplicity of Erlang's with the determinism of
>    Javascript's.
>    I think the core issue is that Rust uses semicolons as expression
>    separators where most if not all other braceful languages use them
>    as expression terminators. I know the rules are simple to express,
>    but Erlang demonstrated it would *never* feel right to people in the
>    current state of the language. Erlang also has the issue of three
>    different separators, but Rust has the additional issue that a
>    semicolon becomes the difference between returning a value and
>    returning unit. That's not right.
>    I see three possible ways to fix this:
>    - Don't change Rust's semantics, but change Rust's "blessed" style,
>      by prefixing expressions with semicolons (instead of
>      post-fixing). This is a common style for e.g. RelaxNG-Compact
>      schemas and it looks "correct" for separators
>    - Make semicolons into expression terminators, as in the majority of
>      C-style languages
>    - Add haskell-like layout rules making semicolons redundant in 99.9%
>      of cases, and necessary only when putting multiple expressions on
>      a line or when generating rust code mechanically. This would fix
>      the issue by making semicolons generally unnecessary and thus not
>      a source of error

I like OCaml's solution, which is to simply ignore trailing semicolons. 
In order to do this we need to reform the standard library to avoid 
returning values that are unused (which is a good idea anyway, so this 
is a lesser problem). But IIRC when this was tried, there were parsing
ambiguities involving loops and sugared block syntax (basically, the set 
of expressions that automatically infer a trailing semicolon). We would 
have to solve those issues somehow.

Options (1) and (3) are basically changing the entire syntactic family 
that Rust belongs to. I don't agree that this is a serious enough issue 
to warrant that.
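
For readers unfamiliar with the behaviour being debated, the following small sketch shows it in present-day Rust syntax (an assumption; the syntax at the time of this thread was different). The function names are hypothetical. The only difference between the two bodies is the trailing semicolon.

```rust
// No trailing semicolon: the body evaluates to the final expression.
fn with_value() -> i32 {
    1 + 2
}

// Trailing semicolon: the call's result is discarded and the body
// evaluates to unit, `()`.
fn without_value() {
    with_value();
}

fn main() {
    let a: i32 = with_value();
    let b: () = without_value();
    println!("{:?} {:?}", a, b);
}
```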

> * Strings. I believe Rust's current state of conflating byte sequences
>    and human-text strings to be as big a mistake as it was in Python.
>    If Rust wants to be mainly bytes-oriented the `str` type should be
>    renamed `bytes` and most string-manipulation functions should be
>    removed.
>    Otherwise, I believe it should be split in two clearly separate
>    types, one manipulating sequences or arrays of bytes and the other
>    one manipulating streams of unicode codepoints (for which encoding
>    would be an implementation detail).
>    I think the current setting will hurt both Rust and its users in the
>    long term. One of the first problems being that a byte sequence
>    advertising itself as UTF-8 and actually UTF-8 text have different
>    properties and overlong UTF-8 (which can be found in byte sequences)
>    is an actual security issue[1]. Bytes to string *should* require
>    transcoding, even if that transcoding does nothing more than an O(n)
>    shortest-form assertion because the string type's underlying
>    representation is UTF-8 anyway.

It was always the intention to do things more or less the way you 
suggest (str is Unicode-correct, [u8] is not Unicode-correct), although 
I'm not sure whether we want to make libicu a dependency of every Rust 
program. The [u8]->str conversion function is called "unsafe_from_bytes" 
for this reason, for example -- it's unsafe because it's not calling 
into libicu to perform the conversion.

In other words, this is mostly a standard library hygiene issue.
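
To make the checked-conversion idea concrete, here is a minimal sketch using present-day Rust, where `std::str::from_utf8` performs an O(n) validity check. This is not the 2012 API mentioned above, and the wrapper name `text_from_bytes` is hypothetical. The check rejects overlong encodings such as the two-byte form of NUL.

```rust
// Hypothetical wrapper: returns None unless the bytes are well-formed UTF-8.
// The validation itself is done by the standard library's `from_utf8`.
fn text_from_bytes(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // Well-formed UTF-8 for "café" is accepted.
    let ok = text_from_bytes(b"caf\xC3\xA9");
    // 0xC0 0x80 is an overlong encoding of U+0000 and is rejected.
    let overlong = text_from_bytes(&[0xC0, 0x80]);
    println!("{:?} {:?}", ok, overlong);
}
```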

> * Finally, I could not find any good information on the result of loop
>    expressions. From my tests it seems to always be `()`; is that
>    correct? If so, why `()` rather than the last-evaluated result of
>    the iteration? In case no iteration at all is performed?

Yes. Rust matches OCaml here, where `for` and `while` loops also
evaluate to unit.
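
A small sketch of this, again in present-day Rust syntax (an assumption, since the 2012 syntax differed) and with a hypothetical function name: a `while` loop evaluates to `()` whether or not its body ever runs.

```rust
// Counts up to `n` and also returns what the `while` expression evaluated to.
fn count_up(n: u32) -> (u32, ()) {
    let mut i = 0;
    // The loop expression itself has type `()`, regardless of the body.
    let result: () = while i < n {
        i += 1;
    };
    (i, result)
}

fn main() {
    println!("{:?}", count_up(3));
    // Zero iterations still yields `()`.
    println!("{:?}", count_up(0));
}
```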

