[rust-dev] RFC: "A 30 minute introduction to Rust"

Steve Klabnik steve at steveklabnik.com
Mon Jan 13 09:08:02 PST 2014


Also posted to my blog:
http://words.steveklabnik.com/a-30-minute-introduction-to-rust

I've just kept this in Markdown even though the email is in plain
text, should still be easy on the eyes.

I recently gave a [proposal for Rust's
documentation](https://air.mozilla.org/rust-meetup-december-2013/). An
important component of my overall proposal is a short, simple
introduction for people who may have heard of Rust, so that they can
figure out if Rust is right for them. The other day, I saw [this
excellent presentation](http://www.youtube.com/watch?v=gfCtbGiHcg0),
and thought it might serve as a great basis for this introduction.
Consider this an RFC for such an introduction. Feedback very welcome.
If you all like it, I'll submit it to our documentation.

---------------------------------------------------------------------

Rust is a systems programming language that focuses on strong
compile-time correctness guarantees. It improves upon the ideas other
systems languages like C++, D, and Cyclone by providing very strong
guarantees and explicit control over the life cycle of memory. Strong
memory guarantees make writing correct concurrent Rust code easier
than in other languages. This might sound very complex, but it's
easier than it sounds! This tutorial will give you an idea of what
Rust is like in about thirty minutes. It expects that you're at least
vaguely familiar with a previous 'curly brace' language. The concepts
are more important than the syntax, so don't worry if you don't get
every last detail: the
[tutorial](http://static.rust-lang.org/doc/master/tutorial.html) can
help you out with that later.

Let's talk about the most important concept in Rust, "ownership," and
its implications on a task that programmers usually find very
difficult: concurrency.

## Ownership

Ownership is central to Rust, and is one of its more interesting and
unique features. "Ownership" refers to which parts of your code are
allowed to modify various parts of memory. Let's start by looking at
some C++ code:

```
int *dangling(void)
{
    int i = 1234;
    return &i;
}

int add_one(void)
{
    int *num = dangling();
    return *num + 1;
}
```

This function allocates an integer on the stack, and stores it in a
variable, `i`. It then returns a reference to the variable `i`.
There's just one problem: stack memory becomes invalid when the
function returns. This means that in the second line of `add_one`,
`num` points to some garbage values, and we won't get the effect that
we want. While this is a trivial example, it can happen quite often in
C++ code. There's a similar problem when memory on the heap is
allocated with `malloc` (or `new`), then freed with `free` (or
`delete`), yet your code attempts to do something with the pointer to
that memory. More modern C++ uses RAII with constructors/destructors,
but it amounts to the same thing. This problem is called a 'dangling
pointer,' and it's not possible to write Rust code that has it. Let's
try:

```
fn dangling() -> &int {
    let i = 1234;
    return &i;
}

fn add_one() -> int {
    let num = dangling();
    return *num + 1;
}
```

When you try to compile this program, you'll get an interesting (and
long) error message:

```
temp.rs:3:11: 3:13 error: borrowed value does not live long enough
temp.rs:3     return &i;

temp.rs:1:22: 4:1 note: borrowed pointer must be valid for the
anonymous lifetime #1 defined on the block at 1:22...
temp.rs:1 fn dangling() -> &int {
temp.rs:2     let i = 1234;
temp.rs:3     return &i;
temp.rs:4 }

temp.rs:1:22: 4:1 note: ...but borrowed value is only valid for the
block at 1:22
temp.rs:1 fn dangling() -> &int {
temp.rs:2     let i = 1234;
temp.rs:3     return &i;
temp.rs:4  }
error: aborting due to previous error
```

In order to fully understand this error message, we need to talk about
what it means to "own" something. So for now, let's just accept that
Rust will not allow us to write code with a dangling pointer, and
we'll come back to this code once we understand ownership.

Let's forget about programming for a second and talk about books. I
like to read physical books, and sometimes I really like one and tell
my friends they should read it. While I'm reading my book, I own it:
the book is in my possession. When I loan the book out to someone else
for a while, they "borrow" it from me. And when you borrow a book,
it's for a certain period of time, and then you give it back to me,
and I own it again. Right?

This concept applies directly to Rust code as well: some code "owns" a
particular pointer to memory. It's the sole owner of that pointer. It
can also lend that memory out to some other code for a while: the code
"borrows" it. It borrows it for a certain period of time, called a
"lifetime."

That's all there is to it. That doesn't seem so hard, right? Let's go
back to that error message: `error: borrowed value does not live long
enough`. We tried to loan out a particular variable, `i`, using Rust's
borrowed pointers: the `&`. But Rust knew that the variable would be
invalid after the function returns, and so it tells us that: `borrowed
pointer must be valid for the anonymous lifetime #1... but borrowed
value is only valid for the block`. Neat!

That's a great example for stack memory, but what about heap memory?
Rust has a second kind of pointer, a 'unique' pointer, that you can
create with a `~`. Check it out:

```
fn dangling() -> ~int {
    let i = ~1234;
    return i;
}

fn add_one() -> int {
    let num = dangling();
    return *num + 1;
}
```

This code will successfully compile. Note that instead of a stack
allocated `1234`, we use an owned pointer to that value instead:
`~1234`. You can roughly compare these two lines:

```
// rust
let i = ~1234;

// C++
int *i = new int;
*i = 1234;
```

Rust is able to infer the size of the type, then allocates the correct
amount of memory and sets it to the value you asked for. This means
that it's impossible to allocate uninitialized memory: Rust does not
have the concept of null. Hooray! There's one other difference between
this line of Rust and the C++: The Rust compiler also figures out the
lifetime of `i`, and then inserts a corresponding `free` call after
it's invalid, like a destructor in C++. You get all of the benefits of
manually allocated heap memory without having to do all the
bookkeeping yourself. Furthermore, all of this checking is done at
compile time, so there's no runtime overhead. You'll get (basically)
the exact same code that you'd get if you wrote the correct C++, but
it's impossible to write the incorrect version, thanks to the
compiler.

You've seen one way that ownership and lifetimes are useful to prevent
code that would normally be dangerous in a less-strict language, but
let's talk about another: concurrency.

## Concurrency

Concurrency is an incredibly hot topic in the software world right
now. It's always been an interesting area of study for computer
scientists, but as usage of the Internet explodes, people are looking
to improve the number of users a given service can handle. Concurrency
is one way of achieving this goal. There is a pretty big drawback to
concurrent code, though: it can be hard to reason about, because it is
non-deterministic. There are a few different approaches to writing
good concurrent code, but let's talk about how Rust's notions of
ownership and lifetimes can assist with achieving correct but
concurrent code.

First, let's go over a simple concurrency example in Rust. Rust allows
you to spin up 'tasks,' which are lightweight, 'green' threads. These
tasks do not have any shared memory, and so, we communicate between
tasks with a 'channel'. Like this:

```
fn main() {
    let numbers = [1,2,3];

    let (port, chan)  = Chan::new();
    chan.send(numbers);

    do spawn {
        let numbers = port.recv();
        println!("{:d}", numbers[0]);
    }
}
```

In this example, we create a vector of numbers. We then make a new
`Chan`, which is the name of the package Rust implements channels
with. This returns two different ends of the channel: a channel and a
port. You send data into the channel end, and it comes out the port
end. The `spawn` function spins up a new task. As you can see in the
code, we call `port.recv()` (short for 'receive') inside of the new
task, and we call `chan.send()` outside, passing in our vector. We
then print the first element of the vector.

This works out because Rust copies the vector when it is sent through
the channel. That way, if it were mutable, there wouldn't be a race
condition. However, if we're making a lot of tasks, or if our data is
very large, making a copy for each task inflates our memory usage with
no real benefit.

Enter Arc. Arc stands for 'atomically reference counted,' and it's a
way to share immutable data between multiple tasks. Here's some code:

```
extern mod extra;
use extra::arc::Arc;

fn main() {
    let numbers = [1,2,3];

    let numbers_arc = Arc::new(numbers);

    for num in range(0, 3) {
        let (port, chan)  = Chan::new();
        chan.send(numbers_arc.clone());

        do spawn {
            let local_arc = port.recv();
            let task_numbers = local_arc.get();
            println!("{:d}", task_numbers[num]);
        }
    }
}
```

This is very similar to the code we had before, except now we loop
three times, making three tasks, and sending an `Arc` between them.
`Arc::new` creates a new Arc, `.clone()` makes a new reference to that
Arc, and `.get()` gets the value out of the Arc. So we make a new
reference for each task, send that reference down the channel, and
then use the reference to print out a number. Now we're not copying
our vector.

Arcs are great for immutable data, but what about mutable data? Shared
mutable state is the bane of the concurrent programmer. You can use a
mutex to protect shared mutable state, but if you forget to acquire
the mutex, bad things can happen.

Rust provides a tool for shared mutable state: `RWArc`. This variant
of an Arc allows the contents of the Arc to be mutated. Check it out:

```
extern mod extra;
use extra::arc::RWArc;

fn main() {
    let numbers = [1,2,3];

    let numbers_arc = RWArc::new(numbers);

    for num in range(0, 3) {
        let (port, chan)  = Chan::new();
        chan.send(numbers_arc.clone());

        do spawn {
            let local_arc = port.recv();

            local_arc.write(|nums| {
                nums[num] += 1
            });

            local_arc.read(|nums| {
                println!("{:d}", nums[num]);
            })
        }
    }
}
```

We now use the `RWArc` package to get a read/write Arc. The read/write
Arc has a slightly different API than `Arc`: `read` and `write` allow
you to, well, read and write the data. They both take closures as
arguments, and the read/write Arc will, in the case of write, acquire
a mutex, and then pass the data to this closure. After the closure
does its thing, the mutex is released.

You can see how this makes it impossible to mutate the state without
remembering to aquire the lock. We gain the efficiency of shared
mutable state, while retaining the safety of disallowing shared
mutable state.

But wait, how is that possible? We can't both allow and disallow
mutable state. What gives?

## A footnote: unsafe

So, the Rust language does not allow for shared mutable state, yet I
just showed you some code that has it. How's this possible? The
answer: `unsafe`.

You see, while the Rust compiler is very smart, and saves you from
making mistakes you might normally make, it's not an artificial
intelligence. Because we're smarter than the compiler, sometimes, we
need to over-ride this safe behavior. For this purpose, Rust has an
`unsafe` keyword. Within an `unsafe` block, Rust turns off many of its
safety checks. If something bad happens to your program, you only have
to audit what you've done inside `unsafe`, and not the entire program
itself.

If one of the major goals of Rust was safety, why allow that safety to
be turned off? Well, there are really only three main reasons to do
it: interfacing with external code, such as doing FFI into a C
library, performance (in certain cases), and to provide a safe
abstraction around operations that normally would not be safe. Our
Arcs are an example of this last purpose. We can safely hand out
multiple references to the `Arc`, because we are sure the data is
immutable, and therefore it is safe to share. We can hand out multiple
references to the `RWArc`, because we know that we've wrapped the data
in a mutex, and therefore it is safe to share. But the Rust compiler
can't know that we've made these choices, so _inside_ the
implementation of the Arcs, we use `unsafe` blocks to do (normally)
dangerous things. But we expose a safe interface, which means that the
Arcs are impossible to use incorrectly.

This is how Rust's type system allows you to not make some of the
mistakes that make concurrent programming difficult, yet get the
efficiency of languages such as C++.

## That's all, folks

I hope that this taste of Rust has given you an idea if Rust is the
right language for you. If that's true, I encourage you to check out
[the tutorial](http://rust-lang.org//doc/tutorial.html) for a full,
in-depth exploration of Rust's syntax and concepts.


More information about the Rust-dev mailing list