[rust-dev] Interface / typeclass proposal

Marijn Haverbeke marijnh at gmail.com
Fri Nov 25 08:06:52 PST 2011


Niko proposed categories [1] two weeks ago. I'm happy that we're
looking in this direction. Niko's proposal makes interfaces
structural. I'm going to argue that nominal interfaces have advantages
both in programmer ergonomics and in compilation-model complexity.

[1]: https://mail.mozilla.org/pipermail/rust-dev/2011-November/000941.html

I'll be sticking rather closely to Haskell's type class system, which
has proven itself in practice. If you aren't already enthusiastic
about Haskell's type classes, I recommend watching Simon Peyton Jones'
talk about them [2]. He goes over the way type classes can be
implemented, and shows a number of really cool applications.

[2]: http://channel9.msdn.com/posts/MDCC-TechTalk-Classes-Jim-but-not-as-we-know-them
(try to skip the first 3 minutes, they might spoil your appetite)

Context: I'd like to implement some minimum viable dynamic dispatch
system and get rid of our `obj` implementation before the first public
release. Something that can be extended later with classes and traits,
but for now just allows us to define vtables that accompany types.

To recap, Niko's categories look like this:

    category vec_seq([T]) {
      fn len() -> uint { vec::len(self) }
      fn iter(f: block(T)) { for elt in self { f(elt); } }
    }

Having that, you can do

    import vec::vec_seq;
    assert [1].len() == 1u;
    [1, 2, 3].iter({|i| log i; })

Which is statically resolved. Dynamic dispatch (if I understand
correctly), would look like this:

    type iterable<T> = iface { fn iter(block(T)); };
    fn append_to_vec<copy T>(x: iterable<T>, y: itable<T>) -> [T] {
        let r = [];
        x.iter {|e| r += [e];}
        y.iter {|e| r += [e];}
        r
    }
    // Assuming there exists a category for lists that implements iter
    append_to_vec([1, 2, 3] as iterable, list::cons(4, list::nil) as iterable)

That causes the compiler to create two vtables, both containing an
`iter` method, and wrap the arguments in {vtable, @value} structures
when they are cast to `iterable` (they'll probably have to be boxed to
make sure the size of such a value is uniform, and cleanup is
straightforward).

Alternatively, my proposal looks like this: interfaces could be fixed
groups of methods, that are always implemented all at once.

    // Define an interface called `seq`
    interface seq<T> {
        fn len() -> uint;
        fn iter(f: block(T));
    }
    // Declare [T] to be an instance of seq
    instance [T] seq<T> {
        fn len() -> uint { vec::len(self) }
        fn iter(f: block(T)) { for elt in self { f(elt); } }
    }

The static way to use this would look the same as above. If you've
imported `vec::seq` (std::vec's implementation of seq), you can simply
say [1].len(). If there is any instance in scope that applies to type
[int] and has a method `len`, that instance's implementation is
called. If multiple interfaces are found, the one in the closes scope
is chosen. If they are in the same scope, or no interface is found,
you get a compiler error.

Dynamic dispatch works differently.

    // Declare T to be an instance of the seq interface
    fn total_len<T:seq>(seqs: [T]) -> uint {
        let cnt = 0u;
        for s in seqs { count += s.len(); }
        count
    }

In this proposal, the seq vtable is not something that get attached to
the value by casting it to an interface, but rather acts as an
implicit parameter to the function. The cool thing is that we already
have these implicit parameters -- they map very closely to our type
descriptors, which we are implicitly passing to generics. What would
happen, for such a call, is that the compiler notices that the type
parameter has an interface bound, so that instead of passing just a
tydesc, it passes both a tydesc and the `seq` vtable that belongs to
that type. Inside the function, `s` is known to be of type `T:seq`, so
the `s.len` call looks up the `len` method in the vtable passed for
type parameter T.

You can also require type parameter to conform to multiple interfaces,
as in `fn foo<T: seq, print>(...)` -- that requires passing multiple
vtables. (Niko: this is the thing you asked about. Turns out it's not
hard to do.)

It should be noted that this has both advantages and disadvantages
compared to the 'wrap by casting to interface approach'. For one
thing, it doesn't allow this

    fn draw_all<T: drawable>(shapes: [T]) { for s in shapes { s.draw(); } }

.. or at least, it doesn't do what you want, because it requires all
arguments to be of the same type, and only passes a single vtable. An
extension can be implemented (at a later time), to support this
instead:

    fn draw_all(shapes: [drawable]) { ... }
    draw_all([my_circle as drawable, my_rectangle as drawable]);

The drawable interface, when used as a type instead of a type
parameter bound, would denote a boxed value paired with a vtable, just
like in Niko's proposal.

And the good part: In the case where the interface is used as a type
parameter bound, which should cover most use cases, things do not have
to be boxed to be handled generically, and the content of regular data
structures (such as `[int]`) can be approached generically. This is
fast, and it allows type classes to applied all over the place...

Everything that's currently an obj could become an instance. We'd get
static, super-fast dispatch when using them monomorphically, and be
able to decide on our own representation (rather than being locked
into a boxed representation, as objs are) for the values. Being able
to define methods on built-in types means that many things wouldn't
require defining a new representation at all.

The operations that are currently magically implemented by the
compiler and runtime, such as comparing and logging, could be methods
on interfaces (see Haskell's Eq, Cmp, and Show type classes). That'd
make them overridable and extendable.

With a Sufficiently Smart Inliner, we could even do arithmetic with
methods, and get operator overloading for free (see Haskell's Num type
class), and allow things like generic implementations of sum, average,
and similar numeric operations over sequences of `T: num` type.

Our type parameter bounds, `copy` and `send`, could become interfaces,
with suitable implementations in the standard library. Copying of
generic types would be forbidden, but `copy` would be a method of the
`copy` typeclass, so you could say `copy(elt)` if you declared your
type parameter with a `: copy` bound.

Implementing a system like this in its simple form is not terribly
hard, especially since we already have tydescs implemented. Making it
as powerful as Haskell's requires some extensions (most importantly:
default implementations of methods, and interfaces that are
parameterized with other interfaces), but those can be tackled
piecemeal. There's even a credible path towards multiple dispatch
(methods dispatching on the type of more than one argument), though it
requires a more complicated interface-dispatching mechanism.


I'd like to spend next week implementing this. Comments, additions,
and violently disagreeing flames are welcome.

Cheers,
Marijn


More information about the Rust-dev mailing list