[rust-dev] Misc questions and ideas

Patrick Walton pwalton at mozilla.com
Sun Dec 23 09:20:07 PST 2012


On 12/23/12 10:43 AM, Michael Neumann wrote:
> Hi,
>
> I've spent the last few days hacking in Rust, and a few questions and
> ideas have accumulated over that time.
>
> * If I use unique ~pointers, there is absolutely no runtime overhead,
>    so neither ref-counting nor GC is involved, right?

Well, you have to malloc and free, but I assume you aren't counting 
that. There is no reference counting or GC, and the GC is totally 
unaware of such pointers.

(There is one caveat: when we have tracing GC, it must scan ~ pointers 
that contain @ pointers, just as it must scan the stack. Such pointers 
are generally uncommon though.)
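
For example, a unique vector costs exactly one allocation and is freed
deterministically when its owner goes out of scope (a quick sketch
against the current libcore names):

    fn main() {
        let v = ~[1, 2, 3];                    // a single malloc, no GC header
        io::println(fmt!("%u", vec::len(v)));  // borrowed implicitly for the call
    }                                          // v is freed right here; no refcount, no GC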

> * Heap-allocated pointers incur ref-counting. So when I pass a
>    @pointer, I will basically pass a
>
>      struct heap_ptr {ptr: *byte, cnt: uint}
>
>    around. Right?

They currently do thread-unsafe reference counting, but we would like to 
eventually change that to tracing GC. However, the structure is 
different: we use intrusive reference counting, so it's actually a 
pointer to this structure:

pointer --> [ ref count, type_info, next alloc, prev alloc, data... ]

You're only passing one word around, not two. The reference count is 
inside the object pointed to. This setup saves an allocation over C++ 
std::shared_ptr (at least when the shared_ptr isn't created with 
std::make_shared).
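
In other words, the box an @ pointer refers to looks roughly like this
(a conceptual sketch, not the real runtime definition; the field names
here are made up):

    struct BoxHeader {
        ref_count: uint,    // adjusted whenever an @ pointer is copied or dropped
        tydesc: *u8,        // type descriptor, shown here as an opaque pointer
        next: *BoxHeader,   // live boxes are chained so the task can sweep them
        prev: *BoxHeader,
        // the user's data follows immediately afterward, in the same allocation
    }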

> * vec::build_sized() somehow seems to be pretty slow. When I use it
>    instead of a for() loop, my rust-msgpack library slows down by a
>    factor of 2 when loading msgpack data.
>
>    Also, I would have expected that vec::build_sized() would call my
>    supplied function "n" times. IMHO the name is a little bit
>    misleading here.

You want vec::from_fn() instead. vec::build_sized() is not commonly used 
and could probably be renamed without too much trouble.

I suspect the performance problem you're seeing with it is due to not 
supplying enough LLVM inline hints. LLVM's inline heuristics are not 
well tuned to Rust at the moment; we work around it by writing 
#[inline(always)] in a lot of places, but we should probably have the 
compiler insert those automatically for certain uses of higher-order 
functions. When LLVM inlines properly, the higher-order functions 
generally compile down into for loops.
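
For example, the following should boil down to a plain loop once
inlining kicks in; from_fn calls the closure exactly n times with the
index as its argument:

    fn main() {
        let squares = vec::from_fn(10, |i| i * i);
        io::println(fmt!("%?", squares));
    }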

> * I do not fully understand the warning of the following script:
>
>    fn main() {
>      let bytes =
>        io::read_whole_file(&path::Path("/tmp/matching.msgpack")).get();
>    }
>
>    t2.rs:2:14: 2:78 warning: instantiating copy type parameter with a not
>    implicitly copyable type t2.rs:2   let bytes =
>    io::read_whole_file(&path::Path("/tmp/matching.msgpack")).get();
>    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    Does it mean that it will copy the ~str again? When I use pattern
>    matching instead of get(), I don't get this warning, but it seems to
>    be slower. Will it just silence the warning???

Yes, it means it will copy the string again. To avoid this, you want 
result::unwrap() or option::unwrap() instead. I've been thinking for 
some time that .unwrap() should change to .get() and .get() should 
change to .copy_value() or something.
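
Concretely, the copy goes away if you move the value out of the Result
instead (same code as yours, just with result::unwrap()):

    fn main() {
        // unwrap() moves the contents out of the Result rather than copying them.
        let bytes = result::unwrap(
            io::read_whole_file(&path::Path("/tmp/matching.msgpack")));
        io::println(fmt!("read %u bytes", vec::len(bytes)));
    }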

>
> * This is also strange to me:
>
>    fn nowarn(bytes: &[u8]) {}
>
>    fn main() {
>      let bytes = ~[1,2,3];
>      nowarn(bytes);
>      let br = io::BytesReader { bytes: bytes, pos: 0 }; // FAILS
>    }
>
>    t.rs:6:36: 6:41 error: mismatched types: expected `&/[u8]` but found
>    `~[u8]` ([] storage differs: expected & but found ~) t.rs:6   let br =
>    io::BytesReader { bytes: bytes, pos: 0 }; ^~~~~
>
>    It implicitly converts the ~pointer into a borrowed pointer when
>    calling the function, but the same does not work when using the
>    BytesReader struct. I think I should use a make_bytes_reader
>    function, but I didn't find one.

This is a missing feature that should be in the language. Struct 
literals are basically just like functions; their fields should cause 
coercions as well.

> * String literals seem not to be immutable. Is that right? That means
>    they are always "heap" allocated. I wish they were immutable, so
>    that writing ~"my string" would store it in read-only memory.

~"my string" isn't designed to be stored in read-only memory. You want 
`&static/str` instead; since it's a borrowed pointer, it cannot be 
mutated. `static` is the read-only memory region.

>    Is there a way for a function which takes a ~str to state that it
>    will not modify the content?

Take an `&str` (or an `&~str`) instead. Functions that take `~str` 
require that the caller give up its ownership of the string. If you, the 
caller, give up a string, then you give up your say in how it is used, 
including mutability. However, if you as the callee *borrow* the string 
via `&str` or `&~str`, then you are not allowed to change its 
mutability, since you are not the owner.
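
For example, a function that only needs to read the string would be
written like this; the caller keeps ownership, and the ~str is borrowed
implicitly at the call site:

    fn byte_len(s: &str) -> uint {
        str::len(s)     // read-only access; the callee cannot mutate s
    }

    fn main() {
        let owned = ~"hello";
        io::println(fmt!("%u", byte_len(owned)));
    }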

>    In this regard I very much like the way the D language handles this.
>    It uses "const" to state that it won't modify the value, while the
>    value itself may be mutable. Then there is "immutable", and a value
>    declared as such will not change during its whole lifetime.

We have a "const" qualifier as well, which means what "const" does in D. 
It has a good chance of becoming redundant and going away, however, with 
the changes suggested in the blog post "Imagine Never Hearing the Words 
'Aliasable, Mutable' Again".

Having the ability to declare that a data type is forever immutable is 
something we've talked about a lot, but we'd have to add a lot of type 
system machinery for it to be as flexible as we'd like. Being able to 
arbitrarily freeze and thaw deep data structures is a very powerful 
feature, and having data types specify that they must be immutable 
forever is at odds with that. (For that matter, the `mut` keyword on 
struct fields is at odds with that in the other direction, which is why 
I'd like to get rid of that too.)

>    Of course in Rust, thanks to unique pointers, there is less need for
>    immutability, as you cannot share a unique pointer between threads.

You can share unique pointers between threads with an ARC data type, 
actually (in `std::arc`). The ARC demands that the pointer be immutable 
and will not allow it to be mutated.
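
Roughly like this (API names quoted from memory, so check the std::arc
docs for the exact signatures):

    extern mod std;

    fn main() {
        // The payload has to be deeply immutable (Const) and sendable (Owned).
        let shared = std::arc::ARC(~[1, 2, 3]);
        let child = std::arc::clone(&shared);   // bumps an atomic count, no deep copy
        io::println(fmt!("%u", vec::len(*std::arc::get(&child))));
    }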

> * Appending to strings. It's easy to push an element to an array by
>    doing:
>
>    let mut v: ~[int] = ~[1,2];
>    v.push(3);
>    v.push(4);
>
>    But when I want to append to a string, I have to write:
>
>    let mut s: ~str = ~"";
>    let mut s = str::append(s, "abc");
>    let mut s = str::append(s, "def");
>
>    I find this a bit counter-intuitive. I know "+=" exists, but it will
>    always create a new string. A "<<" operator for appending to strings
>    (or to arrays) would be really nice.

There should probably be a ".append()" method on strings with a "&mut 
self" argument. Then you could write:

     let mut s = ~"";
     s.append("abc");
     s.append("def");

There are also plans to make "+=" separately overloadable. This would 
allow += to work in this case, I believe.
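
In the meantime, str::push_str() already appends in place through an
&mut borrow (assuming I'm remembering its current signature correctly):

    fn main() {
        let mut s = ~"";
        str::push_str(&mut s, "abc");   // grows s in place, no fresh string is built
        str::push_str(&mut s, "def");
        assert s == ~"abcdef";
    }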

> * Default initializers for structs. Would be nice to specify them like:
>
>    struct S {a: int = 4, b: int = 3};
>
>    I know I can use the ".." notation, and this is very cool and more
>    flexible, but I will have to type in a lot of code if the struct gets
>    pretty large.
>
>    const DefaultS = S{a: 4, b: 3}; // imagine this has 100 fields :)
>    let s = S{a: 4, ..DefaultS};

Perhaps. This might be a good job for a macro at first, then we can see 
about folding it into the language if it's widely used.
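
As a sketch of the macro route: an expression macro can stand in for
the big default literal until there's language support (hypothetical
macro, nothing like it exists in the libraries today):

    struct S { a: int, b: int }

    // default_s!() simply expands to the fully written-out literal.
    macro_rules! default_s (
        () => (S { a: 4, b: 3 })
    )

    fn main() {
        let s = S { a: 10, ..default_s!() };
        assert s.b == 3;
    }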

> * Metaprogramming
>
>    Given an arbitrary struct S {...} with some fields, it would be nice
>    to somehow derive S.serialize and S.deserialize functions
>    automatically. Are there any ideas on how to do that? In C++ I use the
>    preprocessor and templates for that. In D, thanks to compile-time code
>    evaluation, I can write code that introspects the struct at compile
>    time and then generates code.

There are #[auto_encode] and #[auto_decode] syntax extensions that exist 
already, actually (although the documentation is almost nonexistent). 
These are polymorphic over the serialization machinery, so you can 
choose the actual serialization format. There is also a visitor you can
use for reflection, although it will be slower than generating the code 
at compile time.
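
Usage is just a pair of attributes on the item; the expansion generates
impls of the encode/decode traits in std (module and trait names from
memory, so double-check against std::serialize):

    extern mod std;

    #[auto_encode]
    #[auto_decode]
    struct Point { x: uint, y: uint }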

We currently have syntax extensions written as compiler plugins. These 
allow you to write any code you want and have it executed at compile 
time. There are two main issues with them at the moment: (1) they have 
to be compiled as part of the compiler itself; (2) they expose too many 
internals of the `rustc` compiler, making your code likely to break when 
we change the compiler (or on alternative compilers implementing the 
Rust language, if they existed). The plan to fix (1) is to allow plugins 
to be written as separate crates and dynamically loaded; longer-term, 
we've also talked about JIT'ing them so that you can execute any code 
you wish at compile time. The plan to fix (2) is to 
make the syntax extensions operate on token trees, not AST nodes, 
basically along the lines of Scheme syntax objects.

>    I guess I could write a macro like:
>
>    define_ser_struct!(S, field1, int, field2, uint, ...)
>
>    which would generate the struct S and two functions for
>    serialization. Would that be possible with macros?

Yes, you should be able to do this with macros today, now that macros can 
expand to items.
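
A cut-down sketch of that shape, leaving the serialize/deserialize
bodies out (in the real thing they would be generated from the same
field list the macro already captures):

    macro_rules! define_ser_struct (
        ($name:ident, $f1:ident, $t1:ty, $f2:ident, $t2:ty) => (
            struct $name { $f1: $t1, $f2: $t2 }
        )
    )

    define_ser_struct!(S, field1, int, field2, uint)

    fn main() {
        let s = S { field1: 1, field2: 2u };
        assert s.field2 == 2u;
    }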

Patrick


