[rust-dev] Misc questions and ideas

Michael Neumann mneumann at ntecs.de
Sun Dec 23 13:25:22 PST 2012


On Sun, 23 Dec 2012 12:20:07 -0500,
Patrick Walton <pwalton at mozilla.com> wrote:

> On 12/23/12 10:43 AM, Michael Neumann wrote:
> > Hi,
> >
> > I've spent the last days hacking in Rust and a few questions and
> > ideas have accumulated over that time.
> >
> > * If I use unique ~pointers, there is absolutely no runtime
> > overhead, so neither ref-counting nor GC is involved, right?
> 
> Well, you have to malloc and free, but I assume you aren't counting 
> that. There is no reference counting or GC, and the GC is totally 
> unaware of such pointers.
> 
> (There is one caveat: when we have tracing GC, it must scan ~
> pointers that contain @ pointers, just as it must scan the stack.
> Such pointers are generally uncommon though.)

What is the big advantage of having a tracing GC over ref counting?
With GC we'd get rid of the extra indirection and extra operations
during aliasing, so it's basically a performance issue, right?
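
(Just to double-check my understanding of the first point, this is the
kind of code I have in mind, where nothing but malloc and free happens:)

  // The ~box is simply moved; nothing is counted or scanned,
  // as long as it contains no @ pointers:
  fn consume(v: ~[int]) -> int {
      v[0]   // the vector is freed once consume() is done with it
  }

  fn example() -> int {
      let v = ~[1, 2, 3];
      consume(v)   // ownership moves into consume(); no refcount, no GC
  }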

> > * Heap-allocated pointers incur ref-counting. So when I pass a
> >    @pointer, I will basically pass a
> >
> >      struct heap_ptr {ptr: *byte, cnt: uint}
> >
> >    around. Right?
> 
> They currently do thread-unsafe reference counting, but we would like
> to eventually change that to tracing GC. However, the structure is 
> different: we use intrusive reference counting, so it's actually a 
> pointer to this structure:
> 
> pointer --> [ ref count, type_info, next alloc, prev alloc, data... ]

Oh, I see, there is actually no double indirection, as [pointer+x]
always points to the data. Neat!
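
(So roughly this picture, if I translate the diagram into a struct; the
field names are made up, and the real header is internal to the runtime,
not something user code ever sees:)

  // One word per header slot, mirroring the diagram above:
  struct BoxHeader {
      ref_count: uint,
      type_info: *u8,           // stands in for the type descriptor pointer
      next_alloc: *BoxHeader,
      prev_alloc: *BoxHeader,
      // the data follows immediately after the header,
      // so pointer + fixed offset reaches it without a second hop
  }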

> You're only passing one word around, not two. The reference count is 
> inside the object pointed to. This setup saves one allocation over
> C++ std::shared_ptr.
> 
> > * vec::build_sized() somehow seems to be pretty slow. When I use it
> >   instead of a for() loop, my rust-msgpack library slows down by a
> >   factor of 2 when loading msgpack data.
> >
> >   Also, I would have expected vec::build_sized() to call my
> >   supplied function "n" times. IMHO the name is a little bit
> >   misleading here.
> 
> You want vec::from_fn() instead. vec::build_sized() is not commonly
> used and could probably be renamed without too much trouble.

Actually I was thinking of something like Ruby's:

  Array.new(size=10) {|i| i % 2}

gives:

  [0, 1, 0, 1, 0, 1, 0, 1...]

  fn make_sized<T>(n: uint, f: fn(uint) -> T) -> ~[T] {
    let mut v: ~[T] = vec::with_capacity(n);
    let mut i: uint = 0;
    while (i < n) {
      v.push(f(i));
      i += 1;
    }
    v
  }

  do vec::make_sized(10) |i| {i % 2}
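
(If vec::from_fn takes exactly this (n, fn(uint) -> T) shape -- which is
just my assumption from the name -- then it is what I was after:)

  // Assumed equivalent of the Ruby one-liner:
  let v = vec::from_fn(10, |i| i % 2);   // ~[0, 1, 0, 1, ...]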

> I suspect the performance problem you're seeing with it is due to not 
> supplying enough LLVM inline hints. LLVM's inline heuristics are not 
> well tuned to Rust at the moment; we work around it by writing 
> #[inline(always)] in a lot of places, but we should probably have the 
> compiler insert those automatically for certain uses of higher-order 
> functions. When LLVM inlines properly, the higher-order functions 
> generally compile down into for loops.

Is this an issue the LLVM developers are working on?

> > * I do not fully understand the warning of the following script:
> >
> >    fn main() {
> >      let bytes =
> >        io::read_whole_file(&path::Path("/tmp/matching.msgpack")).get();
> >    }
> >
> >    t2.rs:2:14: 2:78 warning: instantiating copy type parameter with
> >    a not implicitly copyable type
> >    t2.rs:2   let bytes =
> >    io::read_whole_file(&path::Path("/tmp/matching.msgpack")).get();
> >    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >    Does it mean that it will copy the ~str again? When I use pattern
> >    matching instead of get(), I don't get this warning, but it
> > seems to be slower. Will it just silence the warning???
> 
> Yes, it means it will copy the string again. To avoid this, you want 
> result::unwrap() or option::unwrap() instead. I've been thinking for 
> some time that .unwrap() should change to .get() and .get() should 
> change to .copy_value() or something.

Yes, I think something with copy in the name would be less surprising. Ok,
unwrap makes sense. Or maybe get() and get_copy()?
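
(So for my case, if I read the suggestion right, that would be:)

  // Moves the value out of the Result instead of copying it:
  let bytes = result::unwrap(
      io::read_whole_file(&path::Path("/tmp/matching.msgpack")));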

> >
> > * This is also strange to me:
> >
> >    fn nowarn(bytes: &[u8]) {}
> >
> >    fn main() {
> >      let bytes = ~[1,2,3];
> >      nowarn(bytes);
> >      let br = io::BytesReader { bytes: bytes, pos: 0 }; // FAILS
> >    }
> >
> >    t.rs:6:36: 6:41 error: mismatched types: expected `&/[u8]` but
> >    found `~[u8]` ([] storage differs: expected & but found ~)
> >    t.rs:6   let br = io::BytesReader { bytes: bytes, pos: 0 };
> >                                               ^~~~~
> >
> >    It implicitly converts the ~pointer into a borrowed pointer when
> >    calling the function, but the same does not work when using the
> >    BytesReader struct. I think I should use a make_bytes_reader
> >    function, but I didn't find one.
> 
> This is a missing feature that should be in the language. Struct 
> literals are basically just like functions; their fields should cause 
> coercions as well.

Ok.
 
> > * String literals seem not to be immutable. Is that right? That
> >   means they are always heap-allocated. I wish they were
> >   immutable, so that writing ~"my string" would be stored in
> >   read-only memory.
> 
> ~"my string" isn't designed to be stored in read-only memory. You
> want `&static/str` instead; since it's a borrowed pointer, it cannot
> be mutated. `static` is the read-only memory region.

I understand. Makes sense.
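
(In other words, something like this, if I have the region syntax right:)

  // The literal itself is &static/str and lives in read-only memory;
  // only the ~"..." form allocates on the heap:
  let fixed: &static/str = "my string";
  let owned: ~str = ~"my string";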
 
> >    Is there a way how a function which takes a ~str can state that
> > it will not modify the content?
> 
> Take an `&str` (or an `&~str`) instead. Functions that take `~str` 
> require that the caller give up its ownership of the string. If you,
> the caller, give up a string, then you give up your say in how it is
> used, including mutability. However, if you as the callee *borrow*
> the string via `&str` or `&~str`, then you are not allowed to change
> its mutability, since you are not the owner.

So a "const" function (in terms of C++ ;-) would always take a &str
pointer? Makes absolute sense to me.
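
(So a read-only helper would just look like this, with the caller keeping
ownership the whole time:)

  // Borrows the string: it can read it, but not free or reassign it.
  fn byte_len(s: &str) -> uint {
      str::len(s)
  }

  fn example() -> uint {
      let owned = ~"hello";
      byte_len(owned)   // the ~str is borrowed as &str at the call site
  }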

> >    In this regard I very much like the way the D language handles
> > this. It uses "const" to state that it won't modify the value,
> > while the value itself may be mutable. Then there is "immutable",
> > and a value declared as such will not change during the whole
> > lifetime.
> 
> We have a "const" qualifier as well, which means what "const" does in
> D. It has a good chance of becoming redundant and going away,
> however, with the changes suggested in the blog post "Imagine Never
> Hearing the Words 'Aliasable, Mutable' Again".
> 
> Having the ability to declare that a data type is forever immutable
> is something we've talked about a lot, but we'd have to add a lot of
> type system machinery for it to be as flexible as we'd like. Being
> able to arbitrarily freeze and thaw deep data structures is a very
> powerful feature, and having data types specify that they must be
> immutable forever is at odds with that. (For that matter, the `mut`
> keyword on struct fields is at odds with that in the other direction,
> which is why I'd like to get rid of that too.)
> 
> >    Of course in Rust, thanks to unique pointers, there is less need
> > for immutability, as you cannot share a unique pointer between
> > threads.
> 
> You can share unique pointers between threads with an ARC data type, 
> actually (in `std::arc`). The ARC demands that the pointer be
> immutable and will not allow it to be mutated.
> 
> > * Appending to strings. It's easy to push an element to an array by
> >    doing:
> >
> >    let mut v: ~[int] = ~[1,2];
> >    v.push(3);
> >    v.push(4);
> >
> >    But when I want to append to a string, I have to write:
> >
> >    let mut s: ~str = ~"";
> >    let mut s = str::append(s, "abc");
> >    let mut s = str::append(s, "def");
> >
> >    I found this a bit counter-intuitive. I know there exists "+=",
> > but this will always create a new string. A "<<" operator would be
> > really nice to append to strings (or to arrays).
> 
> There should probably be a ".append()" method on strings with a "&mut 
> self" argument. Then you could write:
> 
>      let mut s = ~"";
>      s.append("abc");
>      s.append("def");
> 
> There are also plans to make "+=" separately overloadable. This would 
> allow += to work in this case, I believe.

Ideally there would be an operator, as writing .append() all the time is
quite tedious.
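
(In the meantime I guess I would wrap it up myself, sketched here as a free
function with the &mut shape Patrick describes; whether += accepts a slice
on the right-hand side is an assumption on my part:)

  // Hypothetical helper until a real .append(&mut self) method exists;
  // += still builds a new string today, but the call sites already read well:
  fn append_in_place(s: &mut ~str, piece: &str) {
      *s += piece;
  }

  fn example() -> ~str {
      let mut s = ~"";
      append_in_place(&mut s, "abc");
      append_in_place(&mut s, "def");
      s
  }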

> > * Default initializers for structs. Would be nice to specify them
> > like:
> >
> >    struct S {a: int = 4, b: int = 3};
> >
> >    I know I can use the ".." notation, and this is very cool and
> > more flexible, but I will have to type in a lot of code if the
> > struct get pretty large.
> >
> >    const DefaultS = S{a: 4, b: 3}; // imagine this has 100 fields :)
> >    let s = S{a: 4, ..DefaultS};
> 
> Perhaps. This might be a good job for a macro at first, then we can
> see about folding it into the language if it's widely used.
> 
> > * Metaprogramming
> >
> >    Given an arbitrary struct S {...} with some fields, it would be
> > nice to somehow derive S.serialize and S.deserialize functions
> >    automatically. Are there any ideas how to do that? In C++ I use
> > the preprocessor and templates for that. In D, thanks to
> >    compile-time-code-evaluation, I can write code that will
> > introspect the struct during compile-time and then generate code.
> 
> There are #[auto_encode] and #[auto_decode] syntax extensions that
> exist already, actually (although the documentation is almost
> nonexistent). These are polymorphic over the actual serialization
> method, so you can choose the actual serialization format. There is
> also a visitor you can use for reflection, although it will be slower
> than generating the code at compile time.

Hm, this is interesting. Is there a simple example somewhere of how to use
#[auto_encode], and of what my msgpack library needs to implement to work
with it?
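
(My guess at the usage, from the names alone and completely untested, is
just attributes on the struct:)

  #[auto_encode]
  #[auto_decode]
  struct Position {
      x: int,
      y: int,
  }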

> We currently have syntax extensions written as compiler plugins.
> These allow you to write any code you want and have it executed at
> compile time. There are two main issues with them at the moment: (1)
> they have to be compiled as part of the compiler itself; (2) they
> expose too many internals of the `rustc` compiler, making your code
> likely to break when we change the compiler (or on alternative
> compilers implementing the Rust language, if they existed). The plan
> to fix (1) is to allow plugins to be written as separate crates and
> dynamically loaded; we've also talked about, longer-term, allowing
> them to be JIT'd, allowing you to execute any code you wish at
> compile time. The plan to fix (2) is to make the syntax extensions
> operate on token trees, not AST nodes, basically along the lines of
> Scheme syntax objects.

I see. So it would be possible to write a syntax extension, called for
example iter_fields!(struct_Type), which could be used to generate e.g.
custom serializers. But that would probably be similar to #[auto_encode],
except that it could be more user-defined.

> 
> >    I guess I could write a macro like:
> >
> >    define_ser_struct!(S, field1, int, field2, uint, ...)
> >
> >    which would generate the struct S and two functions for
> >    serialization. Would that be possible with macros?
> 
> Yes, you should be able do this with macros today, now that macros
> can expand to items.

Great. I will try that as an example to learn more about macros.
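
To make it concrete for myself, this is roughly what I would want
define_ser_struct!(Point, x, int, y, uint) to expand to (a hand-written
expansion; the msgpack details are elided and the fn names are made up):

  struct Point {
      x: int,
      y: uint,
  }

  // Generated alongside the struct; the bodies would write/read msgpack:
  fn serialize_point(p: &Point, w: @io::Writer) {
      // write p.x, then p.y, in msgpack format
  }

  fn deserialize_point(r: @io::Reader) -> Point {
      // read the fields back in the same order
      Point { x: 0, y: 0 }   // placeholder values
  }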

Thanks!

Best,

  Michael

