[rust-dev] On Copy = POD

Paulo Sérgio Almeida pssalmeida at gmail.com
Fri Jun 20 12:07:49 PDT 2014


Hi all,

Currently, being Copy is equated with being POD. The more time passes and
the more code examples I see, the more I am struck by the amount of
ugliness this causes. I wonder if there is a way out.

There are two aspects to Copy: the semantic side and the implementation
side. On the more abstract semantic side, Copy is the ability to keep
using a variable after it has been assigned or passed by value (from the
current version of the manual: "Types that do not move ownership when used
by-value"). If x: X is Copy, then after y = x or f(x) we can keep using x.
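
As a small illustration of this semantic side, here is a sketch in
present-day Rust syntax, which postdates this message, so treat the exact
syntax as an assumption. Point, sum and copy_demo are made-up names used
only for this example:

```rust
// A plain-data type: every field is Copy, so the type can opt in to Copy.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Takes its argument by value.
fn sum(p: Point) -> i32 {
    p.x + p.y
}

// Returns the original, the copy, and the sum, to show that the original
// stays usable after assignment and after by-value parameter passing.
fn copy_demo() -> (Point, Point, i32) {
    let x = Point { x: 1, y: 2 };
    let y = x; // x is copied, not moved
    let s = sum(x); // passing by value copies again
    (x, y, s) // x is still usable here
}

fn main() {
    let (x, y, s) = copy_demo();
    println!("{:?} {:?} {}", x, y, s);

    let a = String::from("owned");
    let b = a; // String is not Copy: this moves a into b
    // println!("{}", a); // would not compile: a has been moved
    println!("{}", b);
}
```

The String half shows the contrast: for a type that is not Copy, the same
assignment moves ownership and the source variable can no longer be used.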

On the implementation side, Copy is defined as POD (it was even named Pod
previously), so that copies can be made by a simple memcpy and no
surprising, possibly expensive operation is performed, as could happen if
arbitrary user-defined copy constructors were allowed.

Regarding the semantic level, Copyness is too important a part of the
interface of a type to be inferred from the type being POD, where it can
change more or less at random and break client code, e.g., when some field
is added. This concern is already addressed by Niko's proposal of opt-in
built-in traits (
https://github.com/rust-lang/rfcs/blob/master/active/0003-opt-in-builtin-traits.md).
I hope it gets adopted for 1.0.

Regarding the implementation side, Copy is currently restricted to POD
types, which are copied by memcpy. This is a simple rule, but it excludes
some types that fit the spirit of "small and cheap to copy" and would
benefit from being Copy. The result is ugliness and lack of uniformity,
particularly for smart-pointer types. Maybe the best example is the Rc and
Gc types. From the semantic point of view, both aim to share ownership of
immutable values, so both should offer the same interface and both be
Copy, to make usage more transparent and avoid excessive cloning. But
while Gc is Copy, Rc cannot be, even though it is "small and cheap to
copy", just not by a memcpy. This low-level definition of Copy = POD
results in smart-pointer code with a number of clones that are actually
misleading, because they clone not the referent but the pointer, which
will seem artificial, to say the least, to people coming to Rust.
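
To make the "cloning the pointer, not the referent" point concrete, here is
a minimal sketch in present-day Rust, using Rc::strong_count and Rc::ptr_eq
from the standard library as it exists today (both are assumptions relative
to the 2014 library). rc_clone_demo is a made-up name:

```rust
use std::rc::Rc;

// Returns the strong count after one clone, and whether the two handles
// point at the same allocation.
fn rc_clone_demo() -> (usize, bool) {
    let a = Rc::new(vec![1, 2, 3]);
    // Same effect as a.clone(): bumps the reference count, copies no Vec.
    let b = Rc::clone(&a);
    (Rc::strong_count(&a), Rc::ptr_eq(&a, &b))
}

fn main() {
    let (count, same) = rc_clone_demo();
    println!("strong count = {}, same allocation = {}", count, same);
}
```

There is still a single Vec after the clone. The only work done is the
reference count update, which is exactly the step a plain memcpy cannot
perform.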

I have been wondering whether there is a way to have a more encompassing
Copy that allows, e.g., Rc to be Copy, fitting the spirit of "small and
cheap to copy", while forbidding general user-defined copy constructors.
Rc needs a copy constructor to update the reference count (the current
clone), and Rc is implemented as a normal library type, in Rust, so
allowing it would mean allowing general user-defined copy constructors,
which is ruled out. So, is there no way out?

Imagine that all the essential pointer types for sharing ownership (Rc, Gc,
and even Arc) were built in. We could then declare them Copy and keep
inferring PODness for types, as now. Types that are POD would be copied by
memcpy, as now; types that are Copy but not POD would use the built-in
copy constructors; and deriving Copy for user-defined types would be
allowed only if they are POD, ruling out user-defined copy constructors.

But Rc, Gc and Arc are implemented in Rust. Does this mean that to prevent
user-defined copy constructors we must give up all hope of having these
essential pointer types be Copy? I.e., must orthogonality rule at all
costs? I wonder whether it would be possible to keep the essential spirit
of Copyness while allowing special cases for a small number of "blessed"
library types, something like:

"Implicit copy under assignment or by value parameter passing cannot be
arbitrarily user-defined, to rule out expensive implicit copies; only POD
user-defined types can derive Copy. However, each version of the language
will define a small approved list of types (essentially the pointer-types
for shared ownership), for which the cost of copy has been deemed small,
and which are defined as Copy."

Even for Arc, where copy (the current clone()) would be more expensive,
auto-borrowing (which should be made uniform for all pointer types) means
that functions taking a reference to the referent won't copy the Arc
itself. Together with a last-use move optimisation, this would give
programs basically the same run-time cost as now: the implicit copies
would happen where we now have an explicit clone(), and the code would be
more elegant. E.g., instead of writing:

fn main() {
    let numbers = Vec::from_fn(100, |i| i as f32);
    let shared_numbers = Arc::new(numbers);

    for _ in range(0, 10) {
        let child_numbers = shared_numbers.clone();

        spawn(proc() {
            let local_numbers = child_numbers.as_slice();

            // Work with the local numbers
        });
    }
}

which may mislead people coming to Rust, as the numbers are not being
cloned and there are not several Vecs around (parent ones, child ones) but
a single Vec, what I would like to write is:

fn main() {
    let numbers = Vec::from_fn(100, |i| i as f32);
    let shared_numbers = Arc::new(numbers);

    for _ in range(0, 10) {
        spawn(proc() {
            let slice = shared_numbers.as_slice();

            // Work with the numbers
        });
    }
}

This would "just work" while being clearer, as the shared_numbers Arc,
being Copy, would be copied into the proc, as happens now for POD types. I
have seen many other examples where the code could mislead the reader into
thinking there are several of something, e.g., several Mutexes:

let mutex = Arc::new(Mutex::new(1));
let mutex2 = mutex.clone();

or several Barriers, or other similar cases. What will happen when we
need, say, two genuinely different Mutexes, or Vecs, to be shared by two
tasks? We will have four different variables, more trouble choosing names,
and a greater cognitive burden in seeing which one refers to which. E.g.,
does mutex_1_2 mean the first mutex as used in task 2, or vice versa?
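
To show that such clones never produce a second Mutex, here is a sketch in
present-day Rust (std::thread and move closures postdate this message, so
the syntax is an assumption; shared_counter and handle_copy are made-up
names). Every clone of the Arc is a new pointer to the same Mutex, so all
threads update one counter:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawns n threads that each increment one shared counter once, then
// returns the final value of that counter.
fn shared_counter(n: usize) -> i32 {
    let mutex = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();

    for _ in 0..n {
        // A new pointer to the same Mutex, not a new Mutex.
        let handle_copy = Arc::clone(&mutex);
        handles.push(thread::spawn(move || {
            *handle_copy.lock().unwrap() += 1;
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    let total = *mutex.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", shared_counter(4));
}
```

If each clone created an independent Mutex, the counter read at the end
would stay at zero. Instead it equals the number of threads.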

I dream of not having these ugly things in Rust. The advanced Rust type
system, namely borrowing, lets us avoid what would otherwise be many
copies when we only need to pass a reference to some function, so implicit
copies would occur mostly where we now write misleading explicit clones.
As of now we are not exploiting it as much as we could to get more
beautiful programs with the same performance.

Regards,
Paulo