[rust-dev] Integer overflow, round -2147483648

Gábor Lehel glaebhoerl at gmail.com
Fri Jun 20 13:51:49 PDT 2014


On Wed, Jun 18, 2014 at 8:20 PM, Daniel Micay <danielmicay at gmail.com> wrote:

> On 18/06/14 01:08 PM, Gábor Lehel wrote:
> > # Exposition
> >
> > We've debated the subject of integer overflow quite a bit, without much
> > apparent progress. Essentially, we've been running in circles around two
> > core facts: wrapping is bad, but checking is slow. The current consensus
> > seems to be to, albeit grudgingly, stick with the status quo.
> >
> > I think we've established that a perfect, one-size-fits-all solution is
> > not possible. But I don't think this means that we're out of options, or
> > have no room for improvement. I think there are several imperfect,
> > partial solutions we could pursue, to address the various use cases in a
> > divide-and-conquer fashion.
> >
> > This is not a formal RFC, more of a grab bag of thoughts and ideas.
> >
> > The central consideration has to be the observation that, while wrapping
> > around on overflow is well-supported by hardware, for the large majority
> > of programs, it's the wrong behavior.
> >
> > Basically, programs are just hoping that overflow won't happen. And if
> > it ever does happen, it's going to result in unexpected behavior and
> > bugs. (Including the possibility of security bugs: not all security bugs
> > are memory safety bugs.) This is a step up from C's insanity of
> > undefined behavior for signed overflow, where the compiler assumes that
> > overflow *cannot* happen and even optimizes based on that assumption,
> > but it's still a sad state of affairs. If we're clearing the bar, that's
> > only because it's been buried underground.
> >
> > We can divide programs into three categories. (I'm using "program" in
> > the general sense of "piece of code which does a thing".)
> >
> >  1) Programs where wrapping around on overflow is the desired semantics.
> >
> >  2) Programs where wrapping around on overflow is not the desired
> > semantics, but performance is not critical.
>
> If performance wasn't critical, the program wouldn't be written in Rust.
> The language isn't aimed at use cases where performance isn't a bug
> deal, as it makes many sacrifices to provide the level of control that's
> available.
>

People write GUI frameworks and applications in C++ and even in C. Just
because a language is appropriate for low-level and performance-critical
code doesn't mean it needs to be inappropriate for anything else. I think
Rust is far superior as a general-purpose language to most of today's
mainstream languages.

And even in applications where some parts are performance-critical, many
parts may not be. I expect the ratios may be tilted differently for Rust
code, but not the fundamental pattern to be different.


>
> >  3) Programs where wrapping around on overflow is not the desired
> > semantics and performance is critical.
> >
> > Programs in (1) are well-served by the language and libraries as they
> > are, and there's not much to do except to avoid regressing.
> >
> > Programs in (2) and (3) are not as well-served.
> >
> >
> > # Checked math
> >
> > For (2), the standard library offers checked math in the `CheckedAdd`,
> > `CheckedMul` etc. traits, as well as integer types of unbounded size:
> > `BigInt` and `BigUint`. This is good, but it's not enough. The acid test
> > has to be whether for non-performance-critical code, people are actually
> > *using* checked math. If they're not, then we've failed.
> >
> > `CheckedAdd` and co. are important to have for flexibility, but they're
> > far too unwieldy for general use. People aren't going to write
> > `.checked_add(2).unwrap()` when they can write `+ 2`. A more adequate
> > design might be something like this:
> >
> >  * Have traits for all the arithmetic operations for both checking on
> > overflow and for wrapping around on overflow, e.g. `CheckedAdd` (as
> > now), `WrappingAdd`, `CheckedSub`, `WrappingSub`, and so on.
> >
> >  * Offer convenience methods for the Checked traits which perform
> > `unwrap()` automatically.
> >
> >  * Have separate sets of integer types which check for overflow and
> > which wrap around on overflow. Whatever they're called: `CheckedU8`,
> > `checked::u8`, `uc8`, ...
> >
> >  * Both sets of types implement all of the Checked* and Wrapping*
> > traits. You can use explicit calls to get either behavior with either
> types.
> >
> >  * The checked types use the Checked traits to implement the operator
> > overloads (`Add`, Mul`, etc.), while the wrapping types use the Wrapping
> > traits to implement them. In other words, the difference between the
> > types is (probably only) in the behavior of the operators.
> >
> >  * `BigInt` also implements all of the Wrapping and Checked traits:
> > because overflow never happens, it can claim to do anything if it "does
> > happen". `BigUint` implements all of them except for the Wrapping traits
> > which may underflow, such as `WrappingSub`, because it has nowhere to
> > wrap around to.
> >
> > Another option would be to have just one set of types but two sets of
> > operators, like Swift does. I think that would work as well, or even
> > better, but we've been avoiding having any operators which aren't
> > familiar from C.
> >
> >
> > #  Unbounded integers
> >
> > While checked math helps catch instances of overflow and prevent
> > misbehaviors and bugs, many programs would prefer integer types which do
> > the right thing and don't overflow in the first place. For this, again,
> > we currently have `BigInt` and `BigUint`. There's one problem with
> > these: because they may allocate, they no longer `Copy`, which means
> > that they can't be just drop-in replacements for the fixed-size types.
> >
> >
> > To partially address this, once we have tracing GC, and if we manage to
> > make `Gc<T>: Copy`, we should add unbounded `Integer` (as in Haskell)
> > and `Natural` types which internally use `Gc`, and so are also `Copy`.
> > (In exchange, they wouldn't be `Send`, but that's far less pervasive.)
> > These would (again, asides from `Send`) look and feel just like the
> > built-in fixed-size types, while having the semantics of actual
> > mathematical integers, resp. naturals (up to resource exhaustion of
> > course). They would be ideal for code which is not performance critical
> > and doesn't mind incurring, or already uses, garbage collection. For
> > those cases, you wouldn't have to think about the tradeoffs, or make
> > difficult choices: `Integer` is what you use.
>
> A tracing garbage collector for Rust is a possibility but not a
> certainty. I don't think it would make sense to have `Gc<T>` support
> `Copy` but have it left out for `Rc<T>`. The fact that an arbitrary
> compiler decision like that would determine the most convenient type is
> a great reason to avoid making that arbitrary choice.
>
> There's no opportunity for cycles in integers, and `Rc<T>` will be
> faster along with using far less memory. It doesn't have the overhead
> associated with reference counting in other languages due to being
> task-local (not atomic) and much of the reference counting is elided by
> move semantics / borrows.
>
> With the help of sized deallocation, Rust can have an incredibly fast
> allocator implementation. Since `Rc<T>` is task-local, it also doesn't
> need to be using the same allocator entry point as sendable types. It
> can make use of a thread-local allocator with less complexity and
> overhead, although this could also be available on an opt-in basis for
> sendable types by changing the allocator parameter from the default.
>
> > One concern with this would be the possibility of programs incurring GC
> > accidentally by using these types. There's several ways to deal with
> this:
> >
> >  * Note the fact that they use GC prominently in the documentation.
> >
> >  * Make sure the No-GC lint catches any use of them.
> >
> >  * Offer a "no GC in this task" mode which fails the task if GC
> > allocation is invoked, to catch mistakes at runtime.
> >
> > I think these would be more than adequate to address the concern.
>
> I don't think encouraging tracing garbage collection is appropriate for
> a language designed around avoiding it. It would be fine to have it as a
> choice if it never gets in the way, but it shouldn't be promoted as a
> default.
>

The idea is to pick the low-hanging fruit. For programs that use garbage
collection, we can offer an integer type that requires neither ergonomic
nor semantic compromises. So let's.


>
> > # Between a rock and a hard place
> >
> > Having dispatched the "easy" cases above, for category #3 we're left
> > between the rock (wraparound on overflow is wrong) and the hard place
> > (checking for overflow is slow).
> >
> > Even here, we may have options.
> >
> > An observation:
> >
> >  * We are adamantly opposed to compiler switches to turn off array
> > bounds checking, because we are unwilling to compromise memory safety.
> >
> >  * We are relatively unbothered by unchecked arithmetic, because it
> > *doesn't* compromise memory safety.
> >
> > Taking these together, I think we should at least be less than adamantly
> > opposed to compiler switches for enabling or disabling checked
> arithmetic.
>
> I think compiler switches or attributes enabling a different dialect of
> the language are a bad idea as a whole. Code from libraries is directly
> mixed into other crates, so changing the semantics of the language is
> inherently broken.
>

Even if checked arithmetic is only turned on for testing/debugging and not
in production code, it's still a strict improvement over the status quo.
Under the status quo, except where wraparound is the intended semantics,
overflow is silently wrong 100% of the time. With the alternative, that
percentage is smaller.


>
> > Consider the status quo. People use integer types which wrap on
> > overflow. If they ever do overflow, it means misbehaviors and bugs. If
> > we had a compiler flag to turn on checked arithmetic, even if it were
> > only used a few times in testing, it would be a strict improvement: more
> > bugs would be caught with less effort.
> >
> > But we clearly can't just add this flag for existing types, because
> > they're well-defined to wrap around on overflow, and some programs
> > (category #1) rely on this. So we need to have separate types.
> >
> > One option is therefore to just define this set of types as failing the
> > task on overflow if checked arithmetic is enabled, and to wrap around if
> > it's disabled. But it doesn't necessarily make sense to specify
> > wraparound in the latter case, given that programs are not supposed to
> > depend on it: they may be compiled with either flag, and should avoid
> > overflow.
> >
> > Another observation:
> >
> >  * Undefined behavior is anathema to the safe subset of the language.
> > That would mean that it's not safe.
> >
> >  * But *unspecified results* are maybe not so bad. We might already have
> > them for bit-shifts. (Question for the audience: do we?)
> >
> > If unspecified results are acceptable, then we could instead say that
> > these types fail on overflow if checked arithmetic is enabled, and have
> > unspecified results if it isn't. But saying they wrap around is fine as
> > well.
> >
> > This way, we can put off the choice between the rock and the hard place
> > from program-writing time to compile time, at least.
> >
> >
> > # Defaults
> >
> > Even if we provide the various options from above, from the perspective
> > of what types people end up using, defaults are very important.
> >
> > There's two kinds of defaults:
> >
> > * The de jure default, inferred by the type system in the absence of
> > other information, which used to be `int`. Thankfully, we're removing
> this.
> >
> > * The de facto, cultural default. For instance, if there is a type
> > called "int", most people will use it without thinking.
> >
> > The latter question is still something we need to think about. Should we
> > have a clear cultural default? Or should we force people to explicitly
> > choose between checked and wrapping arithmetic?
> >
> > For the most part, this boils down to:
> >
> >  * If `int` is checked, the default is slow
> >
> >  * If `int` wraps, the default is wrong
> >
> >  * If there is no `int`, people are confused
> >
> > Regarding the first, we seem to be living in deathly fear of someone
> > naively writing an arithmetic benchmark in Rust, putting it up on the
> > internet, and saying, "Look: Rust is slow". This is not an unrealistic
> > scenario, sadly. The question is whether we'd rather have more programs
> > be accidentally incorrect in order to avoid bad publicity from
> > benchmarks being accidentally slow.
> >
> > Regarding the third, even if the only types we have are `intc`, `intw`,
> > `ic8`, `iw8`, and so on, we *still* can't entirely escape creating a
> > cultural default, because we still need to choose types for functions in
> > the standard library and for built-in operations like array indexing.
> > Then the path of least resistance for any given program will be to use
> > the same types.
> >
> > There's one thing that might help resolve this conundrum, which is if we
> > consider the previously-described scheme with compiler flags to control
> > checked arithmetic to be acceptable. In that case, I think those types
> > would be the clear choice to be the de facto defaults. Then we would
> have:
> >
> >  * `i8`, `u8` .. `i64`, `u64`, `int`, and `uint` types, which fail the
> > task on overflow if checked arithmetic is turned on, and either wrap
> > around or have an unspecified result if it's off
> >
> >  * a corresponding set of types somewhere in the standard library, which
> > wrap around no matter what
> >
> >  * and another set of corresponding types, which are checked no matter
> what
> >
> >
> > -Gábor
>
>
> _______________________________________________
> Rust-dev mailing list
> Rust-dev at mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/rust-dev/attachments/20140620/56baaca3/attachment.html>


More information about the Rust-dev mailing list