[rust-dev] Integer overflow, round -2147483648

Daniel Micay danielmicay at gmail.com
Wed Jun 18 11:20:26 PDT 2014


On 18/06/14 01:08 PM, Gábor Lehel wrote:
> # Exposition
> 
> We've debated the subject of integer overflow quite a bit, without much
> apparent progress. Essentially, we've been running in circles around two
> core facts: wrapping is bad, but checking is slow. The current consensus
> seems to be to, albeit grudgingly, stick with the status quo.
> 
> I think we've established that a perfect, one-size-fits-all solution is
> not possible. But I don't think this means that we're out of options, or
> have no room for improvement. I think there are several imperfect,
> partial solutions we could pursue, to address the various use cases in a
> divide-and-conquer fashion.
> 
> This is not a formal RFC, more of a grab bag of thoughts and ideas.
> 
> The central consideration has to be the observation that, while wrapping
> around on overflow is well-supported by hardware, for the large majority
> of programs, it's the wrong behavior.
> 
> Basically, programs are just hoping that overflow won't happen. And if
> it ever does happen, it's going to result in unexpected behavior and
> bugs. (Including the possibility of security bugs: not all security bugs
> are memory safety bugs.) This is a step up from C's insanity of
> undefined behavior for signed overflow, where the compiler assumes that
> overflow *cannot* happen and even optimizes based on that assumption,
> but it's still a sad state of affairs. If we're clearing the bar, that's
> only because it's been buried underground.
> 
> We can divide programs into three categories. (I'm using "program" in
> the general sense of "piece of code which does a thing".)
> 
>  1) Programs where wrapping around on overflow is the desired semantics.
> 
>  2) Programs where wrapping around on overflow is not the desired
> semantics, but performance is not critical.

If performance wasn't critical, the program wouldn't be written in Rust.
The language isn't aimed at use cases where performance isn't a bug
deal, as it makes many sacrifices to provide the level of control that's
available.

>  3) Programs where wrapping around on overflow is not the desired
> semantics and performance is critical.
> 
> Programs in (1) are well-served by the language and libraries as they
> are, and there's not much to do except to avoid regressing.
> 
> Programs in (2) and (3) are not as well-served.
>
>
> # Checked math
> 
> For (2), the standard library offers checked math in the `CheckedAdd`,
> `CheckedMul` etc. traits, as well as integer types of unbounded size:
> `BigInt` and `BigUint`. This is good, but it's not enough. The acid test
> has to be whether for non-performance-critical code, people are actually
> *using* checked math. If they're not, then we've failed.
>
> `CheckedAdd` and co. are important to have for flexibility, but they're
> far too unwieldy for general use. People aren't going to write
> `.checked_add(2).unwrap()` when they can write `+ 2`. A more adequate
> design might be something like this:
> 
>  * Have traits for all the arithmetic operations for both checking on
> overflow and for wrapping around on overflow, e.g. `CheckedAdd` (as
> now), `WrappingAdd`, `CheckedSub`, `WrappingSub`, and so on.
> 
>  * Offer convenience methods for the Checked traits which perform
> `unwrap()` automatically.
> 
>  * Have separate sets of integer types which check for overflow and
> which wrap around on overflow. Whatever they're called: `CheckedU8`,
> `checked::u8`, `uc8`, ...
> 
>  * Both sets of types implement all of the Checked* and Wrapping*
> traits. You can use explicit calls to get either behavior with either types.
> 
>  * The checked types use the Checked traits to implement the operator
> overloads (`Add`, Mul`, etc.), while the wrapping types use the Wrapping
> traits to implement them. In other words, the difference between the
> types is (probably only) in the behavior of the operators.
> 
>  * `BigInt` also implements all of the Wrapping and Checked traits:
> because overflow never happens, it can claim to do anything if it "does
> happen". `BigUint` implements all of them except for the Wrapping traits
> which may underflow, such as `WrappingSub`, because it has nowhere to
> wrap around to.
> 
> Another option would be to have just one set of types but two sets of
> operators, like Swift does. I think that would work as well, or even
> better, but we've been avoiding having any operators which aren't
> familiar from C.
> 
> 
> #  Unbounded integers
> 
> While checked math helps catch instances of overflow and prevent
> misbehaviors and bugs, many programs would prefer integer types which do
> the right thing and don't overflow in the first place. For this, again,
> we currently have `BigInt` and `BigUint`. There's one problem with
> these: because they may allocate, they no longer `Copy`, which means
> that they can't be just drop-in replacements for the fixed-size types.
>
>
> To partially address this, once we have tracing GC, and if we manage to
> make `Gc<T>: Copy`, we should add unbounded `Integer` (as in Haskell)
> and `Natural` types which internally use `Gc`, and so are also `Copy`.
> (In exchange, they wouldn't be `Send`, but that's far less pervasive.)
> These would (again, asides from `Send`) look and feel just like the
> built-in fixed-size types, while having the semantics of actual
> mathematical integers, resp. naturals (up to resource exhaustion of
> course). They would be ideal for code which is not performance critical
> and doesn't mind incurring, or already uses, garbage collection. For
> those cases, you wouldn't have to think about the tradeoffs, or make
> difficult choices: `Integer` is what you use.

A tracing garbage collector for Rust is a possibility but not a
certainty. I don't think it would make sense to have `Gc<T>` support
`Copy` but have it left out for `Rc<T>`. The fact that an arbitrary
compiler decision like that would determine the most convenient type is
a great reason to avoid making that arbitrary choice.

There's no opportunity for cycles in integers, and `Rc<T>` will be
faster along with using far less memory. It doesn't have the overhead
associated with reference counting in other languages due to being
task-local (not atomic) and much of the reference counting is elided by
move semantics / borrows.

With the help of sized deallocation, Rust can have an incredibly fast
allocator implementation. Since `Rc<T>` is task-local, it also doesn't
need to be using the same allocator entry point as sendable types. It
can make use of a thread-local allocator with less complexity and
overhead, although this could also be available on an opt-in basis for
sendable types by changing the allocator parameter from the default.

> One concern with this would be the possibility of programs incurring GC
> accidentally by using these types. There's several ways to deal with this:
> 
>  * Note the fact that they use GC prominently in the documentation.
> 
>  * Make sure the No-GC lint catches any use of them.
> 
>  * Offer a "no GC in this task" mode which fails the task if GC
> allocation is invoked, to catch mistakes at runtime.
> 
> I think these would be more than adequate to address the concern.

I don't think encouraging tracing garbage collection is appropriate for
a language designed around avoiding it. It would be fine to have it as a
choice if it never gets in the way, but it shouldn't be promoted as a
default.

> # Between a rock and a hard place
> 
> Having dispatched the "easy" cases above, for category #3 we're left
> between the rock (wraparound on overflow is wrong) and the hard place
> (checking for overflow is slow).
> 
> Even here, we may have options.
> 
> An observation:
> 
>  * We are adamantly opposed to compiler switches to turn off array
> bounds checking, because we are unwilling to compromise memory safety.
> 
>  * We are relatively unbothered by unchecked arithmetic, because it
> *doesn't* compromise memory safety.
> 
> Taking these together, I think we should at least be less than adamantly
> opposed to compiler switches for enabling or disabling checked arithmetic.

I think compiler switches or attributes enabling a different dialect of
the language are a bad idea as a whole. Code from libraries is directly
mixed into other crates, so changing the semantics of the language is
inherently broken.

> Consider the status quo. People use integer types which wrap on
> overflow. If they ever do overflow, it means misbehaviors and bugs. If
> we had a compiler flag to turn on checked arithmetic, even if it were
> only used a few times in testing, it would be a strict improvement: more
> bugs would be caught with less effort.
>
> But we clearly can't just add this flag for existing types, because
> they're well-defined to wrap around on overflow, and some programs
> (category #1) rely on this. So we need to have separate types.
> 
> One option is therefore to just define this set of types as failing the
> task on overflow if checked arithmetic is enabled, and to wrap around if
> it's disabled. But it doesn't necessarily make sense to specify
> wraparound in the latter case, given that programs are not supposed to
> depend on it: they may be compiled with either flag, and should avoid
> overflow.
> 
> Another observation:
> 
>  * Undefined behavior is anathema to the safe subset of the language.
> That would mean that it's not safe.
> 
>  * But *unspecified results* are maybe not so bad. We might already have
> them for bit-shifts. (Question for the audience: do we?)
> 
> If unspecified results are acceptable, then we could instead say that
> these types fail on overflow if checked arithmetic is enabled, and have
> unspecified results if it isn't. But saying they wrap around is fine as
> well.
> 
> This way, we can put off the choice between the rock and the hard place
> from program-writing time to compile time, at least.
> 
> 
> # Defaults
> 
> Even if we provide the various options from above, from the perspective
> of what types people end up using, defaults are very important.
> 
> There's two kinds of defaults:
> 
> * The de jure default, inferred by the type system in the absence of
> other information, which used to be `int`. Thankfully, we're removing this.
> 
> * The de facto, cultural default. For instance, if there is a type
> called "int", most people will use it without thinking.
> 
> The latter question is still something we need to think about. Should we
> have a clear cultural default? Or should we force people to explicitly
> choose between checked and wrapping arithmetic?
> 
> For the most part, this boils down to:
> 
>  * If `int` is checked, the default is slow
>  
>  * If `int` wraps, the default is wrong
> 
>  * If there is no `int`, people are confused
> 
> Regarding the first, we seem to be living in deathly fear of someone
> naively writing an arithmetic benchmark in Rust, putting it up on the
> internet, and saying, "Look: Rust is slow". This is not an unrealistic
> scenario, sadly. The question is whether we'd rather have more programs
> be accidentally incorrect in order to avoid bad publicity from
> benchmarks being accidentally slow.
> 
> Regarding the third, even if the only types we have are `intc`, `intw`,
> `ic8`, `iw8`, and so on, we *still* can't entirely escape creating a
> cultural default, because we still need to choose types for functions in
> the standard library and for built-in operations like array indexing.
> Then the path of least resistance for any given program will be to use
> the same types.
> 
> There's one thing that might help resolve this conundrum, which is if we
> consider the previously-described scheme with compiler flags to control
> checked arithmetic to be acceptable. In that case, I think those types
> would be the clear choice to be the de facto defaults. Then we would have:
> 
>  * `i8`, `u8` .. `i64`, `u64`, `int`, and `uint` types, which fail the
> task on overflow if checked arithmetic is turned on, and either wrap
> around or have an unspecified result if it's off
>
>  * a corresponding set of types somewhere in the standard library, which
> wrap around no matter what
> 
>  * and another set of corresponding types, which are checked no matter what
> 
> 
> -Gábor

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://mail.mozilla.org/pipermail/rust-dev/attachments/20140618/93354284/attachment.sig>


More information about the Rust-dev mailing list