[rust-dev] Integer overflow, round -2147483648

Gábor Lehel glaebhoerl at gmail.com
Wed Jun 18 10:08:21 PDT 2014


# Exposition

We've debated the subject of integer overflow quite a bit, without much
apparent progress. Essentially, we've been running in circles around two
core facts: wrapping is bad, but checking is slow. The current consensus
seems to be to, albeit grudgingly, stick with the status quo.

I think we've established that a perfect, one-size-fits-all solution is not
possible. But I don't think this means that we're out of options, or have
no room for improvement. I think there are several imperfect, partial
solutions we could pursue, to address the various use cases in a
divide-and-conquer fashion.

This is not a formal RFC, more of a grab bag of thoughts and ideas.

The central consideration has to be the observation that, while wrapping
around on overflow is well-supported by hardware, for the large majority of
programs, it's the wrong behavior.

Basically, programs are just hoping that overflow won't happen. And if it
ever does happen, it's going to result in unexpected behavior and bugs.
(Including the possibility of security bugs: not all security bugs are
memory safety bugs.) This is a step up from C's insanity of undefined
behavior for signed overflow, where the compiler assumes that overflow
*cannot* happen and even optimizes based on that assumption, but it's still
a sad state of affairs. If we're clearing the bar, that's only because it's
been buried underground.

We can divide programs into three categories. (I'm using "program" in the
general sense of "piece of code which does a thing".)

 1) Programs where wrapping around on overflow is the desired semantics.

 2) Programs where wrapping around on overflow is not the desired
semantics, but performance is not critical.

 3) Programs where wrapping around on overflow is not the desired semantics
and performance is critical.

Programs in (1) are well-served by the language and libraries as they are,
and there's not much to do except to avoid regressing.

Programs in (2) and (3) are not as well-served.


# Checked math

For (2), the standard library offers checked math in the `CheckedAdd`,
`CheckedMul` etc. traits, as well as integer types of unbounded size:
`BigInt` and `BigUint`. This is good, but it's not enough. The acid test
has to be whether for non-performance-critical code, people are actually
*using* checked math. If they're not, then we've failed.

`CheckedAdd` and co. are important to have for flexibility, but they're far
too unwieldy for general use. People aren't going to write
`.checked_add(2).unwrap()` when they can write `+ 2`. A more adequate
design might be something like this:

 * Have traits for all the arithmetic operations for both checking on
overflow and for wrapping around on overflow, e.g. `CheckedAdd` (as now),
`WrappingAdd`, `CheckedSub`, `WrappingSub`, and so on.

 * Offer convenience methods for the Checked traits which perform
`unwrap()` automatically.

 * Have separate sets of integer types which check for overflow and which
wrap around on overflow. Whatever they're called: `CheckedU8`,
`checked::u8`, `uc8`, ...

 * Both sets of types implement all of the Checked* and Wrapping* traits.
You can use explicit calls to get either behavior with either types.

 * The checked types use the Checked traits to implement the operator
overloads (`Add`, Mul`, etc.), while the wrapping types use the Wrapping
traits to implement them. In other words, the difference between the types
is (probably only) in the behavior of the operators.

 * `BigInt` also implements all of the Wrapping and Checked traits: because
overflow never happens, it can claim to do anything if it "does happen".
`BigUint` implements all of them except for the Wrapping traits which may
underflow, such as `WrappingSub`, because it has nowhere to wrap around to.

Another option would be to have just one set of types but two sets of
operators, like Swift does. I think that would work as well, or even
better, but we've been avoiding having any operators which aren't familiar
from C.


#  Unbounded integers

While checked math helps catch instances of overflow and prevent
misbehaviors and bugs, many programs would prefer integer types which do
the right thing and don't overflow in the first place. For this, again, we
currently have `BigInt` and `BigUint`. There's one problem with these:
because they may allocate, they no longer `Copy`, which means that they
can't be just drop-in replacements for the fixed-size types.

To partially address this, once we have tracing GC, and if we manage to
make `Gc<T>: Copy`, we should add unbounded `Integer` (as in Haskell) and
`Natural` types which internally use `Gc`, and so are also `Copy`. (In
exchange, they wouldn't be `Send`, but that's far less pervasive.) These
would (again, asides from `Send`) look and feel just like the built-in
fixed-size types, while having the semantics of actual mathematical
integers, resp. naturals (up to resource exhaustion of course). They would
be ideal for code which is not performance critical and doesn't mind
incurring, or already uses, garbage collection. For those cases, you
wouldn't have to think about the tradeoffs, or make difficult choices:
`Integer` is what you use.

One concern with this would be the possibility of programs incurring GC
accidentally by using these types. There's several ways to deal with this:

 * Note the fact that they use GC prominently in the documentation.

 * Make sure the No-GC lint catches any use of them.

 * Offer a "no GC in this task" mode which fails the task if GC allocation
is invoked, to catch mistakes at runtime.

I think these would be more than adequate to address the concern.


# Between a rock and a hard place

Having dispatched the "easy" cases above, for category #3 we're left
between the rock (wraparound on overflow is wrong) and the hard place
(checking for overflow is slow).

Even here, we may have options.

An observation:

 * We are adamantly opposed to compiler switches to turn off array bounds
checking, because we are unwilling to compromise memory safety.

 * We are relatively unbothered by unchecked arithmetic, because it
*doesn't* compromise memory safety.

Taking these together, I think we should at least be less than adamantly
opposed to compiler switches for enabling or disabling checked arithmetic.

Consider the status quo. People use integer types which wrap on overflow.
If they ever do overflow, it means misbehaviors and bugs. If we had a
compiler flag to turn on checked arithmetic, even if it were only used a
few times in testing, it would be a strict improvement: more bugs would be
caught with less effort.

But we clearly can't just add this flag for existing types, because they're
well-defined to wrap around on overflow, and some programs (category #1)
rely on this. So we need to have separate types.

One option is therefore to just define this set of types as failing the
task on overflow if checked arithmetic is enabled, and to wrap around if
it's disabled. But it doesn't necessarily make sense to specify wraparound
in the latter case, given that programs are not supposed to depend on it:
they may be compiled with either flag, and should avoid overflow.

Another observation:

 * Undefined behavior is anathema to the safe subset of the language. That
would mean that it's not safe.

 * But *unspecified results* are maybe not so bad. We might already have
them for bit-shifts. (Question for the audience: do we?)

If unspecified results are acceptable, then we could instead say that these
types fail on overflow if checked arithmetic is enabled, and have
unspecified results if it isn't. But saying they wrap around is fine as
well.

This way, we can put off the choice between the rock and the hard place
from program-writing time to compile time, at least.


# Defaults

Even if we provide the various options from above, from the perspective of
what types people end up using, defaults are very important.

There's two kinds of defaults:

* The de jure default, inferred by the type system in the absence of other
information, which used to be `int`. Thankfully, we're removing this.

* The de facto, cultural default. For instance, if there is a type called
"int", most people will use it without thinking.

The latter question is still something we need to think about. Should we
have a clear cultural default? Or should we force people to explicitly
choose between checked and wrapping arithmetic?

For the most part, this boils down to:

 * If `int` is checked, the default is slow

 * If `int` wraps, the default is wrong

 * If there is no `int`, people are confused

Regarding the first, we seem to be living in deathly fear of someone
naively writing an arithmetic benchmark in Rust, putting it up on the
internet, and saying, "Look: Rust is slow". This is not an unrealistic
scenario, sadly. The question is whether we'd rather have more programs be
accidentally incorrect in order to avoid bad publicity from benchmarks
being accidentally slow.

Regarding the third, even if the only types we have are `intc`, `intw`,
`ic8`, `iw8`, and so on, we *still* can't entirely escape creating a
cultural default, because we still need to choose types for functions in
the standard library and for built-in operations like array indexing. Then
the path of least resistance for any given program will be to use the same
types.

There's one thing that might help resolve this conundrum, which is if we
consider the previously-described scheme with compiler flags to control
checked arithmetic to be acceptable. In that case, I think those types
would be the clear choice to be the de facto defaults. Then we would have:

 * `i8`, `u8` .. `i64`, `u64`, `int`, and `uint` types, which fail the task
on overflow if checked arithmetic is turned on, and either wrap around or
have an unspecified result if it's off

 * a corresponding set of types somewhere in the standard library, which
wrap around no matter what

 * and another set of corresponding types, which are checked no matter what


-Gábor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/rust-dev/attachments/20140618/70dce952/attachment.html>


More information about the Rust-dev mailing list