[rust-dev] Integer overflow, round -2147483648

Jerry Morrison jhm456 at gmail.com
Fri Jun 20 18:31:57 PDT 2014


On Fri, Jun 20, 2014 at 5:36 PM, Gábor Lehel <glaebhoerl at gmail.com> wrote:

>
>
>
> On Sat, Jun 21, 2014 at 1:37 AM, Jerry Morrison <jhm456 at gmail.com> wrote:
>
>>
>> On Fri, Jun 20, 2014 at 2:07 PM, Gábor Lehel <glaebhoerl at gmail.com>
>> wrote:
>>
>>>
>>>
>>>
>>> On Thu, Jun 19, 2014 at 9:05 AM, Jerry Morrison <jhm456 at gmail.com>
>>> wrote:
>>>
>>>> Nice analysis!
>>>>
>>>> Over what scope should programmers pick between Gábor's 3 categories?
>>>>
>>>> The "wraparound is desired" category should only occur in narrow parts
>>>> of code, like computing a hash. That suits a wraparound-operator better
>>>> than a wraparound-type, and certainly better than a compiler switch. And it
>>>> doesn't make sense for a type like 'int' that doesn't have a fixed size.
>>>>
>>>
>>> I thought hash algorithms were precisely the kind of case where you
>>> might opt for types which were clearly defined as wrapping. Why do you
>>> think using different operators instead would be preferred?
>>>
>>
>> Considering a hashing or CRC example, the code reads a bunch of
>> non-wraparound values, mashes them together using wraparound arithmetic,
>> then uses the result in a way that does not mean to wrap around at the
>> integer size.
>>
>> It's workable to convert inputs to wraparound types and use
>> wraparound accumulators, then convert the result to a non-wraparound type.
>> But using wraparound operators seems simpler, more visible, and less
>> error-prone. E.g. it'd be a mistake if the hash function returned a
>> wraparound type, which gets assigned with type inference, and so downstream
>> operations wrap around.
>>
>
> Yes, the signature of the hash function shouldn't necessarily expose the
> implementation's use of wraparound types... though it's not completely
> obvious to me. What kind of downstream operations would it make sense to
> perform on a hash value anyway? Anything besides further hashing?
>
> I'm only minimally knowledgeable about hashing algorithms, but I would've
> thought that casting the inputs to wraparound types at the outset and then
> casting the result back at the end would be *less* error prone than making
> sure to use the wraparound version for every operation in the function. Is
> that wrong? Are there any operations within the body of the hash function
> where overflow should be caught?
>
> And if we'd be going with separate operators instead of separate types,
> hash functions are a niche enough use case that, in themselves, I don't
> think they *warrant* having distinct symbolic operators for the wraparound
> operations; they could just use named methods instead.
>
> Hashing is the one that always comes up, but are there any other instances
> where wraparound is the desired semantics?
>

Here's an example hash function from *Effective Java
<http://www.amazon.com/Effective-Java-2nd-Joshua-Bloch-ebook/dp/B000WJOUPA/ref=sr_1_1?ie=UTF8&qid=1403312109&sr=8-1&keywords=joshua+bloch+effective+java>*
(page
48) following its recipe for writing hash functions by combining the
object's significant fields:

@Override public int hashCode() {

int result = 17;

result = 31 * result + areaCode;

result = 31 * result + prefix;

result = 31 * result + lineNumber;

return result;

}


So using Swift's wraparound operators in Java looks like:

@Override public int hashCode() {

int result = 17;

result = 31 &* result &+ areaCode;

result = 31 &* result &+ prefix;

result = 31 &* result &+ lineNumber;

return result;

}


Alternatively, with a wraparound integer type wint (note that int is
defined to be 32 bits in Java):

@Override public int hashCode() {

wint result = 17;

result = (wint) 31 * result + (wint) areaCode;

result = (wint) 31 * result + (wint) prefix;

result = (wint) 31 * result + (wint) lineNumber;

return (int) result;

}


In this example, it's easier to get the first one right than the second one.

The prototypical use for a hash code is to index into a hash table modulo
the table's current size. It can also be used for debugging, e.g. Java's
default toString() method uses the object's class name and hash, returning
something like "PhoneNumber at 163b91".

Another example of wraparound math is computing a checksum like CRC32.
The checksum value is typically sent over a wire or stored in a storage
medium to cross-check data integrity at the receiving end. After computing
the checksum, you only want to pass it around and compare it.

The only other example that comes to mind is emulating the arithmetic
operations of a target CPU or other hardware.

In other cases of bounded numbers, like ARGB color components, one wants to
deal with overflow, not silently wraparound.

Implementing BigInt can use wraparound math if it can also get the carry
bit.

Yes, these cases are so few that named operators may suffice. That's a bit
less convenient but linguistically simpler than Swift's 5 wraparound
arithmetic operators.



>>
>>>
>>>>
>>>> The "wraparound is undesired but performance is critical" category
>>>> occurs in the most performance critical bits of code [I'm doubting that all
>>>> parts of all Rust programs are performance critical], and programmers need
>>>> to make the trade-off over that limited scope. Maybe via operators or
>>>> types, but not by a compiler switch over whole compilation units.
>>>>
>>>> That leaves "wraparound is undesired and performance is not critical"
>>>> category for everything else. The choice between checked vs. unbounded
>>>> sounds like typing.
>>>>
>>>> BigUint is weird: It can underflow but not overflow. When you use its
>>>> value in a more bounded way you'll need to bounds-check it then, whether it
>>>> can go negative or not. Wouldn't it be easier to discard it than squeeze it
>>>> into the wraparound or checked models?
>>>>
>>>
>>> Making the unbounded integer types implement the Checking/Wrapping
>>> traits is more for completeness than anything else, I'm not sure whether it
>>> has practical value.
>>>
>>>  A BigUint/Natural type is not as important as BigInt/Integer, but it
>>> can be nice to have. Haskell only has Integer in the Prelude, but an
>>> external package provides Natural, and there've been proposals to mainline
>>> it. It's useful for function inputs where only nonnegative values make
>>> sense. You could write asserts manually, but you might as well factor them
>>> out. And types are documentation.
>>>
>>> The Haskell implementation of Natural is just a newtype over Integer
>>> with added checks, and the same thing might make sense for Rust.
>>>
>>
>> I see. Good points.
>>
>>
>>>
>>> On Wed, Jun 18, 2014 at 11:21 AM, Brian Anderson <banderson at mozilla.com>
>>> wrote:
>>>
>>>>
>>>> On 06/18/2014 10:08 AM, Gábor Lehel wrote:
>>>>
>>>>>
>>>>> # Checked math
>>>>>
>>>>> For (2), the standard library offers checked math in the `CheckedAdd`,
>>>>> `CheckedMul` etc. traits, as well as integer types of unbounded size:
>>>>> `BigInt` and `BigUint`. This is good, but it's not enough. The acid test
>>>>> has to be whether for non-performance-critical code, people are actually
>>>>> *using* checked math. If they're not, then we've failed.
>>>>>
>>>>> `CheckedAdd` and co. are important to have for flexibility, but
>>>>> they're far too unwieldy for general use. People aren't going to write
>>>>> `.checked_add(2).unwrap()` when they can write `+ 2`. A more adequate
>>>>> design might be something like this:
>>>>>
>>>>>  * Have traits for all the arithmetic operations for both checking on
>>>>> overflow and for wrapping around on overflow, e.g. `CheckedAdd` (as now),
>>>>> `WrappingAdd`, `CheckedSub`, `WrappingSub`, and so on.
>>>>>
>>>>>  * Offer convenience methods for the Checked traits which perform
>>>>> `unwrap()` automatically.
>>>>>
>>>>>  * Have separate sets of integer types which check for overflow and
>>>>> which wrap around on overflow. Whatever they're called: `CheckedU8`,
>>>>> `checked::u8`, `uc8`, ...
>>>>>
>>>>>  * Both sets of types implement all of the Checked* and Wrapping*
>>>>> traits. You can use explicit calls to get either behavior with either types.
>>>>>
>>>>>  * The checked types use the Checked traits to implement the operator
>>>>> overloads (`Add`, Mul`, etc.), while the wrapping types use the Wrapping
>>>>> traits to implement them. In other words, the difference between the types
>>>>> is (probably only) in the behavior of the operators.
>>>>>
>>>>>  * `BigInt` also implements all of the Wrapping and Checked traits:
>>>>> because overflow never happens, it can claim to do anything if it "does
>>>>> happen". `BigUint` implements all of them except for the Wrapping traits
>>>>> which may underflow, such as `WrappingSub`, because it has nowhere to wrap
>>>>> around to.
>>>>>
>>>>> Another option would be to have just one set of types but two sets of
>>>>> operators, like Swift does. I think that would work as well, or even
>>>>> better, but we've been avoiding having any operators which aren't familiar
>>>>> from C.
>>>>>
>>>>
>>>> The general flavor of this proposal w/r/t checked arithmetic sounds
>>>> pretty reasonable to me, and we can probably make progress on this now. I
>>>> particularly think that having checked types that use operator overloading
>>>> is important for ergonomics.
>>>>  _______________________________________________
>>>> Rust-dev mailing list
>>>> Rust-dev at mozilla.org
>>>> https://mail.mozilla.org/listinfo/rust-dev
>>>>
>>>
>>>
>>>
>>> --
>>>    Jerry
>>>
>>>>
>>>
>>
>>
>> --
>>    Jerry
>>
>
>


-- 
   Jerry
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/rust-dev/attachments/20140620/5754af0c/attachment.html>


More information about the Rust-dev mailing list