[rust-dev] Appeal for CORRECT, capable, future-proof math, pre-1.0

Patrick Walton pwalton at mozilla.com
Sat Jan 11 15:10:05 PST 2014


I think failure may have quite different inlining costs once we move to libunwind-based backtraces instead of hardcoding file/line-number information into the generated code. The file and line-number information tends to pollute the generated code a lot, and it's basically unnecessary with proper DWARF info and a functioning set of libunwind bindings, which we now have thanks to a couple of awesome contributions from you all. :)

Patrick
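[Editor's note: an illustrative sketch, not the 2014 rustc internals. The point above is that a failure entry point taking file/line arguments forces every call site to materialize a string pointer and an integer, while a bare entry point — with the location recovered later from DWARF by an unwinder — needs no operands at all. All names below are hypothetical.]

```rust
// Hypothetical location-carrying failure path: each caller must embed
// a &'static str and a u32 in its generated code.
#[cold]
#[inline(never)]
fn fail_with_location(file: &'static str, line: u32) -> ! {
    panic!("failure at {file}:{line}")
}

// Hypothetical bare failure path: no operands at the call site; the
// location would be recovered from debug info via libunwind instead.
#[cold]
#[inline(never)]
fn fail_bare() -> ! {
    panic!("failure")
}
```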

Owen Shepherd <owen.shepherd at e43.eu> wrote:
>On 11 January 2014 21:42, Daniel Micay <danielmicay at gmail.com> wrote:
>
>> On Sat, Jan 11, 2014 at 4:31 PM, Owen Shepherd <owen.shepherd at e43.eu>
>> wrote:
>> > So I just did a test. Took the following rust code:
>> > pub fn test_wrap(x : u32, y : u32) -> u32 {
>> >     return x.checked_mul(&y).unwrap().checked_add(&16).unwrap();
>> > }
>> >
>> > And got the following blob of assembly out. What we have there, my
>> friends,
>> > is a complete failure of the optimizer (N.B. it works for the
>simple
>> case of
>> > checked_add alone)
>> >
>> > Preamble:
>> >
>> > __ZN9test_wrap19hc4c136f599917215af4v0.0E:
>> >     .cfi_startproc
>> >     cmpl    %fs:20, %esp
>> >     ja    LBB0_2
>> >     pushl    $12
>> >     pushl    $20
>> >     calll    ___morestack
>> >     ret
>> > LBB0_2:
>> >     pushl    %ebp
>> > Ltmp2:
>> >     .cfi_def_cfa_offset 8
>> > Ltmp3:
>> >     .cfi_offset %ebp, -8
>> >     movl    %esp, %ebp
>> > Ltmp4:
>> >     .cfi_def_cfa_register %ebp
>> >
>> > Align stack (for what? We don't do any SSE)
>> >
>> >     andl    $-8, %esp
>> >     subl    $16, %esp
>>
>> The compiler aligns the stack for performance.
>>
>>
>
>Oops, I misread and thought there was 16-byte alignment going on
>there, not 8.
>
>
>> > Multiply x * y
>> >
>> >     movl    12(%ebp), %eax
>> >     mull    16(%ebp)
>> >     jno    LBB0_4
>> >
>> > If it didn't overflow, stash a 0 at top of stack
>> >
>> >     movb    $0, (%esp)
>> >     jmp    LBB0_5
>> >
>> > If it did overflow, stash a 1 at top of stack (we are building an
>> > Option<u32> here)
>> > LBB0_4:
>> >     movb    $1, (%esp)
>> >     movl    %eax, 4(%esp)
>> >
>> > Take pointer to &this for __thiscall:
>> > LBB0_5:
>> >     leal    (%esp), %ecx
>> >     calll    __ZN6option6Option6unwrap21h05c5cb6c47a61795Zcat4v0.0E
>> >
>> > Do the addition to the result
>> >
>> >     addl    $16, %eax
>> >
>> > Repeat the previous circus
>> >
>> >     jae    LBB0_7
>> >     movb    $0, 8(%esp)
>> >     jmp    LBB0_8
>> > LBB0_7:
>> >     movb    $1, 8(%esp)
>> >     movl    %eax, 12(%esp)
>> > LBB0_8:
>> >     leal    8(%esp), %ecx
>> >     calll    __ZN6option6Option6unwrap21h05c5cb6c47a61795Zcat4v0.0E
>> >     movl    %ebp, %esp
>> >     popl    %ebp
>> >     ret
>> >     .cfi_endproc
>> >
>> >
>> > Yeah. It's not fast because it's not inlining through option::unwrap.
>>
>> The code to initiate failure is gigantic and LLVM doesn't do partial
>> inlining by default. It's likely far above the inlining threshold.
>>
>>
>Right, which is why I suggested explicitly moving the failure code out
>of line into a separate function.
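[Editor's note: a minimal sketch of the out-of-line suggestion in modern Rust syntax; names are illustrative. Keeping the failure path in a separate, never-inlined function leaves the hot path of an unwrap small enough for LLVM to inline.]

```rust
// Cold, out-of-line failure path: never inlined into callers.
#[cold]
#[inline(never)]
fn unwrap_failed() -> ! {
    panic!("called `unwrap` on a `None` value")
}

// Hot path stays tiny: a discriminant test plus a cold call.
#[inline]
fn small_unwrap<T>(opt: Option<T>) -> T {
    match opt {
        Some(v) => v,
        None => unwrap_failed(),
    }
}
```

This is, roughly, the shape later versions of the standard library adopted for `Option::unwrap`.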
>
>
>> A purely synthetic benchmark only executing the unchecked or checked
>> instruction isn't interesting. You need to include several operations
>> in the loop, as real code would, and you will often see a massive
>> drop in performance from the serialization of the pipeline. Register
>> renaming is not as clever as you'd expect.
>>
>>
>Agreed. The variability within that tiny benchmark tells me that we
>can't really glean any valuable information from it.
>
>
>> The impact of trapping is known, because `clang` and `gcc` expose
>> `-ftrapv`. Integer-heavy workloads like cryptography and video codecs
>> are several times slower with the checks.
>>
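[Editor's note: a Rust sketch of the kind of per-operation checking those `-ftrapv` numbers are about. The checked loop must branch after every addition, while the wrapping loop is a plain dependency chain; function names here are illustrative.]

```rust
// Checked sum: returns None on the first overflowing addition,
// which costs a conditional branch per element.
fn sum_checked(xs: &[u32]) -> Option<u32> {
    xs.iter().try_fold(0u32, |acc, &x| acc.checked_add(x))
}

// Wrapping sum: no branches, overflow silently wraps modulo 2^32.
fn sum_wrapping(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}
```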
>
>What about other workloads?
>
>As I mentioned: what I'd propose is trapping by default, with
>non-trapping math a single additional character on a type declaration
>away.
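[Editor's note: for comparison — this postdates the thread — the design Rust eventually shipped is close in spirit: overflow panics in debug builds by default, and opting into non-trapping arithmetic is one `std::num::Wrapping` newtype away.]

```rust
use std::num::Wrapping;

// Opting out of overflow checks is a type wrapper, not a new
// primitive: arithmetic on Wrapping<u32> wraps modulo 2^32.
fn wrapping_demo() -> u32 {
    let x = Wrapping(u32::MAX);
    (x + Wrapping(1)).0 // wraps around to 0, no panic
}
```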
>
>Also, I did manage to convince Rust + LLVM to optimize things cleanly
>by defining an unwrap which invoked libc's abort() -> !, so there's
>that.
>
>
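[Editor's note: a sketch of the abort-based unwrap described above, in modern Rust syntax (the original thread predates 1.0); names are illustrative. Diverging via `abort()` gives LLVM a tiny, argument-free cold path to branch to, so the checks optimize cleanly.]

```rust
// Minimal diverging failure path: no panic machinery, no operands.
#[cold]
#[inline(never)]
fn abort_on_none() -> ! {
    std::process::abort()
}

#[inline]
fn unwrap_or_abort<T>(opt: Option<T>) -> T {
    match opt {
        Some(v) => v,
        None => abort_on_none(),
    }
}

// Modern-syntax equivalent of the test_wrap example from the thread.
pub fn test_wrap(x: u32, y: u32) -> u32 {
    unwrap_or_abort(unwrap_or_abort(x.checked_mul(y)).checked_add(16))
}
```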
>------------------------------------------------------------------------
>
>_______________________________________________
>Rust-dev mailing list
>Rust-dev at mozilla.org
>https://mail.mozilla.org/listinfo/rust-dev

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

