Crash when recording clock_gettime

Benjamin King benjaminking at web.de
Tue Aug 29 07:25:05 UTC 2017


Hi Rob,

On Tue, Aug 29, 2017 at 09:36:21AM +1200, Robert O'Callahan wrote:
>On Tue, Aug 29, 2017 at 8:14 AM, Benjamin King <benjaminking at web.de> wrote:
>
>    clock_gettime from my libc disassembles to this:
>    ----
>    0000000000115810 <__clock_gettime@@GLIBC_PRIVATE>:
>     115810:        55                      push   %rbp
>     115811:        53                      push   %rbx
>     115812:        48 89 f5                mov    %rsi,%rbp
>     115815:        89 fb                   mov    %edi,%ebx
>     115817:        48 83 ec 08             sub    $0x8,%rsp
>     11581b:        48 8b 05 c6 3a 2b 00    mov    0x2b3ac6(%rip),%rax        #
>    3c92e8 <_dl_open_hook@@GLIBC_PRIVATE+0x8>
>     115822:        48 c1 c8 11             ror    $0x11,%rax
>     115826:        64 48 33 04 25 30 00    xor    %fs:0x30,%rax
>     11582d:        00 00  11582f:  48 85 c0                test   %rax,%rax
>     115832:        74 2c                   je     115860
>    <__clock_gettime@@GLIBC_PRIVATE+0x50>
>     115834:        ff d0                   callq  *%rax
>     ...
>    ----
>
>    Both commit e7b00db and the previous one are going through this, but the
>    call
>    to *%rax sends rr commit e7b00db to a bunch of instructions that are not in
>    the disassembly of my libc. I also fail to understand what they are
>    supposed
>    to do and they crash, eventually.
>
>
>Can you paste the disassembly of those instructions at *%rax in the failing
>case? Also, if you look up /proc/<pid>/maps for the process you should be able
>to see the mapping they belong to.


this is the disassembly where the call to *%rax is jumping into:

----
0x7ffe808ec8c0  push   %rbp                  <-- call *%rax comes to here
0x7ffe808ec8c1  cmp    $0x1,%edi
0x7ffe808ec8c4  mov    %rsp,%rbp
0x7ffe808ec8c7  push   %rbx
0x7ffe808ec8c8  je     0x7ffe808ec93b
0x7ffe808ec8ca  jle    0x7ffe808ec9cb        +-- branch taken
...                                          |
0x7ffe808ec9cb  test   %edi,%edi          <--+
0x7ffe808ec9cd  jne    0x7ffe808ec9be
0x7ffe808ec9cf  mov    -0x2956(%rip),%r9d        # 0x7ffe808ea080 <-- crash
...
----

During rr replay, I can not get information about the mappings from within gdb:
----
(rr) info proc mappings 
process 7084
warning: unable to open /proc file '/proc/7084/maps'
----

I don't have a process with PID 7094, but I have this:
----
bki       7091  0.0  0.1  46560 10556 pts/3    T    06:19   0:00 rr replay
bki       7092  0.0  0.0  13764  3168 ?        ts   06:19   0:00 [rr:a.out]
bki       7096  0.0  0.0   2376   168 ?        t    06:19   0:00 [rr:a.out]
----

A snippet of /proc/7092/maps:
----
...
7f5a487b4000-7f5a487b5000 r--p 00025000 08:03 2359365                    /lib/x86_64-linux-gnu/ld-2.23.so
7f5a487b5000-7f5a487b6000 rw-p 00026000 08:03 2359365                    /lib/x86_64-linux-gnu/ld-2.23.so
7f5a487b6000-7f5a487b7000 rw-p 00000000 00:00 0 
7ffe80806000-7ffe80828000 rw-p 00000000 00:00 0 
7ffe808ec000-7ffe808ee000 r-xp 00000000 00:00 0            <------ Code is from here
7ffe808ef000-7ffe808f0000 r-xp 00000000 00:00 0 
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
----

7096 has fewer mappings, but both have the unnamed 8k mapping from
7ffe808ec000-7ffe808ee000. I dumped that to a file and it contains the same
symbols as the [vdso] mapping usually has when running without rr:
----
Symbol table '.dynsym' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000000008c0   875 FUNC    WEAK   DEFAULT   12 clock_gettime@@LINUX_2.6
     2: 0000000000000c30   470 FUNC    GLOBAL DEFAULT   12 __vdso_gettimeofday@@LINUX_2.6
     3: 0000000000000c30   470 FUNC    WEAK   DEFAULT   12 gettimeofday@@LINUX_2.6
     4: 0000000000000e10    21 FUNC    GLOBAL DEFAULT   12 __vdso_time@@LINUX_2.6
     5: 0000000000000e10    21 FUNC    WEAK   DEFAULT   12 time@@LINUX_2.6
     6: 00000000000008c0   875 FUNC    GLOBAL DEFAULT   12 __vdso_clock_gettime@@LINUX_2.6
     7: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
     8: 0000000000000e30    41 FUNC    GLOBAL DEFAULT   12 __vdso_getcpu@@LINUX_2.6
     9: 0000000000000e30    41 FUNC    WEAK   DEFAULT   12 getcpu@@LINUX_2.6
----

Disassembling this will show the same code from above:
----
00000000000008c0 <__vdso_clock_gettime@@LINUX_2.6>:
 8c0:	55                   	push   %rbp
 8c1:	83 ff 01             	cmp    $0x1,%edi
 8c4:	48 89 e5             	mov    %rsp,%rbp
 8c7:	53                   	push   %rbx
 8c8:	74 71                	je     93b <__vdso_clock_gettime@@LINUX_2.6+0x7b>
 8ca:	0f 8e fb 00 00 00    	jle    9cb <__vdso_clock_gettime@@LINUX_2.6+0x10b>
...
 9cb:	85 ff                	test   %edi,%edi
 9cd:	75 ef                	jne    9be <__vdso_clock_gettime@@LINUX_2.6+0xfe>
 9cf:	44 8b 0d aa d6 ff ff 	mov    -0x2956(%rip),%r9d        # ffffffffffffe080 <__vdso_getcpu@@LINUX_2.6+0xffffffffffffd250>
----
The crashy mov in the end tries to read from memory which does not have a
mapping.

When I run the program in gdb alone, I have the same "mov -0x2956(%rip)", but
in that case, the address is backed by an 8k mapping called "[vvar]" whose contents
I can read with neither dd nor gdb's dump command (?).

The [vvar] mapping is directly in front of the [vdso]:
----
7ffff7ff8000-7ffff7ffa000 r--p 00000000 00:00 0                          [vvar]
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0                          [vdso]
----

I can not see such a mapping directly in front of the vdso when rr replay is
runnning.

Disabling ASLR changes this. rr record does not crash and during replay, I can
see an 8k mapping in front of the mapping that I identified as the vdso. I
can also dump that mapping to a file and the 'file' command tells me that it
is 'data'.

Reenabling ASLR again but running with the the commit before e7b00db changes
the picture. rr record also does not crash and during replay, I can see that
the vdso mapping has an 8k non-executable mapping directly in front of it.

Cheers,
  Benjamin


More information about the rr-dev mailing list