Problems in Subprocesses of Tracees

Jun Inoue jun.lambda at gmail.com
Thu Dec 7 03:11:38 UTC 2017


I've done some more digging on this.  As it turns out, the problem
reproduces with this dead simple tracee:

#include <stdlib.h>
int
main()
{
  system("/sbin/ifconfig");
  return 0;
}

However, if I change the system() call to

  if (fork() == 0)
    {
      char ifconfig[] = "/sbin/ifconfig";
      char *args[] = { ifconfig, NULL };
      execv(ifconfig, args);
    }

then the problem no longer reproduces.  I've tried this on kernels
4.4.53-rt66 (home-compiled on Ubuntu 14.0.4.5 LTS) and
4.10.0-19-generic (official package from Ubuntu 17.04), both running
on amd64, but it made no difference.  The same applies to the
following additional observations.

I've tried a few other programs in /sbin in place of ifconfig, but
they all either did not trigger the assertion failure or issued ioctls
that rr doesn't recognize.
When I leave out the fork() and do just an execv(), rr replay gets
locked up at a fixed address 0x0000000070000002 (which is the same
address under both kernels over multiple recordings):

_RR_TRACE_DIR=tmp rr record test/exec_ifconfig; _RR_TRACE_DIR=tmp rr replay
rr: Saving execution to trace directory
`/dvl/openhrp/wk/again/build/tmp/exec_ifconfig-0'.
enxb88d1255ba5c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 150.29.145.120  netmask 255.255.254.0  broadcast 150.29.145.255
        inet6 fe80::ba8d:12ff:fe55:ba5c  prefixlen 64  scopeid 0x20<link>
        ether b8:8d:12:55:ba:5c  txqueuelen 1000  (Ethernet)
        RX packets 116698  bytes 36954127 (36.9 MB)
        RX errors 0  dropped 2  overruns 0  frame 0
        TX packets 30214  bytes 5843448 (5.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 56594  bytes 7620985 (7.6 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 56594  bytes 7620985 (7.6 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

GNU gdb (Ubuntu 7.12.50.20170314-0ubuntu1) 7.12.50.20170314-git
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from
/dvl/openhrp/wk/again/build/tmp/exec_ifconfig-0/mmap_hardlink_3_exec_ifconfig...done.
Really redefine built-in command "restart"? (y or n) [answered Y;
input not from terminal]
Remote debugging using 127.0.0.1:26020
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols
from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.24.so...done.
done.
0x00007ff12bb62c20 in _start () from /lib64/ld-linux-x86-64.so.2
(rr) c
Continuing.

Program stopped.
0x0000000070000002 in ?? ()
(rr) c
Continuing.

Program stopped.
0x0000000070000002 in ?? ()
(rr) c
Continuing.

Program stopped.
0x0000000070000002 in ?? ()
(rr) bt
#0  0x0000000070000002 in ?? ()
#1  0x00007ff12b939202 in _raw_syscall () at
/dvl/openhrp/wk/rr/src/preload/raw_syscall.S:120
#2  0x00007ff12b935bce in traced_raw_syscall
(call=call at entry=0x681fffa0) at
/dvl/openhrp/wk/rr/src/preload/syscallbuf.c:203
#3  0x00007ff12b936d5f in syscall_hook_internal (call=0x681fffa0) at
/dvl/openhrp/wk/rr/src/preload/syscallbuf.c:2542
#4  syscall_hook (call=0x681fffa0) at
/dvl/openhrp/wk/rr/src/preload/syscallbuf.c:2576
#5  0x00007ff12b935aaa in _syscall_hook_trampoline () at
/dvl/openhrp/wk/rr/src/preload/syscall_hook.S:282
#6  0x00007ff12b935ada in __morestack () at
/dvl/openhrp/wk/rr/src/preload/syscall_hook.S:417
#7  0x00007ff12b935ae0 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff
() at /dvl/openhrp/wk/rr/src/preload/syscall_hook.S:423
#8  0x00007ff12b63b76d in execve () at ../sysdeps/unix/syscall-template.S:84
#9  0x000055d0442b7929 in main () at
/dvl/openhrp/wk/again/test/popen_ifconfig.c:34
(rr) disas $rip,+10
Dump of assembler code from 0x70000002 to 0x7000000c:
=> 0x0000000070000002:    retq
   0x0000000070000003:    syscall
   0x0000000070000005:    retq
   0x0000000070000006:    syscall
   0x0000000070000008:    retq
   0x0000000070000009:    syscall
   0x000000007000000b:    retq
End of assembler dump.
(rr) p *(void**)$rsp
$1 = (void *) 0x7ff12b939202 <_raw_syscall+51>
(rr) si

Program stopped.
0x0000000070000002 in ?? ()
(rr) c
Continuing.

Program stopped.
0x0000000070000002 in ?? ()



On Wed, Dec 6, 2017 at 2:23 PM, Jun Inoue <jun.lambda at gmail.com> wrote:
> On Wed, Dec 6, 2017 at 2:04 PM, Robert O'Callahan <robert at ocallahan.org> wrote:
>> On Wed, Dec 6, 2017 at 5:45 PM, Jun Inoue <jun.lambda at gmail.com> wrote:
>>>
>>> Hi there!  I'm having trouble with rr barfing on a subprocess of a
>>> tracee, which is called rtcd.
>>> Recording seems to go fine, but during replay I get "Assertion
>>> `ticks_now == trace_ticks' failed to hold" (more details at the end of
>>> this email).  The snag is that rr then suggests me to attach gdb to
>>> /sbin/ifconfig, rather than rtcd - it's a subprocess of rtcd, and not
>>> rtcd itself that rr is barfing on.
>>>
>>> Now, /sbin/ifconfig seems to be one of those programs that use shared
>>> memories to communicate with the outside world, so I want to prevent
>>> rtcd from monitoring this subprocess at all.  AFAIK all communications
>>> between rtcd and ifconfig happen through pipes, so this shouldn't
>>> affect rr's ability to replay.  Is there a way to do this?
>>
>>
>> No, we don't support detaching a tracee subprocess from recording, although
>> I suppose it wouldn't be that hard in principle.
>>
>> I don't know of anything in /sbin/ifconfig that should cause problems. It's
>> more likely it's a syscall-related issue than a shmem issue. Can you `rr
>> dump -b -m -p` the trace leading up to event 1816?
>>
>> Rob
>> --
>> lbir ye,ea yer.tnietoehr  rdn rdsme,anea lurpr  edna e hnysnenh hhe uresyf
>> toD
>> selthor  stor  edna  siewaoeodm  or v sstvr  esBa  kbvted,t
>> rdsme,aoreseoouoto
>> o l euetiuruewFa  kbn e hnystoivateweh uresyf tulsa rehr  rdm  or rnea lurpr
>> .a war hsrer holsa rodvted,t  nenh hneireseoouot.tniesiewaoeivatewt sstvr
>> esn
>
> I see.  Here's an excerpt of the dump from event 1800-1816.  A more
> complete dump from 0-1816 is available at
> https://drive.google.com/open?id=1k3Lww_oCPOY1FPfmOJDDhsrlIJ05QvbD
>
> {
>   real_time:27296.016853 global_time:1800, event:`SYSCALL: ioctl'
> (state:EXITING_SYSCALL) tid:24642, ticks:92790
> rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x7ffe7e5bf5b0
> rsi:0x8919 rdi:0x4 rbp:0x10 rsp:0x681ffe00 r8:0xfefefefefefefeff
> r9:0xfefefefefeff6e6b r10:0x7ffe7e5bf5b0 r11:0x246 r12:0x4 r13:0x2
> r14:0x0 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0
> es:0x0 fs:0x0 gs:0x0 orig_rax:0x10 fs_base:0x7fb2e15f0740 gs_base:0x0
>   { tid:24642, addr:0x7ffe7e5bf5b0, length:0x28 }
>   { tid:24642, addr:0x681fee00, length:0x1000 }
> }
> {
>   real_time:27296.016891 global_time:1801, event:`SYSCALL: ioctl'
> (state:ENTERING_SYSCALL) tid:24642, ticks:92811
> rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff
> rdx:0x7ffe7e5bf5b0 rsi:0x891b rdi:0x4 rbp:0x10 rsp:0x681ffe00
> r8:0xfefefefefefefeff r9:0xfefefefefeff6e6b r10:0x7ffe7e5bf5b0
> r11:0x246 r12:0x4 r13:0x2 r14:0x0 r15:0x0 rip:0x70000002 eflags:0x246
> cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x10
> fs_base:0x7fb2e15f0740 gs_base:0x0
> }
> {
>   real_time:27296.016914 global_time:1802, event:`SYSCALL: ioctl'
> (state:EXITING_SYSCALL) tid:24642, ticks:92811
> rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x7ffe7e5bf5b0
> rsi:0x891b rdi:0x4 rbp:0x10 rsp:0x681ffe00 r8:0xfefefefefefefeff
> r9:0xfefefefefeff6e6b r10:0x7ffe7e5bf5b0 r11:0x246 r12:0x4 r13:0x2
> r14:0x0 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0
> es:0x0 fs:0x0 gs:0x0 orig_rax:0x10 fs_base:0x7fb2e15f0740 gs_base:0x0
>   { tid:24642, addr:0x7ffe7e5bf5b0, length:0x28 }
>   { tid:24642, addr:0x681fee00, length:0x1000 }
> }
> {
>   real_time:27296.016975 global_time:1803, event:`SYSCALLBUF_FLUSH'
> tid:24642, ticks:104070
> }
> {
>   real_time:27296.016979 global_time:1804, event:`SYSCALL: mmap'
> (state:ENTERING_SYSCALL) tid:24642, ticks:104070
> rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x3
> rsi:0x1000 rdi:0x0 rbp:0x9 rsp:0x681ffe00 r8:0xffffffff r9:0x0
> r10:0x22 r11:0x246 r12:0x10b8200 r13:0x2 r14:0x0 r15:0x7ffe7e5bf3c8
> rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0
> gs:0x0 orig_rax:0x9 fs_base:0x7fb2e15f0740 gs_base:0x0
> }
> {
>   real_time:27296.016980 global_time:1805, event:`SYSCALLBUF_RESET'
> tid:24642, ticks:104070
> }
> {
>   real_time:27296.017028 global_time:1806, event:`SYSCALL: mmap'
> (state:EXITING_SYSCALL) tid:24642, ticks:104070
> rax:0x7fb2e1613000 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x3
> rsi:0x1000 rdi:0x0 rbp:0x9 rsp:0x681ffe00 r8:0xffffffff r9:0x0
> r10:0x22 r11:0x246 r12:0x10b8200 r13:0x2 r14:0x0 r15:0x7ffe7e5bf3c8
> rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0
> gs:0x0 orig_rax:0x9 fs_base:0x7fb2e15f0740 gs_base:0x0
>   { map_file:"<ZERO>", addr:0x7fb2e1613000, length:0x1000,
> prot_flags:"rw-p", file_offset:0x0, device:0, inode:0, data_file:"",
> data_offset:0x0, file_size:0x1000 }
> }
> {
>   real_time:27296.017111 global_time:1807, event:`SYSCALLBUF_FLUSH'
> tid:24642, ticks:106573
> }
> {
>   real_time:27296.017115 global_time:1808, event:`SYSCALL: mmap'
> (state:ENTERING_SYSCALL) tid:24642, ticks:106573
> rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x3
> rsi:0x1000 rdi:0x0 rbp:0x9 rsp:0x681ffe00 r8:0xffffffff r9:0x0
> r10:0x22 r11:0x246 r12:0x0 r13:0x2 r14:0x40cb53 r15:0x10b9bf0
> rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0
> gs:0x0 orig_rax:0x9 fs_base:0x7fb2e15f0740 gs_base:0x0
> }
> {
>   real_time:27296.017117 global_time:1809, event:`SYSCALLBUF_RESET'
> tid:24642, ticks:106573
> }
> {
>   real_time:27296.017141 global_time:1810, event:`SYSCALL: mmap'
> (state:EXITING_SYSCALL) tid:24642, ticks:106573
> rax:0x7fb2e1612000 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x3
> rsi:0x1000 rdi:0x0 rbp:0x9 rsp:0x681ffe00 r8:0xffffffff r9:0x0
> r10:0x22 r11:0x246 r12:0x0 r13:0x2 r14:0x40cb53 r15:0x10b9bf0
> rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0
> gs:0x0 orig_rax:0x9 fs_base:0x7fb2e15f0740 gs_base:0x0
>   { map_file:"<ZERO>", addr:0x7fb2e1612000, length:0x1000,
> prot_flags:"rw-p", file_offset:0x0, device:0, inode:0, data_file:"",
> data_offset:0x0, file_size:0x1000 }
> }
> {
>   real_time:27296.017189 global_time:1811, event:`SYSCALLBUF_FLUSH'
> tid:24642, ticks:106709
> }
> {
>   real_time:27296.017192 global_time:1812, event:`SYSCALL: munmap'
> (state:ENTERING_SYSCALL) tid:24642, ticks:106709
> rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0
> rsi:0x1000 rdi:0x7fb2e1612000 rbp:0xb rsp:0x681ffe00 r8:0x10b9cd0
> r9:0x7fb2e15f0740 r10:0x22 r11:0x246 r12:0x0 r13:0x7ffe7e5bf563
> r14:0x10b81f0 r15:0x7ffe7e5bf570 rip:0x70000002 eflags:0x246 cs:0x33
> ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xb
> fs_base:0x7fb2e15f0740 gs_base:0x0
> }
> {
>   real_time:27296.017194 global_time:1813, event:`SYSCALLBUF_RESET'
> tid:24642, ticks:106709
> }
> {
>   real_time:27296.017215 global_time:1814, event:`SYSCALL: munmap'
> (state:EXITING_SYSCALL) tid:24642, ticks:106709
> rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x1000
> rdi:0x7fb2e1612000 rbp:0xb rsp:0x681ffe00 r8:0x10b9cd0
> r9:0x7fb2e15f0740 r10:0x22 r11:0x246 r12:0x0 r13:0x7ffe7e5bf563
> r14:0x10b81f0 r15:0x7ffe7e5bf570 rip:0x70000002 eflags:0x246 cs:0x33
> ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xb
> fs_base:0x7fb2e15f0740 gs_base:0x0
> }
> {
>   real_time:27296.017288 global_time:1815, event:`SYSCALLBUF_FLUSH'
> tid:24642, ticks:113276
> }
> {
>   real_time:27296.017325 global_time:1816, event:`PATCH_SYSCALL'
> tid:24642, ticks:113276
> rax:0x1 rbx:0x17c rcx:0xffffffffffffffff rdx:0x17c rsi:0x7fb2e1613000
> rdi:0x1 rbp:0x7fb2e1613000 rsp:0x7ffe7e5bf508 r8:0x0 r9:0x7fb2e11c3e88
> r10:0x7fb2e0bf5270 r11:0x246 r12:0x7fb2e11c1400 r13:0x17c r14:0x6
> r15:0x0 rip:0x7fb2e0eed38e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0
> fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7fb2e15f0740
> gs_base:0x0
>   { tid:24642, addr:0x7fb2e1418365, length:0x4f }
>   { tid:24642, addr:0x7fb2e0eed38e, length:0x5 }
>   { tid:24642, addr:0x7fb2e0eed393, length:0x3 }
> }
>
>
> --
> Jun Inoue



-- 
Jun Inoue


More information about the rr-dev mailing list