Emulating execve, and fixing MappableResource and friends

Robert O'Callahan robert at ocallahan.org
Fri Jul 24 00:12:23 UTC 2015

Currently when we replay an execve we run the real exec with the original
executable path name, argv and envp. This has a few problems:
1) The original executable must exist at the original filesystem location.
Thus traces are not easily transported between systems, and changes to the
executable (e.g. by rebuilding it) will always invalidate the trace.
2) We assume the address space set up by execve() is deterministic (apart
from a few bits of data that we know aren't, like ELF auxv AT_RANDOM). To
ensure this we have to disable ASLR, which reduces the entropy of tests
running under rr. Relying on the kernel to reproduce the layout probably
inhibits transporting traces between systems with different kernel/OS
versions too.
3) It would be nice to get rid of executed-syscalls in general and make
everything emulated, since that would make replay more regular. In
particular we would guarantee that between ReplaySession::replay_steps, all
tracee tasks are stopped in the kernel just before they return to
userspace. Currently, between the entry and exit events for an executed
syscall, the tracee is at the "about to dispatch syscall" point in the
kernel and taking a checkpoint there doesn't work.

I have code to fix this by emulating execve. Basically we ignore the
original execve, and instead do our own execve of a dedicated stub. (We
still need to do a real execve because some execs need to change the
architecture of the process and execve() is the only way to do that.) After
the stub exec we unmap (almost) everything in the address space and
recreate the mappings created by the recorded exec.

Needless to say this was tricky and uncovered a number of interesting
kernel quirks. But perhaps the biggest problem is that it uncovered more
issues with our /proc/<pid>/maps caching in AddressSpace.cc. I realized that
part of the problem is that our cached mappings serve two very different
purposes: tracking what's actually mapped so we don't need to re-read
/proc/<pid>/maps, and tracking what those mappings are being used for by the
tracee. For example, during recording there's a mapping named [vdso] which
is PSEUDODEVICE_VDSO, but during replay, with my patches, the VDSO is
actually an anonymous mapping which we've filled with a copy of the
recorded VDSO. Likewise during recording a mapped file will have its real
file name but during replay that mapping will be an anonymous map with
trace data copied into it, or a mapping of a file from the trace directory.
Currently we're trying to cram both sets of information into a single
MappableResource and it's not working.
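
To make the "what's actually mapped" half concrete, here is a minimal sketch of pulling the kernel-reported fields out of one line of /proc/<pid>/maps. This is not rr's actual parser; MapsLine and parse_maps_line are hypothetical names used only for illustration, and the line format assumed is the usual "start-end perms offset dev inode pathname" layout.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical holder for the fields the kernel reports for one mapping.
struct MapsLine {
  uint64_t start = 0;
  uint64_t end = 0;
  char perms[5] = {0};  // e.g. "r-xp"
  uint64_t offset = 0;
  unsigned dev_major = 0;
  unsigned dev_minor = 0;
  uint64_t inode = 0;
  std::string name;  // file path, "[vdso]", "[stack]", or empty
};

// Parses a single maps line. Returns false if the fixed fields are malformed.
bool parse_maps_line(const std::string& line, MapsLine* out) {
  int consumed = -1;
  int n = sscanf(line.c_str(),
                 "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %" SCNu64 " %n",
                 &out->start, &out->end, out->perms, &out->offset,
                 &out->dev_major, &out->dev_minor, &out->inode, &consumed);
  if (n != 7) {
    return false;
  }
  // Whatever follows the inode (after whitespace) is the mapping name.
  if (consumed >= 0 && static_cast<size_t>(consumed) < line.size()) {
    out->name = line.substr(consumed);
  } else {
    out->name.clear();
  }
  // Drop a trailing newline if the caller passed the raw line.
  while (!out->name.empty() && out->name.back() == '\n') {
    out->name.pop_back();
  }
  return true;
}
```

Note that nothing here says what the tracee is using the mapping for; a replayed VDSO would show up with an empty name, which is why that information has to be tracked separately.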

We also have some duplication of abstractions: there's MemoryRange,
Mapping, MappableResource and FileId just in AddressSpace.h, and elsewhere
we have TraceMappedRegion and TraceReader::MappedData.

I'm thinking of reorganizing this as follows:
-- Switch MemoryRange to use start/end instead of start/length.
-- Create a KernelMapping, inheriting from MemoryRange, which is exactly the
data we obtain from the kernel about a mapping: the data available in
/proc/<pid>/maps, plus the fstat of the mapped file if there is one.
-- Merge MappableResource into Mapping and have Mapping contain two
KernelMappings, one with the data from recording and the other with the
data from replay (not present during recording, of course). Mapping also
includes other rr annotations, such as the inferred pseudodevice and other
information gathered by monitoring system calls rather than from
/proc/<pid>/maps.
-- AddressSpace validation then only needs to check that the record (or
replay) KernelMapping matches what's in /proc/<pid>/maps, and validate the
internal consistency of a Mapping.
-- Eliminate TraceMappedRegion and just serialize AddressSpace::Mapping
to/from the trace.
-- Keep TraceReader::MappedData in its current role --- returning
instructions to the caller of TraceReader::read_mapped_region for how to
obtain the data for a mapping. But we can simplify it to remove the
prot/flags fields, since AddressSpace::Mapping will have them.
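
The shape of the proposed types can be sketched roughly as below. This is only an illustration of the plan above, not the eventual rr code: the field names, the Pseudodevice enum, has_replay_map, and mapping_is_valid are all assumptions, and "internal consistency" is taken here to mean that the replay mapping covers the same range with the same protection as the recorded one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Address range stored as start/end (half-open) rather than start/length.
struct MemoryRange {
  uintptr_t start = 0;
  uintptr_t end = 0;
  MemoryRange() = default;
  MemoryRange(uintptr_t s, uintptr_t e) : start(s), end(e) { assert(s <= e); }
  std::size_t size() const { return end - start; }
  bool same_range(const MemoryRange& o) const {
    return start == o.start && end == o.end;
  }
};

// Exactly what the kernel reports: the maps fields plus fstat identity.
struct KernelMapping : MemoryRange {
  std::string fsname;
  uint64_t device = 0;
  uint64_t inode = 0;
  int prot = 0;
  int flags = 0;
  KernelMapping() = default;
  KernelMapping(uintptr_t s, uintptr_t e, const std::string& name,
                uint64_t dev, uint64_t ino, int prot_, int flags_)
      : MemoryRange(s, e), fsname(name), device(dev), inode(ino),
        prot(prot_), flags(flags_) {}
  bool operator==(const KernelMapping& o) const {
    return same_range(o) && fsname == o.fsname && device == o.device &&
           inode == o.inode && prot == o.prot && flags == o.flags;
  }
};

// Hypothetical rr-side annotation inferred from monitoring syscalls.
enum class Pseudodevice { None, Vdso, Stack, Heap };

// One Mapping carries the recorded kernel view and, during replay only,
// the replay kernel view, plus rr's own annotations.
struct Mapping {
  KernelMapping recorded_map;
  bool has_replay_map = false;  // false during recording
  KernelMapping replay_map;
  Pseudodevice pseudodevice = Pseudodevice::None;
  const KernelMapping& current_map() const {
    return has_replay_map ? replay_map : recorded_map;
  }
};

// Validation: the current kernel view must match what was read from the
// kernel, and the two views must agree on range and protection.
bool mapping_is_valid(const Mapping& m, const KernelMapping& from_proc) {
  if (m.has_replay_map && !(m.replay_map.same_range(m.recorded_map) &&
                            m.replay_map.prot == m.recorded_map.prot)) {
    return false;
  }
  return m.current_map() == from_proc;
}
```

With this split, a replayed VDSO can keep recorded_map.fsname as "[vdso]" and pseudodevice as Vdso while replay_map describes the anonymous mapping the kernel actually sees.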
