FF replay problem involving madvise

Robert O'Callahan robert at ocallahan.org
Thu Feb 21 14:22:34 PST 2013

On Wed, Feb 20, 2013 at 9:57 PM, Thomas Anderegg <thomas at tanderegg.com> wrote:

> The scratch memory is used for correctness with blocking system calls. For
> example, if a thread issues a read syscall, rr schedules another thread
> while the OS processes the read, because the operation is blocking.
> The other thread could then access the memory location to which the read
> system call will write its data, which could introduce data races that
> are not reproducible in the replay, hence the indirection through the
> scratch memory.

Right. User-space scratch buffers are needed to avoid potential races which
could occur with valid (non-crashy) programs. If the application reads our
scratch buffers (which it shouldn't unless it's behaving strangely, since
it didn't allocate them), we could have races between application code and
system calls that cause replay to fail. That's why it's nice to treat those
as bugs and catch them as crashes during replay. My goal has always been to
record and replay our applications, not some arbitrarily malicious binary,
so I'm totally comfortable with ruling out programs that read/write our
scratch buffers. I just want to detect them.
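To make the indirection concrete, here is a rough in-process sketch of the
idea. It is not rr's actual code (rr redirects the syscall from the tracer
side); read_via_scratch and its parameters are hypothetical names chosen for
illustration. The kernel only ever writes into the scratch area, and the
application's buffer is updated in one copy after the syscall has returned.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of rr: perform a blocking read into a
 * private scratch buffer, then copy the result into the caller's buffer
 * only after the syscall has returned. */
ssize_t read_via_scratch(int fd, void *dst, size_t len,
                         void *scratch, size_t scratch_len)
{
    assert(dst != NULL && scratch != NULL);

    if (len > scratch_len)
        len = scratch_len;               /* never overrun the scratch area */

    /* While this call blocks, the kernel writes only to scratch, so other
     * threads touching dst cannot race with the kernel's writes. */
    ssize_t n = read(fd, scratch, len);

    if (n > 0)
        memcpy(dst, scratch, (size_t)n); /* copy-out at a well-defined point */

    return n;
}
```

The copy-out happens at a point the recorder controls, so it can be repeated
at the same point during replay. A program that reads the scratch area
directly defeats this, which is the case I want to detect.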

Chris Jones wrote:

> I prefer to map the buffers during replay.  If the recorder can affect
> application behavior in these kinds of ways, we should replay those effects
> too.

We could easily map the buffers into the replay process. But it would be
extra work to actually populate those buffers with the same data they get
populated with during recording, and even then, for strange programs that
do stray reads/writes to the buffers, there could be races we fail to
replay. So I don't really want to populate them during replay.

Given the uncertainty, I think we should probably err on the side of
simplicity and do the simplest thing that makes Firefox work. I think that
is probably to mmap those scratch buffers in the replay process, mapped
PROT_NONE so that any access causes a fault while madvise and some other
syscalls still see the memory as mapped.
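A small sketch of that behaviour on Linux, under some assumptions: the helper
names are hypothetical, MADV_NORMAL stands in for whichever advice the
application actually passes, and the fault is observed in a forked child so
the calling process survives. This is not rr code, only an illustration of
why a PROT_NONE mapping satisfies madvise yet still traps stray accesses.

```c
#define _DEFAULT_SOURCE /* madvise() and MAP_ANONYMOUS on glibc */
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reserve an inaccessible placeholder where a scratch
 * buffer lived during recording. Returns MAP_FAILED on error. */
void *map_scratch_placeholder(size_t len)
{
    return mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Returns 0 if madvise accepts the range, otherwise the errno it set. */
int madvise_errno(void *addr, size_t len)
{
    return madvise(addr, len, MADV_NORMAL) == 0 ? 0 : errno;
}

/* Returns 1 if reading *addr kills a forked child with SIGSEGV, 0 if the
 * read succeeds, and -1 if fork or waitpid fails. */
int read_faults(const volatile char *addr)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char c = *addr;  /* expected to fault on a PROT_NONE page */
        (void)c;
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

With a placeholder p of length len, madvise_errno(p, len) should return 0
and read_faults(p) should return 1. After munmap(p, len), the same madvise
call is expected to fail with ENOMEM, which is the kind of divergence an
unmapped scratch region would cause during replay.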

Wrfhf pnyyrq gurz gbtrgure naq fnvq, “Lbh xabj gung gur ehyref bs gur
Tragvyrf ybeq vg bire gurz, naq gurve uvtu bssvpvnyf rkrepvfr nhgubevgl
bire gurz. Abg fb jvgu lbh. Vafgrnq, jubrire jnagf gb orpbzr terng nzbat
lbh zhfg or lbhe freinag, naq jubrire jnagf gb or svefg zhfg or lbhe fynir
— whfg nf gur Fba bs Zna qvq abg pbzr gb or freirq, ohg gb freir, naq gb
tvir uvf yvsr nf n enafbz sbe znal.” [Znggurj 20:25-28]