Handling unshare() and kernel namespaces

Robert O'Callahan robert at ocallahan.org
Thu Apr 16 11:40:46 UTC 2015

Firefox now uses sandboxing on Linux. Some sandboxing is provided via
seccomp() filters which cause various syscalls to error out; that already
works with rr. Most of the security is provided by entering new
user/filesystem/network namespaces and chroot()ing to a temporary empty
directory. That doesn't work in rr. I discussed this with Chris on IRC, and
we figured out how to make things work assuming we actually execute the
chroot. In that case the main difficulty is ensuring that the tracee can,
under rr's control via AutoRemoteSyscalls, open temporary files and sockets
created by rr. We can do that by giving the tracee a special fd which
refers to the tracer's root directory.
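
To illustrate the root-directory fd idea, here is a minimal sketch in
Python rather than rr's own code. The names open_via_root_fd and demo are
made up for this example. It opens "/" as a directory fd and then opens a
temporary file relative to that fd, which is what the tracee would do with
the special fd. The sketch does not actually call chroot(), since that
needs privileges; it relies on the fact that an fd opened before a chroot()
keeps referring to the original directory.

```python
import os
import tempfile


def open_via_root_fd(root_fd, abs_path, flags=os.O_RDONLY):
    """Open abs_path relative to root_fd instead of the process's current root."""
    # Strip the leading "/" so the kernel resolves the path relative to root_fd.
    rel = abs_path.lstrip("/") or "."
    return os.open(rel, flags, dir_fd=root_fd)


def demo():
    # Opened before any chroot(); stands in for the special fd given to the tracee.
    root_fd = os.open("/", os.O_RDONLY | os.O_DIRECTORY)
    # Stands in for a temporary file created by the tracer.
    tmp_fd, tmp_path = tempfile.mkstemp()
    tmp_path = os.path.abspath(tmp_path)
    try:
        os.write(tmp_fd, b"trace data")
        os.close(tmp_fd)
        fd = open_via_root_fd(root_fd, tmp_path)
        try:
            return os.read(fd, 64)
        finally:
            os.close(fd)
    finally:
        os.unlink(tmp_path)
        os.close(root_fd)
```

One caveat with this approach: inside a real chroot, absolute symlinks
encountered along the path would still be resolved against the new root, so
the paths handed to the tracee would need to be free of them.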

However, during replay the directory that was chroot()ed into doesn't
exist, because it's created and unlinked during the recording. So it seems
to me we can't execute the chroot() after all. That seems mostly OK since
syscalls using file paths are almost entirely emulated (i.e., ignored)
during replay.
(Chris, I said on IRC that emulating chroot would be really hard, but
fortunately I think I was totally wrong.) The only exception I'm aware of
is execve(), where the filename passed during recording is passed to the
kernel for execution during replay. This is already a problem when a tracee
execs a temporary file. However, execve() in a chroot() sandbox is unlikely
to be used, since the usual ld.so interpreter and standard libraries cannot
be loaded, so the executable has to be carefully crafted or system
libraries carefully pulled into the sandbox. In practice I think we can
just fail if execve() occurs after a chroot().
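
The "fail if execve() occurs after a chroot()" policy could be sketched
roughly as below. This is only a model of the bookkeeping in Python, not
rr code: ChrootTracker, ExecAfterChrootError and the on_* hooks are
hypothetical names, and the sketch assumes the tracer sees every chroot,
fork and execve. It also assumes a child inherits the parent's root
directory on fork, which is how the kernel behaves.

```python
class ExecAfterChrootError(RuntimeError):
    """Raised when a task that has chroot()ed tries to execve()."""


class ChrootTracker:
    def __init__(self):
        # Task ids whose chroot() we have seen (directly or via a parent).
        self._chrooted = set()

    def on_chroot(self, tid):
        # The chroot() itself is not executed; we only remember that it happened.
        self._chrooted.add(tid)

    def on_fork(self, parent_tid, child_tid):
        # A forked child inherits the parent's root directory.
        if parent_tid in self._chrooted:
            self._chrooted.add(child_tid)

    def on_execve(self, tid, filename):
        # Give up rather than pass a path the kernel cannot resolve.
        if tid in self._chrooted:
            raise ExecAfterChrootError(
                "execve(%r) after chroot() is not supported" % filename)
```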

It would be nice to treat the exec'ed file similarly to other mmapped
files, copying it to the trace or saving a hardlink to the trace directory
and exec'ing that. (Theoretically we should do the same for ld.so or
whatever other interpreter is specified for the binary.) It's a bit tricky
to do because the filename passed to the kernel affects the layout and
contents of the memory after exec, so we'd have to record and replay more
of that. I think we can put this off.
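
For the "hardlink or copy into the trace" part only, a minimal sketch might
look like the following. It is written in Python for brevity and is not
rr's implementation; save_exec_image and its parameters are hypothetical.
It tries a hardlink first and falls back to a copy when linking fails (for
example when the trace directory is on a different filesystem). It does
nothing about the memory-layout issue described above.

```python
import os
import shutil


def save_exec_image(exe_path, trace_dir, name):
    """Hardlink exe_path into trace_dir as name, copying if linking fails."""
    # name is assumed to be unique within trace_dir.
    dest = os.path.join(trace_dir, name)
    try:
        # Cheap, and the data survives even if the original path is unlinked.
        os.link(exe_path, dest)
    except OSError:
        # E.g. cross-device link or hardlink restrictions: copy the data instead.
        shutil.copy2(exe_path, dest)
    return dest
```

During replay the saved file, rather than the original path, would be the
one handed to execve().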
