improving rr's trace compatibility story

Robert O'Callahan robert at ocallahan.org
Thu Jul 27 22:21:32 UTC 2017


== Problem statement ==

Currently, every time we need to change the trace format we increment
TRACE_VERSION in TraceStream.cc. This means future versions of rr can't
replay traces created by previous versions and vice versa. It would be better
if future versions of rr could continue to replay traces created by
previous versions, at least for some range of previous versions. Over time
we might drop support for replaying really old traces. (I don't see any
need to replay new traces with old rr versions, which is good because that
seems impractical.)
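
To make the goal concrete, here is a minimal sketch of a version check that
accepts a range of older traces instead of requiring an exact match. The
constant values and all names are assumptions made for illustration; today rr
has only the single TRACE_VERSION constant.

```cpp
#include <cstdint>

// Hypothetical values, chosen only for this sketch.
constexpr uint32_t kMinReplayableVersion = 85;  // oldest trace we still replay
constexpr uint32_t kCurrentVersion = 90;        // what this rr records

enum class Compat { Ok, TooOld, TooNew };

// Decide whether a trace with the given version number can be replayed.
Compat check_trace_version(uint32_t trace_version) {
  if (trace_version > kCurrentVersion) {
    return Compat::TooNew;  // new traces on old rr: explicitly unsupported
  }
  if (trace_version < kMinReplayableVersion) {
    return Compat::TooOld;  // support for really old traces was dropped
  }
  return Compat::Ok;
}
```

Raising kMinReplayableVersion over time is how support for very old traces
would be dropped.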

This is related to https://github.com/mozilla/rr/issues/1916 but seems
orthogonal since this just relates to trace file storage. I still think
something like #1916 would be good but it's not a priority at this time.

I looked through the changes to TRACE_VERSION over the last 18 months. They
fall into a few categories:
1) Adding information to the trace that various tools might want and that's
otherwise difficult to get, such as the "own namespace TID" of each task,
but that isn't used by replay.
2) Adding information to the trace that needs to be used by replay in some
tricky cases, e.g. CPUID values, or a new EV_SECCOMP_TRAP event type, or a
new data blob recorded for certain syscalls.
3) Changes to the interaction with librrpreload, usually to handle some
tricky new cases.

== Possible solution ==

Cases #1 and #2 would generally be pretty easy to handle if the trace
format were extensible with new struct fields, enum values, etc., e.g.
using Protocol Buffers or Capnproto. We would want a way to check up front
which extra data values the trace supports, without reading all records;
we might need an (extensible) trace header record with flags for this.
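
One possible shape for such a header, sketched in plain C++ rather than a
serialization library. The flag names are hypothetical and simply mirror the
examples from the categories above; a trace recorded before a feature existed
would have that bit clear.

```cpp
#include <cstdint>

// Hypothetical feature bits; a newer rr would append further bits.
enum TraceFeature : uint64_t {
  FEATURE_OWN_NS_TID = 1ull << 0,
  FEATURE_CPUID_VALUES = 1ull << 1,
  FEATURE_SECCOMP_TRAP_EVENT = 1ull << 2,
};

// Hypothetical header record, read once before any other trace data.
struct TraceHeader {
  uint64_t feature_flags = 0;  // zero means "none of the optional data"
};

// True if the trace was recorded with the given optional data present.
bool trace_has(const TraceHeader& header, TraceFeature feature) {
  return (header.feature_flags & feature) != 0;
}
```

A tool that needs, say, the own-namespace TID could then test
trace_has(header, FEATURE_OWN_NS_TID) and report a clear error without
scanning any records.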

Case #3 is difficult to handle. It seems that we would have to add a
"librrpreload interface version" field, stored in the trace file, which is
incremented only when the interface needs to be changed. During replay
we'd need conditional code in the rr process to check the interface version
and adapt to it by skipping checks, adjusting field offsets, etc. Not
great, but there's not much else we can do given that librrpreload's memory
layout and execution can't depend on the host rr version.
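
The conditional code might look roughly like the sketch below. Everything in
it is hypothetical: the structure, the field names, the offsets and the two
version numbers are invented to show the pattern, not taken from librrpreload.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of what replay must know about one recorded
// librrpreload interface version.
struct PreloadInterface {
  size_t record_count_offset;  // where a (made-up) field lives in the buffer
  bool verify_locked_flag;     // whether replay should run a (made-up) check
};

// Map the interface version stored in the trace to the behaviour replay
// should use. Returns false for versions this rr does not know about.
bool describe_preload_interface(uint32_t version, PreloadInterface* out) {
  switch (version) {
    case 1:
      *out = {8, false};  // older layout: field at offset 8, check skipped
      return true;
    case 2:
      *out = {16, true};  // newer layout: field moved, check enabled
      return true;
    default:
      return false;
  }
}
```

Keeping all version-dependent facts in one table-like function would at least
confine the conditional logic to a single place.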

I think a plausible approach would be to use Capnproto to define the record
format in each of the structured trace files: https://capnproto.org.
Capnproto lets you define a type schema which can be updated in specific
ways --- adding fields to structs, enum values, etc --- so that data
serialized using older versions of the schema can be read by code using a
newer version of the schema. (It also supports the other direction, but we
don't need that.)
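
The property that matters here, old data readable by newer code, can be
illustrated with a hand-rolled sketch. This is not Capnproto's API or wire
format; it only shows the append-only idea: new fields go at the end, and a
reader fills in defaults for fields that an older writer did not emit. The
record and its fields are hypothetical, and the encoding assumes the writer
and reader share byte order and struct layout.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record. The first "schema version" had only tid; a later
// one appended own_ns_tid.
struct TaskRecord {
  int32_t tid = 0;          // present in every version
  int32_t own_ns_tid = -1;  // added later; -1 means "not recorded"
};

// Decode a record whose serialized form may be shorter than the current
// struct because it was written by an older rr.
TaskRecord decode_task_record(const std::vector<uint8_t>& bytes) {
  TaskRecord record;  // starts out holding the defaults
  if (!bytes.empty()) {
    // Copy only the bytes the writer produced; the rest keep defaults.
    std::memcpy(&record, bytes.data(), std::min(bytes.size(), sizeof(record)));
  }
  return record;
}
```

A library such as Capnproto would take care of this bookkeeping, plus
portability concerns, from a schema definition.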

Currently the trace data is split into multiple files ---
'data_header'/'data' (bulk updates to tracee memory), 'events', 'mmaps',
'tasks' and 'generic'. This still mostly makes sense: we want to be able to
quickly read all mmaps and all tasks without reading all events, and we
want to be able to read all events without reading all the raw-data.
However, I think it would be good to merge 'generic' into 'events'.
Logically, specific event types contain extra fields whose data is
currently stored in 'generic' blobs, and those fields should be subjected
to Capnproto's versioning. I think we can also merge 'data_header' into
'events', giving each event a list of 'data_header' records.
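
As a rough picture of the merged layout, an event could carry its own list of
data headers, so that a reader walking 'events' knows how much of 'data'
belongs to each event without opening it. The field names below are
hypothetical and not rr's actual record definitions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical: describes one block of tracee memory written by an event.
struct DataHeader {
  int32_t tid;      // task whose memory is updated
  uint64_t addr;    // address in the tracee
  uint64_t length;  // number of bytes stored in the 'data' file
};

// Hypothetical merged event record.
struct Event {
  uint64_t time;                         // position of the event in the trace
  std::vector<DataHeader> data_headers;  // formerly separate 'data_header'
};

// Total bytes of raw data this event owns in the 'data' file, e.g. so a
// reader can skip over them.
uint64_t raw_data_bytes(const Event& event) {
  uint64_t total = 0;
  for (const DataHeader& header : event.data_headers) {
    total += header.length;
  }
  return total;
}
```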

When making this change I would increment TRACE_VERSION one more time and
after the version-number line in 'version', add a 'global data' Capnproto
record.
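
Reading such a 'version' file could be as simple as the sketch below: parse
the first line as the version number and hand everything after it to the
record decoder. The function and struct names are made up, and the global
data is kept as opaque bytes since its schema is not defined here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical parsed form of the 'version' file.
struct VersionFile {
  uint32_t trace_version = 0;
  std::string global_data;  // raw bytes following the version line
};

// Split file contents into the version number and the trailing record.
// Returns false if the first line is missing or not a plain decimal number.
bool parse_version_file(const std::string& contents, VersionFile* out) {
  const size_t newline = contents.find('\n');
  if (newline == std::string::npos) {
    return false;
  }
  const std::string line = contents.substr(0, newline);
  // Accept 1 to 9 decimal digits so the value always fits in uint32_t.
  if (line.empty() || line.size() > 9 ||
      line.find_first_not_of("0123456789") != std::string::npos) {
    return false;
  }
  out->trace_version = static_cast<uint32_t>(std::stoul(line));
  out->global_data = contents.substr(newline + 1);
  return true;
}
```

Because the version line stays first and stays plain text, even an rr that
cannot decode the record can still read the number and give a sensible error.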

This would require people building rr to install Capnproto packages.

Disclaimer: I've never used Capnproto myself so possibly I've just fallen
for the marketing.

Any thoughts appreciated!

Rob