x86 string instructions

Robert O'Callahan robert at ocallahan.org
Tue Feb 24 15:00:26 PST 2015


x86 REP-prefixed string instructions cause problems for rr because they are
loops that don't increment the retired-conditional-branches (rbc) counter,
which violates rr's assumption that the number of instruction executions
between rbc ticks is bounded.

For example, "rep stosb" is sometimes used to fill large areas of memory. If
we get a signal during this loop, then when advance_to tries to replay to
the signal, it can't use breakpoints so it has to singlestep until it
reaches the target state, which could be millions of singlesteps of the
"rep stosb" instruction. This takes forever.

The problem more often arises when reverse-executing to a watchpoint where
the previous watchpoint firing was caused by a REP-prefixed string
instruction. We need to singlestep forward to before the watchpoint firing
(which is *after* the watchpoint firing, in reverse execution), and this
takes forever. Even worse, the ReplayTimeline logic that orders program
states also needs to do this kind of singlestepping to order the register
states for a given value of the tick counter.

To address these issues, we need to be able to execute many iterations of
the string instruction in one go --- without overshooting specified
destination state(s). The only way I can think of to do this is to compute
how many iterations of the instruction we should execute, and then set a
read/write watchpoint at a memory location that will be read/written near
the end of those iterations. This is unfortunately quite complicated for
several reasons:
-- "cmpsb" and "scasb" instructions may terminate early due to comparison
results, so we also need to set a breakpoint after the string instruction
to catch early exits.
-- Intel CPUs have a quirk where, when not singlestepping, multiple
iterations of "rep stosb" (and other instructions) are coalesced to write
up to 64 bytes at a time (on Ivy Bridge; maybe more on other
microarchitectures). This effectively means watchpoints can fire late, so
we need to pad the watchpoint position to avoid overshooting.
-- There are lots of variants of these instructions that work with words of
different sizes. The direction can also be changed dynamically.
-- We need this to work even when the debugger has its own
watchpoints/breakpoints applied. This is a pain because we may not have a
free watchpoint to use. Disambiguating user watchpoints from our internal
watchpoint is also a problem. I think the solution has to be that we adjust
the iteration count down so our fast-forward operation ends before any user
watchpoint can fire ... though that means we need to encode enough of the
string-instruction semantics to know when it would trigger arbitrary user
watchpoints.
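
To make the iteration arithmetic above concrete, here is a rough sketch in
Python. It is not rr code. All names (plan_fast_forward and friends) are
hypothetical, and it rests on assumptions that are not established above: a
data watchpoint on the element touched by iteration j normally stops
execution with j+1 iterations completed, coalescing delays that by at most
COALESCE_PAD_BYTES bytes, and user watchpoints match the access type of the
instruction. It ignores early exits from "cmpsb"/"scasb" (which still need
the breakpoint after the instruction) and the size/alignment restrictions of
hardware watchpoints.

```python
# Sketch only: models a single pointer register (e.g. EDI for "rep stos").
COALESCE_PAD_BYTES = 64  # figure quoted for Ivy Bridge; may be larger elsewhere


def element_address(start, elem_size, backward, k):
    """Lowest address touched by iteration k (0-based).

    'backward' models the direction flag (DF=1): the pointer then moves
    down by elem_size after each iteration.
    """
    step = -elem_size if backward else elem_size
    return start + k * step


def iterations_before_hit(start, elem_size, backward, count, watch_lo, watch_hi):
    """How many of 'count' iterations run before one touches [watch_lo, watch_hi)."""
    if backward:
        if watch_lo >= start + elem_size:
            return count  # watched range lies entirely above every access
        first = 0 if start < watch_hi else (start - watch_hi) // elem_size + 1
    else:
        if watch_hi <= start:
            return count  # watched range lies entirely below every access
        first = max(0, (watch_lo - start) // elem_size)
    return min(count, first)


def plan_fast_forward(start, elem_size, backward, target_iters, user_watches=()):
    """Return (safe_iters, internal_watch_addr).

    safe_iters is target_iters reduced so that no user watchpoint range
    (a list of (lo, hi) half-open byte ranges) is touched. The internal
    watchpoint is placed early enough that firing late by up to
    COALESCE_PAD_BYTES still completes at most safe_iters iterations; the
    rest would be singlestepped. None means the run is too short to be
    worth a watchpoint.
    """
    safe = target_iters
    for lo, hi in user_watches:
        safe = min(safe, iterations_before_hit(start, elem_size, backward, safe, lo, hi))
    pad_iters = -(-COALESCE_PAD_BYTES // elem_size)  # ceiling division
    trigger = safe - 1 - pad_iters
    if trigger < 0:
        return safe, None
    return safe, element_address(start, elem_size, backward, trigger)
```

For a forward "rep stosb" starting at 0x1000 with a target of 1,000,000
iterations and a user watchpoint covering 0x2000-0x2003,
plan_fast_forward(0x1000, 1, False, 1_000_000, [(0x2000, 0x2004)]) returns
(4096, 0x1fbf): we may run 4096 iterations before the user watchpoint would
fire, and the internal watchpoint goes 65 elements short of that point.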

I'm mainly writing this email to document the problem, but if anyone has
any better ideas, let me know :-).

Rob
-- 
oIo otoeololo oyooouo otohoaoto oaonoyooonoeo owohooo oioso oaonogoroyo
owoiotoho oao oboroootohoeoro oooro osoiosotoeoro owoiololo oboeo
osouobojoeocoto otooo ojouodogomoeonoto.o oAogoaoiono,o oaonoyooonoeo
owohooo
osoaoyoso otooo oao oboroootohoeoro oooro osoiosotoeoro,o o‘oRoaocoao,o’o
oioso
oaonosowoeoroaoboloeo otooo otohoeo ocooouoroto.o oAonodo oaonoyooonoeo
owohooo
osoaoyoso,o o‘oYooouo ofooooolo!o’o owoiololo oboeo oiono odoaonogoeoro
ooofo
otohoeo ofoioroeo ooofo ohoeololo.