Kernel bypass networking using pseudo-syscalls?
rr at chrispick.com
Tue Aug 1 23:07:15 UTC 2017
On Tue, Aug 1, 2017 at 6:24 PM Robert O'Callahan <robert at ocallahan.org> wrote:
> On Wed, Aug 2, 2017 at 9:37 AM, Chris Pick <rr at chrispick.com> wrote:
>> I was evangelizing rr to a friend last night
> Thank you! :-)
>> who's been debugging a program that uses kernel bypass for networking.
>> I assumed that 1) userspace uses DMA to interact with the networking
>> hardware and 2) like shared memory, DMA isn't supported by rr. If those are
>> true and, further assuming 3) all the DMA/magic is hidden behind a pair of
>> send_pkt() and recv_pkt() functions,
> How true is assumption #3, do you know? I don't know anything about these
Good question, I don't have any hands-on experience with his system
myself. My first goal was to partially sketch out a path for my friend to
investigate. Now I'm thinking through this hypothetical out of curiosity.
>> would it be possible to have rr treat that pair as a set of custom
>> pseudo-syscalls, recording their inputs and outputs for later replay?
>> I imagine something similar must be done if rr supports
>> recording/replaying vDSO functions?
> We do that by patching the vDSO entry points to perform the equivalent
> regular syscall. That is pretty simple to do.
> I think that what you're suggesting could be implemented, but it would be
> significant work. One way to do it would be to read symbols during
> recording to locate the recv_pkt() function, and patch its exit with an
> rr-specific system call which takes the packet buffer address and size as a
> parameter. This would basically behave the same as a read syscall ---
> logging the output buffer during recording, and storing it back there
> during replay. Then one would add support for that syscall to librrpreload
> to get a non-kernel fast path.
> If you could manually patch the recv_pkt() function or a wrapper around
> it, that would make this a lot easier.
This was along the lines of what I was thinking. It's definitely the case
that either a) the send/recv_pkt() functions are in a shared object and can
themselves be intercepted via preloading or b) the program could be
recompiled with wrapper functions around them.
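
To make the wrapper idea concrete, here is a minimal sketch in Python rather
than in whatever language the real program uses. Everything in it is
hypothetical: PacketTrace, record_wrapper() and replay_wrapper() are names
made up for illustration, and the "real" recv_pkt() is stood in for by any
callable that returns a packet as bytes. It is not how rr itself works; it
only shows the record-then-replay shape of the proposal.

```python
from typing import Callable, List


class PacketTrace:
    """In-memory log of packets returned by a wrapped receive function."""

    def __init__(self) -> None:
        self.packets: List[bytes] = []
        self.cursor = 0  # index of the next packet to hand out during replay


def record_wrapper(real_recv: Callable[[], bytes],
                   trace: PacketTrace) -> Callable[[], bytes]:
    """Wrap a (hypothetical) recv_pkt(): call it and log a copy of its output."""
    def recv_pkt() -> bytes:
        pkt = real_recv()
        trace.packets.append(bytes(pkt))  # copy, so later mutation cannot alter the log
        return pkt
    return recv_pkt


def replay_wrapper(trace: PacketTrace) -> Callable[[], bytes]:
    """Return a recv_pkt() that never touches hardware and serves logged packets."""
    def recv_pkt() -> bytes:
        if trace.cursor >= len(trace.packets):
            raise EOFError("replay ran past the end of the recorded trace")
        pkt = trace.packets[trace.cursor]
        trace.cursor += 1
        return pkt
    return recv_pkt


# Usage: record two packets from a fake source, then replay them in order.
source = iter([b"first", b"second"])
trace = PacketTrace()
recording_recv = record_wrapper(lambda: next(source), trace)
recorded = [recording_recv(), recording_recv()]
replaying_recv = replay_wrapper(trace)
replayed = [replaying_recv(), replaying_recv()]
```

During replay the wrapped function returns the logged packets in the order
they were recorded, so the rest of the program sees the same inputs without
any DMA taking place.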
It sounds like that would be simple to do (I'm imagining it's a matter of
logging a pseudo-syscall number with the buffer/packet contents a la
read(2)/write(2))? Is there any precedent for this and/or does it sound
like a reasonable thing to do?
> Of course that approach would only work if the program does not have data
> races involving the DMA buffer. If it does, rr might not produce the
> correct execution during replay.
Since I'm assuming the send/recv_pkt() functions are taking/returning
discrete packets and completely contain all the DMA messiness and that we
can wrap them and record their inputs/outputs I don't expect any of the
rest of the code to be able to race with them.
> Another question is whether it would be possible for your user to
> configure their program to not use the kernel bypass, e.g. using a regular
> socket API instead, and whether that would make it impossible for them to
> debug their bug.
Very fair question, this would obviously be the simplest path forward.
The vast majority of his debugging was occurring at the higher application
request handling layer. This is what I'll recommend first.
Thanks for the fast response, -Chris