Kernel bypass networking using pseudo-syscalls?
robert at ocallahan.org
Tue Aug 1 23:47:06 UTC 2017
Oops, forgot to CC rr-dev.
On Wed, Aug 2, 2017 at 11:21 AM, Robert O'Callahan <robert at ocallahan.org> wrote:
> On Wed, Aug 2, 2017 at 11:07 AM, Chris Pick <rr at chrispick.com> wrote:
>> On Tue, Aug 1, 2017 at 6:24 PM Robert O'Callahan <robert at ocallahan.org> wrote:
>>> On Wed, Aug 2, 2017 at 9:37 AM, Chris Pick <rr at chrispick.com> wrote:
>>>> I was evangelizing rr to a friend last night
>>> Thank you! :-)
>>>> who's been debugging a program that uses kernel bypass for networking.
>>>> I assumed that 1) userspace uses DMA to interact with the networking
>>>> hardware and 2) like shared memory, DMA isn't supported by rr. If those are
>>>> true and, further assuming 3) all the DMA/magic is hidden behind a pair of
>>>> send_pkt() and recv_pkt() functions,
>>> How true is assumption #3, do you know? I don't know anything about
>>> these interfaces.
>> Good question, I don't have any hands-on experience with his system
>> myself. My first goal was to partially sketch out a path for my friend to
>> investigate. Now I'm thinking through this hypothetical out of curiosity.
>>>> would it be possible to have rr treat that pair as a set of custom
>>>> pseudo-syscalls, recording their inputs and outputs for later replay?
>>>> I imagine something similar must be done if rr supports
>>>> recording/replaying vDSO functions?
>>> We do that by patching the vDSO entry points to perform the equivalent
>>> regular syscall. That is pretty simple to do.
>>> I think that what you're suggesting could be implemented, but it would
>>> be significant work. One way to do it would be to read symbols during
>>> recording to locate the recv_pkt() function, and patch its exit with an
>>> rr-specific system call which takes the packet buffer address and size as a
>>> parameter. This would basically behave the same as a read syscall ---
>>> logging the output buffer during recording, and storing it back there
>>> during replay. Then one would add support for that syscall to librrpreload
>>> to get a non-kernel fast path.
>>> If you could manually patch the recv_pkt() function or a wrapper around
>>> it, that would make this a lot easier.
>> This was along the lines of what I was thinking. It's definitely the
>> case that either a) the send/recv_pkt() functions are in a shared object
>> and can themselves be intercepted via preloading or b) the program could be
>> recompiled with wrapper functions around them.
> Aha, yes, using preloading to override a shared library function would be
> easy and we do have a precedent for that. (See src/preload/overrides.c.)
>> It sounds like that would be simple to do (I'm imagining it's a matter of
>> logging a pseudo-syscall number with the buffer/packet contents, à la
>> read(2)/write(2)). Is there any precedent for this and/or does it sound
>> like a reasonable thing to do?
> It sounds reasonable.
> The magic system call would have to be implemented but that's fairly easy
> to do.
>>> Of course that approach would only work if the program does not have data
>>> races involving the DMA buffer. If it does, rr might not produce the
>>> correct execution during replay.
>> Since I'm assuming the send/recv_pkt() functions are taking/returning
>> discrete packets and completely contain all the DMA messiness and that we
>> can wrap them and record their inputs/outputs I don't expect any of the
>> rest of the code to be able to race with them.
> Sure, as long as that's not the bug :-).
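To make the read-like behaviour discussed above concrete, here is a minimal standalone sketch in C. It is not rr code and does not use any rr interface. The recv_pkt() signature, the names recv_pkt_logged(), fake_recv_pkt() and pkt_mode, and the log layout (return value followed by payload bytes) are all assumptions made up for illustration. During recording the wrapper calls the real function and logs its output buffer; during replay it skips the real function and stores the logged bytes back into the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical signature: the real kernel-bypass API is not known here. */
typedef ssize_t (*recv_pkt_fn)(void *buf, size_t cap);

enum pkt_mode { PKT_RECORD, PKT_REPLAY };

/*
 * Record: call the real function, then append (return value, payload) to log.
 * Replay: do not touch the device; read the return value and payload back
 * from log and store the payload into buf.
 * Returns the packet length, or -1 on a log I/O error or undersized buffer.
 */
ssize_t recv_pkt_logged(enum pkt_mode mode, FILE *log, recv_pkt_fn real,
                        void *buf, size_t cap)
{
    ssize_t n;

    assert(log != NULL && buf != NULL);

    if (mode == PKT_RECORD) {
        n = real(buf, cap);
        if (fwrite(&n, sizeof n, 1, log) != 1)
            return -1;
        if (n > 0 && fwrite(buf, 1, (size_t)n, log) != (size_t)n)
            return -1;
        return n;
    }

    /* Replay path: the log is the only source of data. */
    if (fread(&n, sizeof n, 1, log) != 1)
        return -1;
    if (n > 0) {
        if ((size_t)n > cap)
            return -1;
        if (fread(buf, 1, (size_t)n, log) != (size_t)n)
            return -1;
    }
    return n;
}

/* Stand-in for the DMA-backed receive function, for demonstration only. */
ssize_t fake_recv_pkt(void *buf, size_t cap)
{
    static const char payload[] = "ping";
    size_t n = sizeof payload - 1;

    if (n > cap)
        n = cap;
    memcpy(buf, payload, n);
    return (ssize_t)n;
}
```

In a real setup the wrapper would be installed by preloading or by recompiling with a wrapper, as discussed above, and the logging would go through whatever mechanism rr provides rather than a FILE. The sketch also inherits the caveat from the thread: it only reproduces the execution if nothing else races on the packet buffer.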
lbir ye,ea yer.tnietoehr rdn rdsme,anea lurpr edna e hnysnenh hhe uresyf
selthor stor edna siewaoeodm or v sstvr esBa kbvted,t
o l euetiuruewFa kbn e hnystoivateweh uresyf tulsa rehr rdm or rnea
.a war hsrer holsa rodvted,t nenh hneireseoouot.tniesiewaoeivatewt sstvr