Kernel bypass networking using pseudo-syscalls?

Robert O'Callahan robert at ocallahan.org
Tue Aug 1 23:47:06 UTC 2017


Oops, forgot to CC rr-dev.

On Wed, Aug 2, 2017 at 11:21 AM, Robert O'Callahan <robert at ocallahan.org>
wrote:

> On Wed, Aug 2, 2017 at 11:07 AM, Chris Pick <rr at chrispick.com> wrote:
>
>> On Tue, Aug 1, 2017 at 6:24 PM Robert O'Callahan <robert at ocallahan.org>
>> wrote:
>>
>> On Wed, Aug 2, 2017 at 9:37 AM, Chris Pick <rr at chrispick.com> wrote:
>>>
>>>> I was evangelizing rr to a friend last night
>>>>
>>>
>>> Thank you! :-)
>>>
>>>
>>>> who's been debugging a program that uses kernel bypass for networking.
>>>>
>>>> I assumed that 1) userspace uses DMA to interact with the networking
>>>> hardware and 2) like shared memory, DMA isn't supported by rr.  If those are
>>>> true and, further assuming 3) all the DMA/magic is hidden behind a pair of
>>>> send_pkt() and recv_pkt() functions,
>>>>
>>>
>>> How true is assumption #3, do you know? I don't know anything about
>>> these interfaces.
>>>
>>>
>> Good question; I don't have any hands-on experience with his system
>> myself.  My first goal was to partially sketch out a path for my friend to
>> investigate.  Now I'm thinking through this hypothetical out of curiosity.
>>
>>>> would it be possible to have rr treat that pair as a set of custom
>>>> pseudo-syscalls, recording their inputs and outputs for later replay?
>>>>
>>>> I imagine something similar must be done if rr supports
>>>> recording/replaying vDSO functions?
>>>>
>>>
>>> We do that by patching the vDSO entry points to perform the equivalent
>>> regular syscall. That is pretty simple to do.
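For readers unfamiliar with the vDSO trick, the sketch below illustrates what "the equivalent regular syscall" means for one vDSO function, clock_gettime(). It is not rr's patching code; it only shows, assuming 64-bit Linux with glibc, the two paths that such a patch makes equivalent: the libc call, which is normally answered by the vDSO without entering the kernel (so a tracer never sees it), and the same request issued through syscall(SYS_clock_gettime, ...), which does enter the kernel and can therefore be intercepted and recorded.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Usual path: on Linux, glibc typically services this call from the
 * vDSO, so no system call is made and a tracer like rr never sees it. */
static int now_via_vdso(struct timespec *ts) {
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* The "equivalent regular syscall": the same request, but it really
 * enters the kernel, so it can be intercepted and recorded. */
static int now_via_syscall(struct timespec *ts) {
    return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Both functions return 0 on success and fill in the same kind of timestamp; only the second one is visible to a syscall-level recorder.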
>>>
>>> I think that what you're suggesting could be implemented, but it would
>>> be significant work. One way to do it would be to read symbols during
>>> recording to locate the recv_pkt() function, and patch its exit with an
>>> rr-specific system call which takes the packet buffer address and size as
>>> parameters. This would basically behave the same as a read syscall ---
>>> logging the output buffer during recording, and storing it back there
>>> during replay. Then one would add support for that syscall to librrpreload
>>> to get a non-kernel fast path.
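The read-like behaviour described above can be sketched in plain user space. Everything here is hypothetical: struct trace, record_exit() and replay_exit() are illustrative names, not rr internals, and the fixed-size arrays stand in for rr's real trace storage. During recording, the bytes that recv_pkt() wrote into the caller's buffer are saved along with its return value; during replay the hardware is not consulted, and the saved bytes are copied back into the buffer and the saved return value is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical trace: one entry per intercepted recv_pkt() exit. */
#define MAX_EVENTS 16
#define MAX_PKT 2048

struct pkt_event {
    ssize_t ret;                 /* return value of recv_pkt() */
    unsigned char data[MAX_PKT]; /* bytes written into the caller's buffer */
};

struct trace {
    struct pkt_event events[MAX_EVENTS];
    size_t count;  /* events recorded so far */
    size_t cursor; /* next event to replay */
};

/* Recording: the real call has already filled buf; save what it wrote,
 * the way rr saves the output buffer of read(2). */
static void record_exit(struct trace *t, const void *buf, ssize_t ret) {
    assert(t->count < MAX_EVENTS);
    struct pkt_event *e = &t->events[t->count++];
    e->ret = ret;
    if (ret > 0) {
        assert((size_t)ret <= MAX_PKT);
        memcpy(e->data, buf, (size_t)ret);
    }
}

/* Replay: write the recorded bytes back into buf and return the
 * recorded value, so the caller observes the same packet. */
static ssize_t replay_exit(struct trace *t, void *buf, size_t cap) {
    assert(t->cursor < t->count);
    const struct pkt_event *e = &t->events[t->cursor++];
    if (e->ret > 0) {
        assert((size_t)e->ret <= cap);
        memcpy(buf, e->data, (size_t)e->ret);
    }
    return e->ret;
}
```

Error returns (ret <= 0) are logged without data, so replay reproduces failures as well as packets.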
>>>
>>> If you could manually patch the recv_pkt() function or a wrapper around
>>> it, that would make this a lot easier.
>>>
>> This was along the lines of what I was thinking.  It's definitely the
>> case that either a) the send/recv_pkt() functions are in a shared object
>> and can themselves be intercepted via preloading or b) the program could be
>> recompiled with wrapper functions around them.
>>
>
> Aha, yes, using preloading to override a shared library function would be
> easy and we do have a precedent for that. (See src/preload/overrides.c.)
>
>
>> It sounds like that would be simple to do (I'm imagining it's a matter of
>> logging a pseudo-syscall number with the buffer/packet contents, à la
>> read(2)/write(2)). Is there any precedent for this, and does it sound
>> like a reasonable thing to do?
>>
>
> It sounds reasonable.
>
> The magic system call would have to be implemented but that's fairly easy
> to do.
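A minimal sketch of option b) combined with the magic system call: a wrapper calls the original recv_pkt() and then passes the output buffer's address and length to an rr-specific syscall, mirroring the idea of patching the function's exit. recv_pkt() here is a stub whose name and signature are assumed from this thread, and HYPOTHETICAL_SYS_RECORD_PKT is an invented number that is not part of Linux or rr; rr would have to be taught to recognize it. Outside rr the kernel rejects the unknown number, so the wrapper ignores the result and preserves errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL: not a real syscall number in Linux or rr. Under a
 * suitably extended rr, this call would log the buffer contents. */
#define HYPOTHETICAL_SYS_RECORD_PKT 2000

/* Stand-in for the kernel-bypass library's receive function
 * (name and signature are assumptions taken from the discussion). */
static ssize_t recv_pkt(void *buf, size_t cap) {
    static const char fake[] = "packet";
    size_t n = sizeof fake < cap ? sizeof fake : cap;
    memcpy(buf, fake, n);
    return (ssize_t)n;
}

/* Wrapper: call the real function, then report its output buffer
 * (address and length) through the magic syscall, as read(2) would. */
static ssize_t recv_pkt_wrapped(void *buf, size_t cap) {
    ssize_t n = recv_pkt(buf, cap);
    int saved = errno;
    if (n > 0) {
        /* Without rr this fails (typically ENOSYS or EPERM); ignore it. */
        (void)syscall(HYPOTHETICAL_SYS_RECORD_PKT, buf, (size_t)n);
    }
    errno = saved;
    return n;
}
```

Callers would use recv_pkt_wrapped() everywhere they previously called recv_pkt(); with preloading (option a) the same body could instead live in an override of recv_pkt() itself.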
>
>
>>> Of course that approach would only work if the program does not have data
>>> races involving the DMA buffer. If it does, rr might not produce the
>>> correct execution during replay.
>>>
>> Since I'm assuming the send/recv_pkt() functions take and return
>> discrete packets, completely contain all the DMA messiness, and can be
>> wrapped so that their inputs/outputs are recorded, I don't expect any of the
>> rest of the code to be able to race with them.
>>
>
> Sure, as long as that's not the bug :-).
>
> Rob
> --
> lbir ye,ea yer.tnietoehr  rdn rdsme,anea lurpr  edna e hnysnenh hhe uresyf
> toD
> selthor  stor  edna  siewaoeodm  or v sstvr  esBa  kbvted,t
> rdsme,aoreseoouoto
> o l euetiuruewFa  kbn e hnystoivateweh uresyf tulsa rehr  rdm  or rnea
> lurpr
> .a war hsrer holsa rodvted,t  nenh hneireseoouot.tniesiewaoeivatewt
> sstvr  esn
>


