rr, threads, and signal handlers
wadehennessey at gmail.com
Wed Nov 4 05:00:29 UTC 2015
I wrote a small standalone testcase today to try to reproduce the hanging
problem I described yesterday, and the test case wouldn't hang. I then
realized that sometimes my program would hang the way it did when run under
"rr record", but not that often. So I used "rr replay" to watch what was
happening with "watch -l" set on a critical variable, and I figured out
what was wrong.
So this was my problem, not an rr problem. But most importantly, rr helped
me figure out a race condition that has eluded me for quite a while! This is
exactly why I started trying to use rr yesterday. I'm writing this to offer
my thanks and congratulations to everyone who has worked on this awesome
new way to debug. I'm writing a real-time garbage collector, and these
kinds of bugs have been very hard to debug without something like rr. This
is a game changer for me. -wade
On Mon, Nov 2, 2015 at 11:02 PM, Robert O'Callahan <robert at ocallahan.org> wrote:
> On Tue, Nov 3, 2015 at 4:39 PM, Wade Hennessey <wadehennessey at gmail.com> wrote:
>> I don't know if this list is the right place to ask a question about rr,
>> pthreads and signal handlers, but it's the only place I've found.
>> I have a program with 2 threads (but it could be an arbitrary number).
>> Thread 1 at some point sends a signal to thread 2, and thread 2 has a
>> signal handler that gets called. Thread 2 does a small amount of work and
>> then busy waits for thread 1 to change the state of a global variable to
>> indicate that it's ok to return from the signal handler.
>> This program works fine without rr, but when I try to rr record it,
>> it hangs, seemingly because thread 1 never gets a chance to run and tell
>> the signal handler on thread 2 that it's ok to return. Does rr require the
>> signal handler on thread 2 to return before allowing thread 1 to run
>> because of an rr limitation (i.e., everything runs on one core and signal
>> handlers must return before any other thread can execute)?
> rr should allow a context switch in that case. I'm not sure why you're
> Inserting a sched_yield while busy-waiting should make the problem go
> away. Does it?
> Does the problem show up if you write a small standalone testcase with the
> behavior you described?