rr supports multithreaded applications, but only uses a single core. So parallel...

scott_s · on Nov 6, 2018

Being slower is not, I think, the major downside. The major downside is that an entire class of errors - race conditions - is basically outside the scope of the tool. Which is understandable! Race conditions are hard, and when I read about the tool, my first thought was "How are they handling race conditions?" and it turns out, essentially, they're not. But race conditions are also the hardest part of debugging multithreaded applications.

I'm not sure if the tool ensures deterministic scheduling of threads on the single core, but I doubt that it does. If it does not, then replay will not be deterministic, which means you could see a different race condition outcome each time you replay. If it does, then while you may have deterministic playback, the tool is unlikely to help with the class of race conditions that require simultaneous execution.

To be clear: I'm not criticizing the tool or the work of the people behind it. If I were to design such a tool, I would probably start with a single core as well. It seems like a valuable tool and great progress for software debugging. But I do think race conditions in multithreaded programs are a current limitation.

edit: The technical report says that they deterministically schedule threads (https://arxiv.org/pdf/1705.05937.pdf):

"RR preemptively schedules these threads, so context switch timing is nondeterminism that must be recorded. Data race bugs can still be observed if a context switch occurs at the right point in the execution (though bugs due to weak memory models cannot be observed)."
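
To make the quoted point concrete, here is a toy model (not rr code) of two threads doing an unsynchronized increment on one core. The two-step load/store split and the schedule strings are assumptions made for illustration. A context switch between one thread's load and its store loses an update, and replaying the same recorded schedule reproduces the same result.

```python
# Toy model: each "thread" increments a shared counter in two steps
# (load, then store). A schedule is the recorded order in which the
# single core runs those steps, e.g. "AABB" or "ABAB".

def run(schedule):
    """Replay a schedule of thread ids and return the final counter."""
    counter = 0
    local = {}                      # per-thread register holding the loaded value
    for tid in schedule:
        if tid not in local:        # first step of this thread: load
            local[tid] = counter
        else:                       # second step: store the incremented value
            counter = local.pop(tid) + 1
    return counter

serial = run("AABB")        # no switch inside the read-modify-write
racy = run("ABAB")          # switch between A's load and A's store
replayed = run("ABAB")      # same recorded schedule, same outcome

print(serial, racy, replayed)   # prints: 2 1 1
```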

The "weak memory model" part means it won't help with, say, debugging lock-free algorithms where you screw up the semantics.
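
A small sketch of why that is, using the classic store-buffering litmus test. The simulation below is a simplified model, not anything from rr: it enumerates every way one core can interleave the two threads and collects the values each thread reads. On real multicore hardware with store buffers, both loads can return 0, but no single-core interleaving produces that outcome.

```python
from itertools import permutations

# Store-buffering litmus test:
#   thread 0: x = 1; r0 = y
#   thread 1: y = 1; r1 = x
PROGRAMS = {
    0: [("store", "x"), ("load", "y")],
    1: [("store", "y"), ("load", "x")],
}

def outcomes():
    """Return every (r0, r1) reachable by interleaving on one core."""
    seen = set()
    # Each distinct ordering of thread ids [0, 0, 1, 1] is one schedule.
    for schedule in set(permutations([0, 0, 1, 1])):
        mem = {"x": 0, "y": 0}
        regs = {}
        pc = {0: 0, 1: 0}           # next instruction index per thread
        for tid in schedule:
            op, var = PROGRAMS[tid][pc[tid]]
            pc[tid] += 1
            if op == "store":
                mem[var] = 1
            else:
                regs[tid] = mem[var]
        seen.add((regs[0], regs[1]))
    return seen

results = outcomes()
print(sorted(results))   # (0, 0) never appears
```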

roca · on Nov 6, 2018

You should read https://arxiv.org/abs/1705.05937 so you don't need to speculate. rr absolutely does guarantee that threads are scheduled the same way during replay as during recording; otherwise it wouldn't work at all on applications like Firefox, which use a lot of threads.

Also, rr definitely is very useful for debugging race conditions. For example, Mozilla developers have debugged lots of race conditions using it. One thing that really helps is rr's "chaos mode", which randomizes thread scheduling in an intelligent way to discover possible races. See https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo... and https://robert.ocallahan.org/2016/02/deeper-into-chaos.html and https://robert.ocallahan.org/2018/05/rr-chaos-mode-improveme....
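
As a rough illustration of the general idea (randomize the schedule, keep the one that fails, replay it), here is a toy search over random single-core schedules. It is not rr's actual chaos-mode algorithm; the two-step increment model, `find_failing_schedule`, and its parameters are all hypothetical names made up for this sketch.

```python
import random

def run(schedule):
    """Run two-step increments (load, then store) in the given order."""
    counter, local = 0, {}
    for tid in schedule:
        if tid not in local:
            local[tid] = counter          # load
        else:
            counter = local.pop(tid) + 1  # store
    return counter

def find_failing_schedule(threads="ABC", attempts=1000, seed=0):
    """Try random schedules; return the first one that loses an update."""
    rng = random.Random(seed)             # seed is an arbitrary choice
    steps = list(threads * 2)             # two steps per thread
    for _ in range(attempts):
        rng.shuffle(steps)
        schedule = "".join(steps)
        if run(schedule) != len(threads):
            return schedule               # this plays the role of the recording
    return None

failing = find_failing_schedule()
print(failing, run(failing))              # replaying gives the same wrong count
```

Once a failing schedule has been captured, every replay of it hits the same lost update, which is what makes the bug debuggable.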

scott_s · on Nov 6, 2018

Very cool stuff! And yes, I took a look at the paper, as I noted in my edit. But I think there are still two classes of race conditions outside of its scope: ones that require simultaneous execution (where you can get surprising interleavings) and lock-free algorithms where correct use of the memory model is paramount. In my personal experience, these are the hardest problems to debug.

codehog · on Nov 6, 2018

Even those are probably not 100% outside of its scope. I forget the details of chaos mode, but that kind of induced thread-switching can cause just the kind of interleaving you seem to be talking about.

What rr cannot capture is a very small subclass of race conditions involving things like cache line misses - I think that's what you're alluding to by "correct use of the memory model is paramount" but it's a subclass even of those. Yes, those are hugely difficult to diagnose and it would be fantastic if tools like rr or UndoDB could capture them. But there's a vast swathe of also very difficult race conditions that this recording tech can and does help with today.