> In this case I don't understand how the state leaked from the file-system filter driver to our process, as it seems to have done.
I assume you made some Windows API call somewhere, which ended up in the filesystem filter driver, which then clobbered the register. And I'm guessing the NT kernel code and Windows DLLs never save/restore the register, because it isn't supposed to be clobbered.
Couldn't one approach be to wrap all Windows API calls with some extra code which saves and restores all the callee-save registers, so even if a buggy kernel driver clobbers them, you don't get hurt by that? I don't know, maybe that's too expensive.
Instead of restoring, one could check for the clobbering and crash the process immediately. Or maybe Microsoft should add such a wrapper around all calls to third-party kernel drivers, and blue-screen when one misbehaves?
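To make the two options concrete, here is a toy model in Python. It is only a sketch of the idea, not a working detector: the "CPU" is a plain dict, `guarded_call` and `buggy_read_file` are hypothetical names, and a real wrapper would have to be an assembly thunk, because compiled code is free to use these registers itself between the save and the compare. The register names are the Windows x64 non-volatile integer registers (RSP and XMM6-XMM15 are left out to keep it short).

```python
# Toy model: the "CPU" is a dict of callee-saved registers.
# Nothing here touches real hardware registers.
CALLEE_SAVED = ("rbx", "rbp", "rdi", "rsi", "r12", "r13", "r14", "r15")


class RegisterClobbered(Exception):
    """Raised in "check" mode when a callee-saved register changed."""


def guarded_call(cpu, api, *args, mode="restore"):
    """Call api(cpu, *args) and guard the callee-saved registers.

    mode="restore": silently put the saved values back.
    mode="check":   raise immediately if any value changed.
    """
    saved = {reg: cpu[reg] for reg in CALLEE_SAVED}
    result = api(cpu, *args)
    changed = sorted(reg for reg in CALLEE_SAVED if cpu[reg] != saved[reg])
    if changed:
        if mode == "check":
            raise RegisterClobbered(", ".join(changed))
        cpu.update(saved)
    return result


def buggy_read_file(cpu, path):
    """Hypothetical API call whose driver path trashes r12."""
    cpu["r12"] = 0xDEAD
    return len(path)
```

In "restore" mode the caller never notices the bug, which keeps the process alive but also hides the bad driver. In "check" mode the failure shows up at the call that caused it, rather than as a mysterious crash much later.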
Those tactics could work. They would be a bit expensive, however, and it would be a shame to have all users paying the performance penalty because of a few bad pieces of software. Also, this would not have caught the two errors in the assembly language code within Chrome - that would require testing at every function call, not just Windows API calls.
It would be more practical (I think) to do this checking on a special build of Chrome that is shipped to a small percentage of users, so that not everybody pays the price.
But, this is an ecosystem problem and I'm not sure Chrome wants to shoulder the entire burden of finding bad software :-)
It would be interesting to see measurements of how big the expense is.
Also, I don't think one necessarily has to do it for every Windows API call – some Windows API calls are more likely to invoke third-party code than others; some API calls are far more performance-critical than others. Maybe one could find a subset of calls to focus on which maximise the likelihood of invoking third-party code but also minimise the performance impact.
> And, this would not have caught the two errors in the assembly language code within Chrome - that would require testing at every function call, not just Windows API calls
For code you control, I think some kind of static analysis would be a better approach – parse inline assembly code and check that every register it touches is marked as clobbered to the compiler. I saw some other comments you were replying to already on that topic. I think this kind of "dynamic" approach should be reserved for third-party code with low trustworthiness.
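A very rough sketch of such a check, assuming GCC-style extended inline asm where a literal register is written as `%%reg` and the clobber list is already available as a list of strings. The function name `undeclared_registers` and the `KNOWN_REGS` set are made up for illustration. It only sees registers that are named explicitly: it misses instructions with implicit register effects, does not map sub-registers such as `ebx` to `rbx`, and it also flags registers that are merely read, so it errs on the noisy side.

```python
import re

# Registers the sketch looks for; a deliberately small, assumed list.
KNOWN_REGS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp",
              "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"}

# In GCC extended asm, a literal register is written with two percent signs;
# operands such as %0 use a single one and are ignored here.
REG_PATTERN = re.compile(r"%%([a-z0-9]+)")


def undeclared_registers(template, clobbers):
    """Return registers named in the asm template but absent from clobbers."""
    named = {m.group(1) for m in REG_PATTERN.finditer(template)}
    return sorted((named & KNOWN_REGS) - set(clobbers))
```

For example, `undeclared_registers("mov %%rbx, %%rax", ["rax"])` reports `rbx`, because the template names it and the clobber list does not.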
> It would be more practical (I think) to do this checking on a special build of Chrome that is shipped to a small percentage of users, so that not everybody pays the price.
I was thinking, you could also do it using API hooking. Have some hidden setting to control it, by default off. If it is off, no impact, same as now. If the flag is on, hook (some subset of) Windows APIs with the "unexpected-register-clobber-detector". That way you don't have to produce two completely different builds.
And maybe even automatically turn that flag on if an install starts to experience crashes, especially if the presence of certain kinds of third-party software is detected.
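Roughly what I have in mind, again as a toy Python model rather than real hooking code. Here `api_table` stands in for whatever table of function pointers actually gets patched, the registers are a dict, and `make_detector`, `install_hooks` and `report` are hypothetical names. The point is only the shape: with the flag off nothing is replaced, and with it on only the chosen subset is wrapped.

```python
# Toy model of opt-in hooking. Real API hooking on Windows would patch
# function pointers or code; here we just swap entries in a dict.
CALLEE_SAVED = ("rbx", "rsi", "rdi", "r12")  # shortened for illustration


def make_detector(cpu, name, original, report):
    """Wrap `original` so any change to a callee-saved register is reported."""
    def hooked(*args):
        before = {reg: cpu[reg] for reg in CALLEE_SAVED}
        result = original(*args)
        for reg in CALLEE_SAVED:
            if cpu[reg] != before[reg]:
                report(name, reg)
        return result
    return hooked


def install_hooks(api_table, cpu, enabled, targets, report):
    """Hook only `targets`, and only when the hidden flag is on."""
    if not enabled:
        return  # flag off: the table is left exactly as it was
    for name in targets:
        api_table[name] = make_detector(cpu, name, api_table[name], report)
```

The `report` callback is where the policy would live: log it, attach it to a crash report, or terminate on the spot.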
> But, this is an ecosystem problem and I'm not sure Chrome wants to shoulder the entire burden of finding bad software :-)
Agree. Ideally, Microsoft would take the lead there, since it is their platform. But a world in which the Chrome team does it would be better than a world in which nobody does.