> But the compiler is like "hey! you used the result of this function as an index for this array! i must be in the range [0, 10)! I can use that information!"
As a developer who has seen lots of developers (including myself) make really dumb mistakes, this seems like a very strange statement.
Imagine if you hired a security guard to stand outside your house. One day, he sees you leave the house and forget to lock the door. So he reasons, "Oh, nothing important inside the house today -- guess I can take the day off", and walks off.
That's what a lot of this "I can infer X must be true" reasoning sounds like to me: it assumes that developers don't make mistakes, and that all unwanted behavior is exactly the same.
So suppose we have code that does this:
int array[10];
int i = some_function();
/* Lots of stuff */
if ( i < 0 || i >= 10 ) {
return -EINVAL;
}
array[i] = newval;
And then someone decides to add some optional debug logging, and forgets that `i` hasn't been sanitized yet:
int array[10];
int i = some_function();
logf("old value: %d\n", array[i]);
/* Lots of stuff */
if ( i < 0 || i >= 10 ) {
return -EINVAL;
}
array[i] = newval;
Now reading `array[i]` when `i` is out of bounds is certainly UB; but in a lot of cases, it will be harmless; and in the worst case it will crash with a segfault.
But suppose a clever compiler says, "We've already accessed `array[i]`, so I can infer that `i` is in bounds, and get rid of the check entirely!" Now we've changed an out-of-bounds read into an out-of-bounds write, which turns the worst case from a DoS into a privilege escalation!
I don't know whether anything like this has ever happened, but 1) it's certainly the kind of thing allowed by the spec, and 2) it makes C a much more dangerous language to work with.
Per https://lwn.net/Articles/575563/, Debian at one point found that about 40% of the C/C++ packages in their archive contained code relying on known categories of undefined behavior like this, which can open up a variety of security holes.
This has been accepted as what to expect from C. Compiler authors think it is OK. People who are aware of the problem are overwhelmed by the size of it, and there is no chance of it being fixed any time soon.
The fact that this has come to be seen as normal and OK is an example of Normalization of Deviance. See http://lmcontheline.blogspot.com/2013/01/the-normalization-o... for a description of what I mean. And deviance will continue to be normalized right up until someone writes an automated program that walks through projects, finds the surprising undefined behavior, and tries to come up with exploits. After project after project gets security holes, perhaps the C language committee will realize that this really ISN'T okay.
And the people who already migrated to Rust will be laughing their asses off in the corner.
> in a lot of cases, it will be harmless; and in the worst case it will crash with a segfault.
I am not sure a segfault is always the worst case. It could be that, by some coincidence, array[i] contains confidential information [maybe part of a private key? 32 bits of the user's password?] and you've now written it to a log file.
I know it's hard to imagine a misread of ~32 bits having bad consequences of that sort, but it's not out of the question.
Depends a lot on the specifics. For example heartbleed was a misread that led to the buffer being sent on the socket. And I think it was more than 32 bits. 32 bits of garbage into a log file that needs privileges to read sounds a tad less scary, but like I say, not out of the question to be harmful.
> Depends a lot on the specifics. For example heartbleed was a misread that led to the buffer being sent on the socket. And I think it was more than 32 bits. 32 bits of garbage into a log file that needs privileges to read sounds a tad less scary, but like I say, not out of the question to be harmful.
If you can do it a lot of times, though, that changes matters.