All the bus and memory timings are the same with this guy's hack, so it's no surprise that a loop like 10 print "hello world!" : print "!" : goto 10 shows no perceptible performance difference. If you wrote a purely arithmetic benchmark in assembly that runs entirely within a few registers, such as computing Fibonacci numbers, you might see some difference.
The top answer to this Retrocomputing Stack Exchange question probably has the correct explanation for why the 8088 was chosen:
It was available, had a second source, and was not owned by a competitor.
https://retrocomputing.stackexchange.com/questions/16912/did...
The ease of translating 8080-based CP/M code was a benefit of the 8086 architecture, but it was a nice-to-have rather than a deciding factor.