I'm having a difficult time imagining a situation where people's actual productivity with a piece of software can be measured so easily. I'm sure it happens, but I think it's safe to say it's the exception rather than the rule when it comes to A/B testing.