A/B testing is founded on statistics. You take Option A and Option B and see which one achieves more Goal C.
But you can't just look at the percent difference and decide that Option B must be better! Look, it has a higher percent Goal C! But that could be due to chance, so A/B tests employ tests of statistical significance to determine whether the test results are _probably_ chance or _probably_ reflect a genuine causal increase in Goal C.
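To make that concrete, here's a rough sketch of the kind of significance test an A/B tool runs under the hood, a two-proportion z-test in Python. The visitor and sale counts are made up for illustration, not from any real test:

```python
# A minimal sketch of a two-proportion z-test for an A/B result.
# The visitor and conversion counts below are invented for illustration.
from scipy.stats import norm

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z statistic, two-sided p-value) for conversion rates A vs. B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis "A and B convert equally".
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))  # two-sided p-value
    return z, p_value

# Option A: 150 sales out of 5,000 visitors; Option B: 175 out of 5,000.
z, p = two_proportion_z_test(150, 5000, 175, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")
# A p-value above 0.05 means we can't call the difference "real" at 95% confidence.
```

With numbers like these, the p-value lands well above 0.05, so the "better" option could easily just be luck.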
For example, if you flip a coin four times and get heads three of those times, without a statistical significance test you might conclude heads is 3x as likely to appear as tails. We know that's wrong, though: each side of a fair coin has a 50% chance of landing face-up on each flip.
The flaw in this experiment is that we extrapolated a conclusion from a very small set of data. A statistical significance test would take these results and say "we have only a <very small percent> confidence level that heads really is more likely to come up and isn't just coming up more often by chance".
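If you want to run that check with actual numbers, here's the coin example put through a binomial test (this assumes a reasonably recent SciPy, which provides `binomtest`):

```python
# The coin example: 3 heads in 4 flips of a coin we assume is fair (p = 0.5).
from scipy.stats import binomtest

result = binomtest(k=3, n=4, p=0.5, alternative="greater")
print(f"p-value = {result.pvalue:.4f}")  # 0.3125
# There's a 31% chance a fair coin gives 3+ heads in 4 flips just by luck,
# so the test gives us nowhere near 95% confidence that heads is favored.
```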
If we flipped the coin 10,000 times instead, we'd get something pretty close to 50% heads and 50% tails, and a significance test would report high confidence that those numbers reflect the coin's true behavior rather than chance.
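You can see that for yourself with a quick simulation; nothing below comes from a real experiment, it's just a fair coin flipped in code:

```python
# Simulating 10,000 flips of a fair coin to see how close we land to 50/50.
import random

random.seed(0)  # fixed seed just so the sketch is reproducible
heads = sum(random.random() < 0.5 for _ in range(10_000))
print(f"heads: {heads} / 10,000 ({heads / 10_000:.1%})")
# With n = 10,000, the observed rate is typically within about +/-1% of 50%,
# so a significance test has plenty of data to tell chance from a real bias.
```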
Short story long, you need lots of datapoints to determine whether an A/B test result is chance or an actual difference, and the smaller the difference between how Option A and Option B perform, the more datapoints you need to be confident they're actually different. Patrick's numbers are so close together that he'd need far more than 300 sales to reach the gold-standard 95% confidence level that there's actually a difference.
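Patrick's actual conversion rates aren't reproduced here, so the numbers below are hypothetical stand-ins, but this sketch shows the standard back-of-the-envelope formula for how many samples it takes to detect a small difference between two rates at 95% confidence:

```python
# Rough sample size needed per variant to detect a small difference in
# conversion rate at 95% confidence (alpha = 0.05) with 80% power.
# The 3.0% vs. 3.3% rates are hypothetical stand-ins, not Patrick's numbers.
from scipy.stats import norm

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96 for a two-sided 95% test
    z_beta = norm.ppf(power)            # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

n = sample_size_per_variant(0.030, 0.033)
print(f"~{n:,.0f} visitors per variant")
# Tiny differences between nearly identical rates demand tens of thousands of
# samples, far more than a few hundred sales' worth of traffic.
```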