
I’m pretty sure it’s a stability issue. With small steps, the noise is correlated between steps; if you tried to do it in one big jump, the model would essentially just memorize the input data: the fully-noised sample would act as a “key” and the model would memorize the corresponding image as the “value”. But with many small steps, nearby steps are highly correlated, so the training set contains lots of groups of similar noisy inputs, which pushes the model to generalize instead of memorize.
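A minimal numerical sketch of this point, assuming a standard DDPM-style forward (noising) chain with a linear beta schedule (all names and values here are illustrative assumptions, not something from the thread). It checks that samples at adjacent noise steps are nearly identical, while the start and end of the chain are almost uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                                # number of small noising steps
N = 10_000                              # toy "image": N unit-variance pixels
betas = np.linspace(1e-4, 0.02, T)      # a common linear variance schedule

x0 = rng.standard_normal(N)

# Forward chain: each step mixes in a little fresh Gaussian noise.
# x_{t+1} = sqrt(1 - beta_t) * x_t + sqrt(beta_t) * eps_t
xs = np.empty((T + 1, N))
xs[0] = x0
for t in range(T):
    xs[t + 1] = np.sqrt(1 - betas[t]) * xs[t] + np.sqrt(betas[t]) * rng.standard_normal(N)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(corr(xs[500], xs[501]))  # adjacent steps: correlation close to 1
print(corr(xs[0], xs[T]))      # start vs end: correlation close to 0
```

The high correlation between neighboring steps is what gives the model many overlapping, similar training inputs; the near-zero correlation across the whole chain shows that a single big jump would present each image with what amounts to an unrelated noise “key”.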

