OpenAI has said it's both native image generation *and* autoregressive. It shows the signs of that, too.

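It's probably an implementation of VAR (https://arxiv.org/abs/2404.02905): autoregressive image generation with a small twist. Rather than predicting every token at the target resolution directly, it starts by predicting the image at a small resolution and cranks it higher and higher until it reaches the desired resolution. Each step predicts a whole token map at the next scale, so the sequential steps run over scales rather than over individual tokens.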
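Here's a rough toy sketch of that "next-scale" loop, just to show its shape. Each step models roughly p(r_k | r_1, ..., r_{k-1}): the k-th token map is conditioned on all coarser maps. Assumptions: `toy_predict_scale` is a made-up deterministic placeholder for the transformer (a real model samples tokens from a learned distribution and uses a multi-scale VQ tokenizer), and the scale list is illustrative. This is not a claim about what OpenAI actually ships.

```python
from typing import List

TokenMap = List[List[int]]  # square grid of token IDs


def upsample_nearest(grid: TokenMap, size: int) -> TokenMap:
    """Nearest-neighbour upsample a square token map to size x size."""
    src = len(grid)
    return [[grid[r * src // size][c * src // size] for c in range(size)]
            for r in range(size)]


def toy_predict_scale(history: List[TokenMap], size: int, vocab: int) -> TokenMap:
    """HYPOTHETICAL stand-in for the model: emits all size x size tokens
    of this scale in one step, conditioned on the coarser maps.
    Here it only looks at the most recent map, upsampled to this scale."""
    if history:
        prev = upsample_nearest(history[-1], size)
    else:
        prev = [[0] * size for _ in range(size)]
    step = len(history)
    # Arbitrary deterministic rule so the sketch runs; a real model samples.
    return [[(prev[r][c] + r + c + step) % vocab for c in range(size)]
            for r in range(size)]


def generate_next_scale(scales: List[int], vocab: int = 16) -> List[TokenMap]:
    """One autoregressive step per scale, coarse to fine."""
    history: List[TokenMap] = []
    for size in scales:
        history.append(toy_predict_scale(history, size, vocab))
    return history


maps = generate_next_scale([1, 2, 4, 8, 16])
print(len(maps))  # 5
print(maps[1])    # [[1, 2], [2, 3]]
```

The point of the twist: producing the final 16x16 map here takes 5 sequential steps, whereas plain raster-order token autoregression would need 256 (one per token).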