Introducing GigaGAN: New Framework Challenging Diffusion Models
A new GAN architecture challenges DALL-E, Midjourney and Stable Diffusion

It happened almost overnight: diffusion models became the gold standard of AI image generation with the advent of Midjourney and DALL-E.
Until then, so-called GAN models (short for Generative Adversarial Network) were the way to go for AI image generation, but diffusion models were too powerful and took over.
Now GANs are back.
But why? And how did the new GigaGAN architecture outperform diffusion models in key benchmarks?
Return of The GAN
With GigaGAN a brand-new architecture challenges the position of diffusion models by being not only way faster but also capable of higher resolution outputs (without increasing the generation time too much).
GigaGAN offers a powerful alternative to diffusion-based image generation:
- using 1 billion parameters and
- generating 512px images at 0.13 seconds (!)
- generating 16-megapixel images in 3.66 seconds
In addition to this, the research team around GigaGAN also build an equally fast upsampler that outputs 4k images from low-resolution inputs.

GigaGAN also allows smooth interpolation between prompts, as shown in the interpolation grid below. The four corners are generated with different text prompts.

The GigaGAN framework also enables “disentangled prompt mixing” and “coarse-to-fine style swapping”:


Link to the original paper: https://arxiv.org/abs/2303.05511
Project page: https://medium.com/r/?url=https%3A%2F%2Fmingukkang.github.io%2FGigaGAN%2F
➡️ If you like my content, why not leave a “clap” at the end of this article, so more people can see it?






