Introducing GigaGAN: New Framework Challenging Diffusion Models


Link to the original paper: https://arxiv.org/abs/2303.05511

Project page: https://mingukkang.github.io/GigaGAN/

A new GAN architecture challenges DALL-E, Midjourney and Stable Diffusion

It happened almost overnight: diffusion models became the gold standard of AI image generation with the advent of Midjourney and DALL-E.

Until then, GANs (short for Generative Adversarial Networks) were the way to go for AI image generation, but diffusion models proved more powerful and took over.

Now GANs are back.

But why? And how did the new GigaGAN architecture outperform diffusion models in key benchmarks?

Return of The GAN

With GigaGAN, a brand-new architecture challenges the position of diffusion models by being not only much faster but also capable of higher-resolution outputs (without increasing the generation time too much).

GigaGAN offers a powerful alternative to diffusion-based image generation:

using 1 billion parameters,

generating 512px images in 0.13 seconds (!), and

generating 16-megapixel images in 3.66 seconds.
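To put the speed figures above in perspective, here is a quick back-of-the-envelope calculation. It assumes square images (512x512 and 4096x4096), which the article does not state explicitly, so treat the numbers as a rough sketch rather than a benchmark.

```python
# Generation times quoted in the article.
SECONDS_512PX = 0.13
SECONDS_16MP = 3.66

# Assumption (not stated in the article): both images are square,
# i.e. 512x512 and 4096x4096 (about 16.8 million pixels).
pixels_512 = 512 * 512
pixels_16mp = 4096 * 4096

# How many times more pixels, and how many times more time?
pixel_ratio = pixels_16mp / pixels_512
time_ratio = SECONDS_16MP / SECONDS_512PX

# Throughput in megapixels per second at each resolution.
throughput_512 = pixels_512 / SECONDS_512PX / 1e6
throughput_16mp = pixels_16mp / SECONDS_16MP / 1e6

print(f"pixels: {pixel_ratio:.0f}x more, time: {time_ratio:.1f}x more")
print(f"throughput: {throughput_512:.2f} MP/s at 512px, "
      f"{throughput_16mp:.2f} MP/s at 16 MP")
```

Under these assumptions, the large image has 64 times as many pixels but takes only about 28 times as long, which is what "without increasing the generation time too much" amounts to in numbers.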

In addition to this, the research team behind GigaGAN also built an equally fast upsampler that outputs 4K images from low-resolution inputs.

GigaGAN also allows smooth interpolation between prompts, as shown in the interpolation grid on the project page (https://mingukkang.github.io/GigaGAN/). The four corners of the grid are generated with different text prompts.
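The article does not say how such a grid is computed. One common way to build a grid with four distinct corners is bilinear blending of four conditioning vectors, with each blended vector then passed to the generator. The sketch below illustrates only that blending step: the function names and the toy two-dimensional vectors are hypothetical stand-ins, not GigaGAN's actual code or embeddings.

```python
from typing import List

Vector = List[float]


def lerp(a: Vector, b: Vector, t: float) -> Vector:
    """Linearly interpolate between two equal-length vectors (t in [0, 1])."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_grid(top_left: Vector, top_right: Vector,
                       bottom_left: Vector, bottom_right: Vector,
                       size: int) -> List[List[Vector]]:
    """Return a size x size grid of vectors blended from the four corners."""
    if size < 2:
        raise ValueError("size must be at least 2")
    grid = []
    for row in range(size):
        v = row / (size - 1)                    # 0.0 at the top, 1.0 at the bottom
        left = lerp(top_left, bottom_left, v)   # blend down the left edge
        right = lerp(top_right, bottom_right, v)  # blend down the right edge
        grid.append([lerp(left, right, col / (size - 1)) for col in range(size)])
    return grid


# Hypothetical toy vectors standing in for four prompt embeddings.
grid = interpolation_grid([1.0, 0.0], [0.0, 1.0],
                          [0.0, 0.0], [1.0, 1.0], size=3)
for row in grid:
    print(row)
```

Each corner cell reproduces one of the four input vectors exactly, and the cells in between are weighted mixtures, which is why neighbouring images in such a grid change gradually.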

The GigaGAN framework also enables “disentangled prompt mixing” and “coarse-to-fine style swapping”; examples of both are shown on the project page (https://mingukkang.github.io/GigaGAN/).
