Stable Diffusion 2.0 Released — This Is Massive

Stability AI dropped the second version of its widely popular open-source image generator, Stable Diffusion. Compared to the first model, version 2.0 brings many big improvements and new features.
Summary
Stability AI has released Stable Diffusion 2.0, an enhanced version of their open-source image generation model, featuring significant improvements such as a new text encoder, upscaler, depth recognition, and text-guided inpainting capabilities.
Abstract
Stable Diffusion 2.0, the latest iteration of Stability AI's image generation tool, introduces substantial advancements. The update includes text-to-image models built on a new text encoder, OpenCLIP, developed by LAION; together with training on a vast dataset of image-text pairs, these models produce high-definition images. Additionally, the release adds an upscaler that makes it possible to generate images with resolutions of 2048x2048 or higher, a depth-guided model for image-to-image transformations that relies on MiDaS depth estimation, and a text-guided inpainting model for modifying specific parts of an image through natural language descriptions. The project remains open source, with the code available on GitHub, and users can try the technology through a demo application on HuggingFace or in DreamStudio. The release underscores the collaborative potential of open-source development, giving a broader community access to cutting-edge AI tools.

Let’s dive in and take a look at each one of them.
The new diffusion model is trained from scratch with 5.85 billion CLIP-filtered image-text pairs.
The result is a stunning high-definition image like this.

Stable Diffusion 2.0-v, the variant that generates images at a default resolution of 768x768, is a so-called v-prediction model. The training data is further filtered to remove adult content using LAION’s NSFW filter.
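For readers unfamiliar with the term: instead of predicting the added noise ε, a v-prediction model is trained to predict the "velocity" v = α_t·ε − σ_t·x₀, where x_t = α_t·x₀ + σ_t·ε is the noised image. The snippet below is a minimal, hypothetical sketch of that parameterization, not Stability AI's training code. It assumes a variance-preserving schedule (α_t² + σ_t² = 1, with ᾱ_t = α_t²) and uses plain Python lists in place of image tensors. It shows that both the clean image and the noise can be recovered exactly from a v prediction.

```python
import math
import random


def _coeffs(alpha_bar):
    # Variance-preserving schedule (assumption): alpha_t^2 + sigma_t^2 = 1.
    return math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)


def add_noise(x0, eps, alpha_bar):
    """Forward process: x_t = alpha_t * x0 + sigma_t * eps (elementwise)."""
    a, s = _coeffs(alpha_bar)
    return [a * x + s * e for x, e in zip(x0, eps)]


def v_target(x0, eps, alpha_bar):
    """Training target of a v-prediction model: v = alpha_t * eps - sigma_t * x0."""
    a, s = _coeffs(alpha_bar)
    return [a * e - s * x for x, e in zip(x0, eps)]


def x0_from_v(x_t, v, alpha_bar):
    """Recover the clean sample: x0 = alpha_t * x_t - sigma_t * v."""
    a, s = _coeffs(alpha_bar)
    return [a * xt - s * vi for xt, vi in zip(x_t, v)]


def eps_from_v(x_t, v, alpha_bar):
    """Recover the noise: eps = sigma_t * x_t + alpha_t * v."""
    a, s = _coeffs(alpha_bar)
    return [s * xt + a * vi for xt, vi in zip(x_t, v)]


if __name__ == "__main__":
    random.seed(0)
    x0 = [random.uniform(-1, 1) for _ in range(8)]  # toy "image"
    eps = [random.gauss(0, 1) for _ in range(8)]     # Gaussian noise
    alpha_bar = 0.3                                  # arbitrary timestep

    x_t = add_noise(x0, eps, alpha_bar)
    v = v_target(x0, eps, alpha_bar)  # what a perfect model would predict
    print(all(math.isclose(p, q, abs_tol=1e-9)
              for p, q in zip(x0_from_v(x_t, v, alpha_bar), x0)))
    print(all(math.isclose(p, q, abs_tol=1e-9)
              for p, q in zip(eps_from_v(x_t, v, alpha_bar), eps)))
```

The two recovery formulas follow directly from α_t² + σ_t² = 1, which is why a sampler can convert a v prediction back into either quantity at every step.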
Combined with the new super-resolution upscaler, which enlarges images by a factor of 4, Stable Diffusion 2.0 can now generate results with resolutions of 2048x2048 or more.
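The 2048x2048 figure is simple arithmetic: the new super-resolution model is a 4x upscaler, so a 512x512 image becomes 2048x2048, and a 768x768 image from the 2.0-v model would become 3072x3072. Here is a trivial sketch; the function name is hypothetical, and in practice the achievable size is also limited by GPU memory.

```python
def upscaled_size(width, height, factor=4):
    """Output resolution of an upscaler that enlarges each side by `factor`."""
    return width * factor, height * factor


for base_w, base_h in [(512, 512), (768, 768)]:
    w, h = upscaled_size(base_w, base_h)
    # The pixel count grows with factor**2, i.e. 16x for a 4x upscaler.
    print(f"{base_w}x{base_h} -> {w}x{h} ({(w * h) // (base_w * base_h)}x the pixels)")
```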

You can download the upscaler from here and run it on the Gradio or Streamlit demos.
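The next feature is the one I am most curious about. SD 2.0 adds depth guidance to the image-to-image feature: it estimates the depth of the input image using MiDaS (from the paper "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer") and then generates new images conditioned on both the text prompt and that depth information.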
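Conceptually, the depth-guided model uses the MiDaS depth map as an extra input. The map is resized to the latent resolution, rescaled to the range [-1, 1], and concatenated to the 4-channel latent as a fifth channel before it enters the UNet. The sketch below covers only that preprocessing step, in plain Python, with hypothetical helper names. It assumes the depth map has already been resized to the latent grid and does not run MiDaS or the diffusion model itself.

```python
def normalize_depth(depth):
    """Rescale a 2-D depth map (list of rows) linearly to the range [-1, 1]."""
    values = [d for row in depth for d in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against division by zero on a flat map
    return [[2.0 * (d - lo) / span - 1.0 for d in row] for row in depth]


def add_depth_channel(latent, depth):
    """Append a normalized depth map to a latent given as a list of HxW channels."""
    rows, cols = len(depth), len(depth[0])
    for channel in latent:
        if len(channel) != rows or any(len(r) != cols for r in channel):
            raise ValueError("depth map must match the latent resolution")
    return latent + [normalize_depth(depth)]


if __name__ == "__main__":
    latent = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(4)]  # toy 4-channel, 2x2 latent
    depth = [[1.0, 3.0], [5.0, 3.0]]                        # toy depth estimate
    unet_input = add_depth_channel(latent, depth)
    print(len(unet_input))  # 5 channels: 4 latent + 1 depth
    print(unet_input[-1])   # [[-1.0, 0.0], [1.0, 0.0]]
```

Normalizing per image means only the relative depth layout matters, which suits MiDaS, since it predicts relative rather than metric depth.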
Take a look at this example:


This is absolutely incredible.
SD 2.0 now supports text-guided inpainting: you mask the part of the image you want to modify and simply describe in natural language what should appear there.
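Under the hood, the inpainting model needs more than a prompt. It also takes a mask marking the region to repaint, and its UNet receives 9 input channels: the 4-channel noisy latent, a 1-channel mask downsampled to latent resolution, and the 4-channel latent of the masked image. The following is a minimal sketch of that input assembly in plain Python. The helper names are hypothetical, and the VAE encoding of the masked image is assumed to have already happened.

```python
def binarize_mask(mask, threshold=0.5):
    """1.0 marks pixels to repaint, 0.0 marks pixels to keep."""
    return [[1.0 if m >= threshold else 0.0 for m in row] for row in mask]


def mask_image(image, mask):
    """Blank out the region to repaint so only the kept context remains."""
    binary = binarize_mask(mask)
    return [[px * (1.0 - m) for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, binary)]


def inpainting_unet_input(noisy_latent, latent_mask, masked_image_latent):
    """Stack channels: 4 noisy-latent + 1 mask + 4 masked-image-latent = 9."""
    if len(noisy_latent) != 4 or len(masked_image_latent) != 4:
        raise ValueError("expected 4-channel latents")
    return noisy_latent + [latent_mask] + masked_image_latent


if __name__ == "__main__":
    image = [[0.2, 0.4], [0.6, 0.8]]  # toy single-channel image
    mask = [[0.0, 0.9], [0.1, 0.7]]   # right column will be repainted
    print(mask_image(image, mask))    # [[0.2, 0.0], [0.6, 0.0]]

    # Toy shortcut: image and latent share the same 2x2 grid here.
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    x = inpainting_unet_input([zeros] * 4, binarize_mask(mask), [zeros] * 4)
    print(len(x))                     # 9
```

The prompt then steers what the model paints inside the masked region, while the unmasked context keeps the result consistent with the rest of the image.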

The project is still open source. You can download or fork the project from GitHub.
The demo application is available as a HuggingFace Space => https://huggingface.co/spaces/stabilityai/stable-diffusion
Unfortunately, the app is overloaded with users right now, so I cannot provide sample images. I’ll update this article once the web app becomes accessible.
The new version will also be available in DreamStudio in the coming days.
If you’re interested in accessing the service via API, you can check out the documentation here.
Overall, I am in awe of the people behind this technology. Many thought Stability AI would go closed-source, but here we are. Let me end with this quote from Stability AI.
This is the power of open source: tapping the vast potential of millions of talented people who might not have the resources to train a state-of-the-art model, but who have the ability to do something incredible with one.
Read the full announcement here => https://stability.ai/blog/stable-diffusion-v2-release