Summary

Stability AI has released Stable Diffusion 2.0, an enhanced version of their open-source image generation model, featuring significant improvements such as a new text encoder, upscaler, depth recognition, and text-guided inpainting capabilities.

Abstract

Stable Diffusion 2.0, the latest iteration of Stability AI's image generation tool, has been introduced with substantial advancements. The update includes a new text encoder developed by LAION, which has been trained with a vast dataset of image-text pairs to produce high-definition images. Additionally, the model now incorporates an upscaler that can generate images with resolutions up to 2048x2048, a depth-guided feature for image-to-image transformations using MiDaS technology, and a text-guided inpainting model for modifying specific parts of an image through natural language descriptions. The project remains open-source, with the code available on GitHub, and users can interact with the technology through a demo application on HuggingFace or DreamStudio. The release underscores the collaborative potential of open-source development, providing access to cutting-edge AI tools for a broader community.

Opinions

The author expresses excitement about the depth recognition capabilities of SD 2.0, considering it the most intriguing feature.
There is a sense of awe and admiration for the team behind Stable Diffusion 2.0, emphasizing their commitment to open-source principles.
The author notes the high demand for the demo application, indicating widespread interest and engagement with the new version.
The release is seen as a significant milestone, countering expectations that the technology might become closed-source.
The quote from Stability AI reflects a belief in the power of open-source collaboration to enable millions to contribute and innovate with state-of

Stable Diffusion 2.0 Released — This Is Massive

Stability AI has released the second version of its widely popular open-source image generator, Stable Diffusion. Compared to the first model, version 2.0 brings several major improvements and new features.

What’s New?

Brand new text encoder (OpenCLIP), developed by LAION
Upscaler Diffusion model that enhances the resolution of images by a factor of 4
Brand new depth-guided Stable Diffusion model
Brand new text-guided inpainting model

Let’s dive in and take a look at each one of them.

New Text Encoder

The new diffusion models use OpenCLIP as the text encoder and are trained from scratch on a subset of LAION-5B, a dataset of 5.85 billion CLIP-filtered image-text pairs.

The result is a stunning high-definition image like this.

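Stable Diffusion 2.0-v, the 768x768 variant, is a so-called v-prediction model: instead of predicting the noise that was added to an image, the network predicts a "velocity" that mixes the noise with the clean image. The training data is further filtered to remove adult content using LAION’s NSFW filter.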
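
If "v-prediction" sounds abstract, here is a minimal sketch in plain Python. It assumes the definition commonly attributed to Salimans and Ho's progressive distillation paper, where the training target is v = alpha_t * eps - sigma_t * x0. The cosine schedule and the function names (alpha_sigma, noisy_sample, v_target, x0_from_v) are illustrative assumptions and are not taken from the Stable Diffusion code.

```python
import math
from typing import List, Tuple


def alpha_sigma(t: float) -> Tuple[float, float]:
    """Illustrative variance-preserving schedule with alpha^2 + sigma^2 = 1.

    Assumption: a cosine schedule on t in [0, 1], chosen for simplicity.
    It is not the schedule Stable Diffusion actually uses.
    """
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def noisy_sample(x0: List[float], eps: List[float], t: float) -> List[float]:
    """Forward diffusion: z_t = alpha_t * x0 + sigma_t * eps."""
    a, s = alpha_sigma(t)
    return [a * x + s * e for x, e in zip(x0, eps)]


def v_target(x0: List[float], eps: List[float], t: float) -> List[float]:
    """The v-prediction training target: v = alpha_t * eps - sigma_t * x0."""
    a, s = alpha_sigma(t)
    return [a * e - s * x for x, e in zip(x0, eps)]


def x0_from_v(z: List[float], v: List[float], t: float) -> List[float]:
    """Recover the clean sample from z_t and v: x0 = alpha_t * z_t - sigma_t * v."""
    a, s = alpha_sigma(t)
    return [a * zi - s * vi for zi, vi in zip(z, v)]
```

The point of the sketch is that v carries the same information as the noise: given the noisy sample and a predicted v, the clean image can be recovered with x0_from_v, and at t = 0 the target reduces to the noise itself.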

New Upscaler

Combined with the text-to-image models, the 4x upscaler lets Stable Diffusion 2.0 generate results with resolutions of 2048x2048 or more.

You can download the upscaler from here and run it on the Gradio or Streamlit demos.
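
As a rough sketch of where 2048x2048 comes from, the snippet below assumes the Hugging Face diffusers library and the stabilityai/stable-diffusion-x4-upscaler checkpoint, neither of which is covered in this article. The helper names upscaled_size and upscale are hypothetical, and the heavy imports sit inside the function so the size arithmetic can be tried without a GPU.

```python
def upscaled_size(width: int, height: int, factor: int = 4):
    """Output size of an upscaler that multiplies each side by `factor`."""
    return width * factor, height * factor


def upscale(low_res_image, prompt: str):
    """Upscale a PIL image 4x, guided by a text prompt.

    Assumptions: `diffusers` and `torch` are installed and a CUDA GPU is
    available. Large inputs need a lot of GPU memory.
    """
    # Imported lazily so upscaled_size() works without these packages.
    import torch
    from diffusers import StableDiffusionUpscalePipeline

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler",
        torch_dtype=torch.float16,
    ).to("cuda")
    return pipe(prompt=prompt, image=low_res_image).images[0]


# A 512x512 generation upscaled by a factor of 4 on each side.
print(upscaled_size(512, 512))
```

So a default 512x512 image becomes 2048x2048, and a 768x768 image would become 3072x3072, memory permitting.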

Depth Recognition

This feature is what I am most curious about.

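SD 2.0 adds a depth-guided image-to-image model. It estimates the depth of the input image using MiDaS (from the paper "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer"), then generates a new image conditioned on both the text prompt and that depth information, so the output keeps the structure of the original.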

Take a look at this example:

This is absolutely incredible.
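
If you want to try depth-guided image-to-image locally, one possible route is the diffusers library. The sketch below assumes diffusers, torch, a CUDA GPU, and the stabilityai/stable-diffusion-2-depth checkpoint; check_strength and depth_guided_edit are hypothetical helper names, and the default strength of 0.7 is only an example value.

```python
def check_strength(strength: float) -> float:
    """Validate the image-to-image strength.

    Values near 0 stay close to the input image; values near 1 allow
    larger changes.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    return float(strength)


def depth_guided_edit(init_image, prompt: str, strength: float = 0.7):
    """Re-render a PIL image from a prompt while preserving its depth layout.

    Assumption: the pipeline estimates the depth map itself, so only the
    image and the prompt are passed in.
    """
    strength = check_strength(strength)

    # Imported lazily so check_strength() works without these packages.
    import torch
    from diffusers import StableDiffusionDepth2ImgPipeline

    pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-depth",
        torch_dtype=torch.float16,
    ).to("cuda")
    return pipe(prompt=prompt, image=init_image, strength=strength).images[0]
```

A call such as depth_guided_edit(photo, "a marble statue in a museum") would keep the composition of photo while changing its look.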

Text-Guided Inpainting Model

SD 2.0 now supports text-guided inpainting. This means you can mask the part of an image you want to modify and simply describe in natural language what should appear there.
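
Here is a minimal sketch of what that could look like with the diffusers library. It assumes diffusers, torch, Pillow, a CUDA GPU, and the stabilityai/stable-diffusion-2-inpainting checkpoint; box_mask_bytes and inpaint are hypothetical helper names. The mask follows the usual convention that white pixels are repainted and black pixels are kept.

```python
def box_mask_bytes(width: int, height: int, box) -> bytes:
    """Row-major 8-bit mask: 255 inside `box`, 0 elsewhere.

    `box` is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    return bytes(
        255 if (left <= x < right and top <= y < bottom) else 0
        for y in range(height)
        for x in range(width)
    )


def inpaint(image, box, prompt: str):
    """Repaint the region `box` of a PIL image according to `prompt`.

    Assumption: the image is already at a size the model handles well,
    for example 512x512.
    """
    # Imported lazily so box_mask_bytes() works without these packages.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    width, height = image.size
    mask = Image.frombytes("L", (width, height), box_mask_bytes(width, height, box))

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")
    return pipe(prompt=prompt, image=image, mask_image=mask).images[0]
```

For instance, inpaint(photo, (128, 128, 384, 384), "a red vintage car") would replace the central square of a 512x512 photo and leave the rest untouched.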

The project is still open source. You can download or fork the project from GitHub.

Try It Yourself

The demo application is accessible via the HuggingFace app => https://huggingface.co/spaces/stabilityai/stable-diffusion

Unfortunately, too many people are using the app right now, so I cannot provide sample images. I’ll update this article once the web app becomes accessible.

The new version will also be available in DreamStudio in the coming days.

If you’re interested in accessing the service via API, you can check out the documentation here.

Overall, I am in awe of the people behind this technology. Many thought Stable Diffusion was going closed-source, but here we are. Let me end with this quote from Stability AI.

This is the power of open source: tapping the vast potential of millions of talented people who might not have the resources to train a state-of-the-art model, but who have the ability to do something incredible with one.

Read the full announcement here => https://stability.ai/blog/stable-diffusion-v2-release
