Summary

The website content provides a concise guide on using Hugging Face's platform to quickly and freely generate complex images with Stable Diffusion, illustrating both text-to-image and image-to-image capabilities.

Abstract

The article "Learning Stable Diffusion With Hugging Face in 5 Minutes" offers a brief tutorial on leveraging Hugging Face's user-friendly AI platform to create sophisticated images through Stable Diffusion. It highlights the ease of use, speed, and cost-free nature of the service, which doesn't require account creation or usage fees. The open-source Stable Diffusion model, supported by various organizations, is adept at generating images from text prompts and can also perform tasks like inpainting and outpainting. The article demonstrates the use of positive and negative prompts to refine image generation and explains the 'Guidance Scale' parameter that influences the AI's adherence to the prompt. Additionally, it covers the image-to-image functionality of Stable Diffusion, showcasing how existing images can be transformed based on textual descriptions while maintaining color and shape fidelity. Advanced settings such as the number of images, steps, strength, and seed are also discussed, providing insights into how these parameters affect the generated images. The conclusion emphasizes the simplicity and accessibility of Stable Diffusion through Hugging Face, positioning it as a free alternative to similar paid services like DALL·E and Midjourney.

Opinions

The author views Hugging Face as an outstanding AI community due to its comprehensive offerings, efficiency, and ease of use.
The article suggests that Stable Diffusion's ability to generate detailed images from text descriptions is a significant feature, enhancing the model's versatility.
The negative prompt feature is presented as a valuable tool for fine-tuning the AI's output, allowing users to exclude unwanted elements from the generated images.
The 'Guidance Scale' parameter is considered crucial for controlling the balance between strict adherence to the prompt and creative freedom in the AI-generated images.
The image-to-image capabilities of Stable Diffusion are highly regarded for their ability to maintain the integrity of the original image while incorporating new elements based on textual input.
The author appreciates the depth of customization available through advanced settings, which provide users with greater control over the image generation process.
The article concludes with a positive opinion of Stable Diffusion on Hugging Face, highlighting it as a superior choice for users seeking a simple, fast, and free solution for creating advanced AI-generated images.

Learning Stable Diffusion With Hugging Face in 5 Minutes

A simple way to build fancy images, fast and free.

Image by author, at AI Hardware Summit and Edge AI Summit 2022

Introduction

Hugging Face is an open-source platform and provider of machine learning technologies. Hugging Face was launched in 2016 and is headquartered in New York City. We visited its booth at AI Hardware Summit and Edge AI Summit 2022. It is an amazing AI community that builds, trains, and deploys state-of-the-art models based on open-source machine learning. It turns complicated machine learning models into simple applications.

Why is Hugging Face outstanding?

It is a one-stop shop for many AI products.
Apps run on Hugging Face's hosted hardware, which is often faster than using local computing resources.
There is no need to set up a working environment.
There is no need to create an account.
There is no need to purchase execution tokens (usage fees).

Let’s use Stable Diffusion as an example to build fancy images, fast and free. The Stable Diffusion model was released by a collaboration of Stability AI, CompVis LMU, and Runway with support from EleutherAI and Large-scale Artificial Intelligence Open Network (LAION).

Stable Diffusion is open source. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.

Stable Diffusion (text-to-image)

Stable Diffusion is a deep learning, text-to-image model. We use the prompt "Chinese new year 2023 using ink painting", and the following images are generated:

In the app, there is an optional field for a negative prompt, which tells the Stable Diffusion model what we do not want to see in the generated images. This feature can be used to steer unwanted elements out of the final images.

We add the negative prompt "flower", and the generated images no longer contain flowers:

In the advanced settings, there is a parameter, Guidance Scale, which controls how closely Stable Diffusion follows the prompt when generating images. A higher value makes the AI follow the prompt more strictly, while a lower value gives the AI more creative freedom.

The default value of Guidance Scale is 9. Extremely high values, such as 16–20, may produce "fried" (oversaturated) images and other artifacts. On the other hand, extremely low values, such as 0–4, may result in images that barely follow the prompt.
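Under the hood, Stable Diffusion pipelines such as those in Hugging Face's `diffusers` library apply the Guidance Scale through classifier-free guidance. At each step, the model makes two noise predictions. One is conditioned on the prompt. The other is conditioned on the negative prompt, or on an empty prompt when none is given. The final prediction pushes away from the second and toward the first.

The minimal sketch below works on plain lists instead of real tensors. `guided_prediction` is a hypothetical helper name, not a library function:

```python
def guided_prediction(cond, uncond, guidance_scale):
    """Combine two noise predictions with classifier-free guidance.

    cond: prediction conditioned on the prompt.
    uncond: prediction for the negative prompt (or an empty prompt if none is given).
    guidance_scale: how far to push from `uncond` toward `cond`.
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    # uncond + scale * (cond - uncond): scale 0 returns `uncond`,
    # scale 1 returns `cond`, larger scales extrapolate past `cond`.
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


cond = [0.5, -0.2, 0.1]
uncond = [0.1, 0.0, 0.1]
print(guided_prediction(cond, uncond, 1.0))
print(guided_prediction(cond, uncond, 9.0))
```

The formula gives one intuition for the behavior described above. At large scales, the difference between the two predictions is amplified many times, which can over-drive the image. Near 0, the prompt-conditioned prediction barely contributes.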

Here is an example with Guidance Scale set to 20.9:

Stable Diffusion (image-to-image)

Stable Diffusion 2 Depth2Img is a deep learning, image-to-image model. It generates images from both the input image and the prompt, so the final images resemble the input image in color and shape.
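One reason the output keeps the input's colors and shapes is that image-to-image generation starts from a noised copy of the input rather than from pure noise. Below is a minimal sketch of that mixing step, assuming the standard diffusion forward-noising formula.

A few caveats apply. `noise_input` and `alpha_bar` are hypothetical names. Real pipelines apply this step to latent tensors. The sketch also ignores the depth conditioning that Depth2Img adds.

```python
import math
import random


def noise_input(pixels, alpha_bar, rng):
    """Blend an input signal with Gaussian noise.

    alpha_bar in [0, 1] is the fraction of signal variance kept:
    1.0 returns the input unchanged, 0.0 returns pure noise.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must be in [0, 1]")
    keep = math.sqrt(alpha_bar)        # weight on the original values
    add = math.sqrt(1.0 - alpha_bar)   # weight on the fresh noise
    return [keep * x + add * rng.gauss(0.0, 1.0) for x in pixels]


rng = random.Random(0)
image = [0.8, 0.2, -0.5]
print(noise_input(image, 0.9, rng))
```

Because part of the original signal survives in the starting point, the denoising process tends to rebuild an image with a similar layout and palette.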

We set the Images option in the advanced settings to 4. Using an existing image and the prompt "A plushie lies on beach", we create 4 images:

After adding the negative prompt "sky", one of the generated images no longer shows the sky.

After adding two negative prompts, "sky, cloud", two of the generated images no longer show the sky.

The app also accepts an optional depth image: a simple grayscale image, the same size as the original image, that encodes depth information. Pure white means the object is closest to the viewer, and darker shades mean farther away.
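The depth-image convention described above (white is nearest, darker is farther) can be sketched as a conversion from per-pixel distances to 8-bit gray levels. This assumes we already have a distance map for the scene. `depth_to_grayscale` is a hypothetical helper for illustration, not part of the app:

```python
def depth_to_grayscale(distances):
    """Map a 2D grid of distances to 8-bit gray levels.

    The nearest point becomes 255 (white) and the farthest becomes 0 (black),
    matching the convention that brighter means closer to the viewer.
    """
    flat = [d for row in distances for d in row]
    near, far = min(flat), max(flat)
    span = far - near
    if span == 0:
        # A flat scene has no depth variation; treat every pixel as nearest.
        return [[255 for _ in row] for row in distances]
    return [
        [round(255 * (far - d) / span) for d in row]
        for row in distances
    ]


depth = [[1.0, 2.0], [3.0, 5.0]]
for row in depth_to_grayscale(depth):
    print(row)
```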

Here is an example with the depth image provided by the app:

There are a number of advanced settings:

Images: The number of images to generate. The default value is 1, and the maximum is 4.
Steps: The number of noise-removal iterations that Stable Diffusion performs. More steps give better results, but only up to a point. In most cases, images converge by around 30 steps and do not change significantly with more. The default value is 50.
Guidance Scale: How closely Stable Diffusion follows the prompt when generating images. A higher value makes the AI follow the prompt more strictly, while a lower value gives the AI more creative freedom. The default value is 9.
Strength: The amount of noise added to the input image, as a value between 0.0 and 1.0. Values approaching 1.0 allow for lots of variation but also produce images that are not semantically consistent with the input. The default value is 0.9.
Seed: The value used to create the initial noise from which the image is generated. Different seeds produce different images, while the same seed, with the same prompt, input image, and settings, produces the same image no matter how many times you run the generation.
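The advanced settings and the defaults listed above can be summarized as a small settings object. This is a hypothetical sketch for illustration only. The class name, field names, and validation rules are ours, drawn solely from the defaults and ranges given above, and do not reflect the app's actual code. The article states no default seed, so the sketch leaves it unset, meaning a random seed on each run:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Depth2ImgSettings:
    """Hypothetical container for the app's advanced settings."""

    images: int = 1               # number of images, 1 to 4
    steps: int = 50               # noise-removal iterations
    guidance_scale: float = 9.0   # prompt adherence
    strength: float = 0.9         # noise added to the input, 0.0 to 1.0
    seed: Optional[int] = None    # None means a random seed on each run

    def __post_init__(self):
        if not 1 <= self.images <= 4:
            raise ValueError("images must be between 1 and 4")
        if self.steps < 1:
            raise ValueError("steps must be positive")
        if not 0.0 <= self.strength <= 1.0:
            raise ValueError("strength must be between 0.0 and 1.0")


default = Depth2ImgSettings()
custom = Depth2ImgSettings(images=4, seed=1, strength=0.5)
print(default)
print(custom)
```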

Without a fixed seed, the app generates different plushies from our original image on each run. With the seed set to 1, the generated images stay the same on every run.

After changing the strength to 0.5, the generated images resemble the original image more closely.
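Seed reproducibility comes from seeding the random number generator that draws the initial noise. Below is a minimal sketch using Python's standard `random` module. `initial_noise` is a hypothetical helper, and the real pipeline draws a latent tensor rather than a short list:

```python
import random


def initial_noise(seed, size):
    """Draw `size` standard-normal values from a generator seeded with `seed`."""
    rng = random.Random(seed)  # a fresh generator, so earlier runs do not affect it
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


first_run = initial_noise(seed=1, size=4)
second_run = initial_noise(seed=1, size=4)
other_seed = initial_noise(seed=2, size=4)

print(first_run == second_run)  # same seed: identical starting noise
print(first_run == other_seed)  # different seed: different starting noise
```

Identical starting noise yields identical images only when everything else (prompt, input image, and settings) is also unchanged.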
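Lower strength also helps preserve the original for a second reason. To our understanding, the image-to-image pipelines in Hugging Face's `diffusers` library also use strength to decide how many of the scheduled denoising steps actually run. They skip the early, noisiest steps.

The sketch below mirrors that arithmetic as we understand it. `effective_steps` is our own helper name, not a library function:

```python
def effective_steps(num_inference_steps, strength):
    """Return (steps skipped, steps actually run) for image-to-image.

    Higher strength adds more noise to the input, so more of the
    schedule is needed to remove it; lower strength starts later
    in the schedule and keeps more of the original image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    run = min(int(num_inference_steps * strength), num_inference_steps)
    skipped = num_inference_steps - run
    return skipped, run


print(effective_steps(50, 0.9))  # → (5, 45) with the app's default settings
print(effective_steps(50, 0.5))  # → (25, 25) with the strength used above
```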

Conclusion

We have explored Stable Diffusion using Hugging Face. It is a simple way to build fancy images, fast and free. Stable Diffusion is open source and supports both text-to-image and image-to-image generation.

While DALL·E and Midjourney have similar capabilities, they are not free. Stable Diffusion on Hugging Face can be run immediately, without logging in.

Thanks for reading.
