Jim Clyde Monge

Summary

Stable Diffusion is a new, free text-to-image AI tool that competes with OpenAI's Dall-E2, offering similar capabilities but with fewer restrictions on generating images of public figures and without a cost.

Abstract

Stable Diffusion emerges as a significant competitor in the text-to-image AI generator space, providing a service similar to OpenAI's Dall-E2. Unlike Dall-E2, it does not restrict the creation of images depicting well-known individuals and is available at no cost. This model, developed by Stability AI and trained on a large cluster of Nvidia A100 GPUs, uses a diffusion process to transform noise into detailed images guided by text prompts. Early access to Stable Diffusion can be obtained through Stability AI's official website and Discord server. The tool has been praised for its rapid image generation, attention to detail, and ability to create realistic portraits and celebrity faces, potentially revolutionizing content creation and meme culture. However, concerns have been raised about the lack of strict policies against deep fakes and the generation of offensive content.

Opinions

  • The author is impressed by Stable Diffusion's ability to quickly generate detailed and symmetrical images, including portraits and celebrity faces.
  • Stable Diffusion is seen as more permissive than its competitors, which could be beneficial for creativity but also raises concerns about the potential for misuse in creating deep fakes.
  • There is an appreciation for the tool's potential to transform the meme landscape and content creation in general.
  • The author has mixed feelings about Stable Diffusion, acknowledging its impressive capabilities while being cautious about the lack of safeguards against the creation of offensive or harmful images.
  • A hope is expressed that the developers will implement significant safety measures to mitigate potential harms before the service's wider release.

Stable Diffusion: New And FREE Text-To-Image AI Tool

Image by Jim Clyde Monge. Designed with Canva

In the world of text-to-image AI generators, OpenAI’s Dall-E2 is the obvious pick for the best tool currently available.

It does, however, have a significant self-imposed limitation: it will not produce images of well-known individuals, including politicians and celebrities. Using the service also costs money.

There is now a completely free competitor that works much like Dall-E2 but with far fewer filters: Stable Diffusion.

Let’s discuss a few topics.

  • What is Stable Diffusion?
  • How does it work?
  • How to get early access?
  • My first generated images

What Is Stable Diffusion?

On Stability AI’s website, Stable Diffusion is described as a text-to-image model that will enable billions of people to produce beautiful art in a matter of seconds.

Stable Diffusion sample images

The model conditions on text prompts through a frozen CLIP ViT-L/14 text encoder, much as Google’s Imagen conditions on a frozen (T5-XXL) language model. With an 860M-parameter UNet and a 123M-parameter text encoder, it is relatively lightweight and runs on a GPU with at least 10GB of VRAM.
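As a rough, back-of-envelope check on the "lightweight" claim, the weights of the two components named above take up only a few gigabytes. This sketch assumes the 860M and 123M figures are parameter counts and that weights are stored as 32-bit or 16-bit floats. It ignores the image decoder, intermediate activations, and framework overhead, which is why the practical minimum is higher. The helper `weight_memory_gib` is hypothetical.

```python
# Parameter counts quoted for Stable Diffusion's two main components.
UNET_PARAMS = 860_000_000
TEXT_ENCODER_PARAMS = 123_000_000


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


total_params = UNET_PARAMS + TEXT_ENCODER_PARAMS  # 983 million parameters
for dtype, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype}: {weight_memory_gib(total_params, nbytes):.2f} GiB")
# float32: 3.66 GiB
# float16: 1.83 GiB
```

Even at full 32-bit precision the weights alone fit comfortably within 10GB, leaving the rest for activations and decoding.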

How Does It Work?

Stable Diffusion treats image generation as a “diffusion” process at runtime. Starting from pure noise, it refines the image step by step until no noise is left, bringing it closer and closer to the provided text description.
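Here is a toy sketch of that loop, not Stable Diffusion's actual sampler. It uses a deterministic, DDIM-style update; the linear noise schedule `alpha_bar` and the helper `oracle_denoiser` are hypothetical, and the "image" is just a short list of numbers. In the real model, a trained UNet predicts the noise at each step, guided by the CLIP text embedding, and the loop runs in a compressed latent space whose result is decoded into pixels at the end.

```python
import math
import random


def alpha_bar(t: int, num_steps: int) -> float:
    # Share of the clean signal left at step t: 1.0 at t=0, 0.001 at t=num_steps.
    return 1.0 - 0.999 * t / num_steps


def oracle_denoiser(x_t: list[float], t: int, target: list[float]) -> list[float]:
    # HYPOTHETICAL stand-in for the trained, text-conditioned UNet. A real model
    # estimates the clean image from x_t and the prompt; this toy version cheats
    # and returns the target it was told the prompt describes.
    return list(target)


def sample(target: list[float], num_steps: int = 50, seed: int = 0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # step num_steps: pure noise
    history = [x]
    for t in range(num_steps, 0, -1):
        a_t = alpha_bar(t, num_steps)
        a_prev = alpha_bar(t - 1, num_steps)
        x0_hat = oracle_denoiser(x, t, target)
        # Noise the model believes is mixed into x at this step.
        eps_hat = [(xi - math.sqrt(a_t) * ci) / math.sqrt(1.0 - a_t)
                   for xi, ci in zip(x, x0_hat)]
        # Deterministic DDIM-style update: recombine the clean estimate with less noise.
        x = [math.sqrt(a_prev) * ci + math.sqrt(1.0 - a_prev) * ei
             for ci, ei in zip(x0_hat, eps_hat)]
        history.append(x)
    return x, history


target = [0.2, -0.5, 0.9, 0.0]  # hypothetical 4-"pixel" image the prompt describes
image, history = sample(target)
for step in (0, 25, 50):
    print(step, round(math.dist(history[step], target), 3))
```

Because the toy denoiser already knows the answer, the distance to the target shrinks steadily and reaches zero at the final step; a real model's estimate is imperfect and sharpens as the noise drops.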

Stable Diffusion sample images

Stable Diffusion was trained over the course of a month on a cluster of 4,000 Nvidia A100 GPUs running on Amazon Web Services. The CompVis machine vision and learning research group at Ludwig Maximilian University of Munich led the training, while Stability AI provided the computing resources.

Through its Discord server, Stability AI has made the Stable Diffusion model accessible to a select group of users.

Discord welcome channel

How To Get Early Access

Navigate to the beta sign-up section of the official Stability.ai website.

Screenshot from Stability.ai

Complete the sign-up process and wait for the confirmation email.

Image by Jim Clyde Monge

Check your spam folder too, since my confirmation email landed right there.

The email contains a link that takes you to the Discord dashboard. Read the terms of service there and follow the instructions to gain access to the Dream channels.

Once you get access to the Dream channels, input your descriptive text in the chat box. The prompt should be in this format:

Image by Jim Clyde Monge
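Going by the prompts shown later in this post, a command is !dream, then the prompt in quotes, then optional flags such as -i or -S 474323078. The hypothetical helper below splits such a message into its parts. It assumes a flag followed by a non-flag token takes that token as its value, and it does not try to interpret what any flag means. It also swaps typographic quotes for straight ones, since the prompts in this post are displayed with curly quotes.

```python
import shlex


def parse_dream_command(message: str) -> tuple[str, dict[str, object]]:
    # Normalize typographic quotes so shlex can group the quoted prompt.
    message = message.replace("\u201c", '"').replace("\u201d", '"')
    tokens = shlex.split(message)
    if not tokens or tokens[0] != "!dream":
        raise ValueError("message must start with !dream")
    if len(tokens) < 2:
        raise ValueError("missing prompt")
    prompt, rest = tokens[1], tokens[2:]
    options: dict[str, object] = {}
    i = 0
    while i < len(rest):
        flag = rest[i]
        if not flag.startswith("-"):
            raise ValueError(f"unexpected token: {flag!r}")
        # A flag followed by a non-flag token takes it as its value;
        # otherwise it is treated as a bare on/off switch.
        if i + 1 < len(rest) and not rest[i + 1].startswith("-"):
            options[flag] = rest[i + 1]
            i += 2
        else:
            options[flag] = True
            i += 1
    return prompt, options


prompt, options = parse_dream_command('!dream "a photo of a dog studying for an exam" -i -S 474323078')
print(prompt)   # a photo of a dog studying for an exam
print(options)  # {'-i': True, '-S': '474323078'}
```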

My First Generated Images

I was fortunate to be approved quickly, so I could try the AI tool for myself. Here are a few of the pictures I generated.

Prompt: !dream “Old viking woman with braids in gray hair wearing fur and jewelry :: very detailed, symmetric, unreal engine, rim-light” -i -S 474323078
Image by Jim Clyde Monge

I was quite taken aback by how quickly the bot produced this 512x512 image: it took just five seconds.
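One reason generation can be this quick: Stable Diffusion is a latent diffusion model, so the denoising loop runs on a compressed representation rather than on the 512x512 pixels themselves. The arithmetic below assumes the v1 autoencoder's settings (each side shrunk 8x, 4 latent channels); `latent_shape` is a hypothetical helper.

```python
def latent_shape(height: int, width: int, downsample: int = 8, channels: int = 4) -> tuple[int, int, int]:
    # Assumed Stable Diffusion v1 autoencoder settings: 8x smaller per side, 4 channels.
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsampling factor")
    return (channels, height // downsample, width // downsample)


pixel_values = 3 * 512 * 512  # RGB image: 786,432 numbers
c, h, w = latent_shape(512, 512)
latent_values = c * h * w  # 4 x 64 x 64 = 16,384 numbers
print((c, h, w), pixel_values // latent_values)  # (4, 64, 64) 48
```

Under these assumptions, each denoising step handles 48 times fewer values than the finished image contains.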

Compared with other text-to-image models such as Disco Diffusion or Midjourney, Stable Diffusion is also noticeably better at portraits. The details are spot-on, and the facial features are symmetrical.

Prompt: !dream “HQ photo face picture of Shia Labeouf sitting on a throne wearing a golden crown”
Image by Jim Clyde Monge

Stable Diffusion does a pretty good job with celebrity faces too. It can even mix the faces of famous people.

Prompt: !dream “jean-claude van damme as tyrion lannister”
Image by Jim Clyde Monge

I suspect the meme game is about to be revolutionized by this kind of AI.

How about animals?

Prompt: !dream “a photo of a dog studying for an exam”
Image by Jim Clyde Monge

No problem. Here are more examples.

Image by Jim Clyde Monge
Image by Jim Clyde Monge

Okay, that’s all I have for today.

Go ahead and sign up for access and try the tool yourself. The possibilities are limited only by your imagination.

Final Thoughts

I have mixed feelings about Stable Diffusion.

Although the tool produces some of the most impressive images I have seen from any text-to-image model, it appears to be more permissive than its competitors.

Stability AI doesn’t have a clear policy prohibiting pictures of famous people. That could make it easy for bad actors to create deepfakes.

The lack of strong safeguards also lets some users generate offensive or lewd images.

I hope the engineers behind this technology are already building safeguards to mitigate potential harms before the service is released to the public.
