Tom Tillo


HOW TO, TEXT TO IMAGE, CONTROLNETS, STABLE DIFFUSION

How to create controlled poses and styles using Stable Diffusion and ControlNets

Using ControlNets with Stable Diffusion to get more control over the generated output images

What this article is about!

Good news!! (for all AUTOMATIC1111 Stable Diffusion UI users)

There is now a ControlNet plugin/extension compatible with AUTOMATIC1111. Here, we will walk you through what ControlNets are and what they can be used for, and give a detailed initial guide to getting your Stable Diffusion (SD) setup working with ControlNets.

A short note on ControlNets

If you have worked with the img2img option in Stable Diffusion (SD), you know how easily you can transfer a style/pose from a base image to your generated image. ControlNet goes a step further and creates almost exact replicas of your poses/styles/positions.

To put it in one line: ControlNets let you decide the posture, shape and style of your generated image when you are using text-to-image diffusion models. Enough of the basic introduction, more later…

What can you do with ControlNet anyway?

The possibilities are endless, but here are a few sample use-cases; you can try your own!

1. Convert those Japanese anime images into other animation styles…

Anime to other styles (Image by author, generated using Stable Diffusion)

or even try to convert them into real-life images!

Anime to real life (Image by author, generated using Stable Diffusion)

or… make them extremely artistic!

2. Reimagine classic paintings…

Classic paintings reimagined with different ethnicities (generated using Stable Diffusion)

3. Visualize how ancient marble statues might have looked in real life — with different clothes, settings and times !

Statue imagined as a living person from a different era. Image generated using Stable Diffusion

Imagine what their clothes would have looked like in that bygone era…

Statue imagined in the dress worn in a different time period. Image generated using Stable Diffusion

…or even dare to imagine what they might have looked like.

Face of a statue, as imagined by Stable Diffusion with ControlNet

4. Create animated GIFs from a sequence of images generated with ControlNet

A GIF of a butterfly created using ControlNet

or some disturbing psychedelic mushroom GIFs!

A GIF of mushrooms created using ControlNet

5. See how ancient structures would have looked in a different building material…

Imagined using Stable Diffusion with the help of ControlNets

6. Or create some other, rather less interesting images…

Installation: Let's get the stuff running

1. Install the extension through your AUTOMATIC1111 UI (if you are new to the AUTOMATIC1111 Web UI for Stable Diffusion, see our article on how to get it running).

For this,

[1] go to the Extensions tab,

[2] go to the "Install from URL" option,

[3] enter the URL of this extension's git repo (https://github.com/Mikubill/sd-webui-controlnet),

[4] click Install… and voilà! You are done with the setup!

Installing the ControlNet extension for AUTOMATIC1111 WebUI
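If the "Install from URL" button fails (for example, behind a proxy), you can do the same install by hand. The WebUI installs an extension by cloning its repository into its `extensions` folder. The sketch below only builds that `git clone` command. `clone_command` is a hypothetical helper, and the target folder `<webui_root>/extensions/sd-webui-controlnet` assumes a default AUTOMATIC1111 checkout.

```python
from pathlib import Path

# Repository URL from step [3] above.
EXTENSION_URL = "https://github.com/Mikubill/sd-webui-controlnet"


def clone_command(webui_root):
    """Build the `git clone` command that installs the extension manually.

    Assumes a default AUTOMATIC1111 layout, where extensions live in
    <webui_root>/extensions/<extension-name> (hypothetical helper).
    """
    target = Path(webui_root) / "extensions" / "sd-webui-controlnet"
    return ["git", "clone", EXTENSION_URL, str(target)]
```

Run it with `subprocess.run(clone_command("/path/to/stable-diffusion-webui"), check=True)`, then restart the WebUI so that it loads the new extension.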

2. Download the pre-trained models to your local machine and place them in <AUTOMATIC1111 folder>/models/ControlNet

You can get the pre-trained models here: https://huggingface.co/lllyasviel/ControlNet/tree/main/models

or a trimmed (smaller) version here:

https://huggingface.co/webui/ControlNet-modules-safetensors/tree/main
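Once the files are downloaded, you can list the folder to confirm they landed where the extension will look. This is a minimal sketch. It assumes the `<AUTOMATIC1111 folder>/models/ControlNet` location from step 2, and that model files end in `.pth` or `.safetensors` (the formats of the two downloads linked above). `list_controlnet_models` is a hypothetical helper, not part of the extension.

```python
from pathlib import Path

# Formats of the two model downloads linked above (assumption: no other formats needed).
MODEL_SUFFIXES = {".pth", ".safetensors"}


def list_controlnet_models(webui_root):
    """Return the names of ControlNet model files in <webui_root>/models/ControlNet."""
    model_dir = Path(webui_root) / "models" / "ControlNet"
    if not model_dir.is_dir():
        raise FileNotFoundError(f"No ControlNet model folder at {model_dir}")
    # Ignore sub-folders and stray files such as READMEs.
    return sorted(
        p.name for p in model_dir.iterdir()
        if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
    )
```

If the list comes back empty, the model dropdown in the ControlNet menu will have no matching model to offer.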

3. Open your AUTOMATIC1111 WebUI and go to either the txt2img or the img2img tab. Let's use the img2img option (you can do almost exactly the same in the txt2img tab).

Drop in the reference image you want to emulate.

If you scroll almost to the bottom of the page, you will see an extra menu for ControlNet that looks like this:

ControlNet Menu option in AUTOMATIC1111 WebUI

Expand the dropdown to find its options. To get started:

[1] Drop the control image into the image box. This is the image whose pose/shape/style you want the final output to take on (in this example we are using the same image).

[2] Enable the option (tick the checkbox).

[3] Select the pre-processor (here we are choosing canny).

[4] Select the model that corresponds to the pre-processor.

Now you are ready to go! Press ‘Generate’.
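The four UI steps also map onto the WebUI's HTTP API, which is handy for batch jobs such as the GIF frame sequences above. This sketch rests on several assumptions:

* The WebUI is started with the `--api` flag and listens on the default `http://127.0.0.1:7860`.
* The ControlNet unit is passed through `alwayson_scripts`.
* The unit field names (`enabled`, `input_image`, `module`, `model`) are based on the extension's API but have changed between versions (newer releases use `image`), so check them against the extension's own API documentation.
* `control_sd15_canny` is a placeholder for whatever name your model dropdown shows.

```python
import base64
import json
import urllib.request


def to_b64(image_bytes):
    """The WebUI API takes images as base64-encoded strings."""
    return base64.b64encode(image_bytes).decode("ascii")


def build_img2img_payload(init_image, control_image, prompt,
                          module="canny", model="control_sd15_canny",
                          width=512, height=512, denoising_strength=0.75):
    """Mirror the ControlNet UI steps as an img2img request body.

    Field names inside the ControlNet unit are assumptions; they vary by extension version.
    """
    return {
        "init_images": [to_b64(init_image)],  # the reference image dropped into img2img
        "prompt": prompt,
        "width": width,
        "height": height,
        "denoising_strength": denoising_strength,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "enabled": True,                       # [2] enable the unit
                    "input_image": to_b64(control_image),  # [1] control image ("image" in newer versions)
                    "module": module,                      # [3] pre-processor
                    "model": model,                        # [4] matching model, as named in the dropdown
                }]
            }
        },
    }


def generate(payload, url="http://127.0.0.1:7860/sdapi/v1/img2img"):
    """POST the payload to a WebUI started with --api and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The response is JSON whose `images` list holds base64-encoded images. Decode one with `base64.b64decode` and write the bytes to a file.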

Some internals: a quick look at how it works

ControlNets rely on some basic network models, inspired by computer vision algorithms such as edge detection and depth estimation, to generate an intermediate control image (control map). The diffusion model then uses these control images to generate the final output image.
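The ControlNet paper (reference 1 below) describes how the control map reaches the diffusion model:

* A pretrained network block is locked, and a trainable copy of that block receives the control map c.
* The two are joined by "zero convolutions", 1x1 convolutions whose weights start at zero, giving y_c = F(x; Θ) + Z(F(x + Z(c; Θz1); Θc); Θz2).

The NumPy toy below illustrates only that wiring. A ReLU linear layer stands in for a real U-Net block. A 1x1 convolution is a per-pixel linear map over channels, so `x` has shape (pixels, channels). All names are hypothetical.

```python
import numpy as np


def block(x, weights):
    """Stand-in for one pretrained network block F(x; Θ): a linear map followed by ReLU."""
    return np.maximum(x @ weights, 0.0)


class ZeroConv:
    """A 1x1 convolution as a per-pixel linear map over channels, initialised to zero."""

    def __init__(self, channels):
        self.weight = np.zeros((channels, channels))
        self.bias = np.zeros(channels)

    def __call__(self, x):
        return x @ self.weight + self.bias


def controlnet_block(x, c, locked_weights, copy_weights, zero_in, zero_out):
    """y_c = F(x; Θ) + Z(F(x + Z(c; Θz1); Θc); Θz2)"""
    locked = block(x, locked_weights)                # frozen Stable Diffusion block
    trainable = block(x + zero_in(c), copy_weights)  # trainable copy that sees the control map
    return locked + zero_out(trainable)
```

At initialization both zero convolutions output zeros, so the block returns exactly F(x; Θ). Training therefore starts from the unmodified model, and the control signal is blended in only as the zero-convolution weights move away from zero.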

Pre-processing algorithms

Intermediate pre-processing step that generates the input image for ControlNets (input image credit: Twitter account artistfuly; other images generated by Stable Diffusion)

Let's go through a few of the intermediate pre-processing algorithms:

1. Canny (simply OpenCV's implementation of the classic Canny edge detection algorithm)

Original input image credit: Twitter account artistfuly; other images generated by Stable Diffusion

The input image goes through a pre-processing step that converts it into the control image (called the control map). Here, the Canny edge detection algorithm creates an intermediate image containing just the boundaries of the objects in the input image. It does this after the usual steps such as noise reduction, gradient detection and thresholding.

This method works well if the input image has high contrast. In lower-contrast images, the edges are not detected reliably.
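Here is a NumPy-only sketch of the classic Canny pipeline that makes those steps concrete:

1. Gaussian noise reduction.
2. Sobel gradients.
3. Non-maximum suppression.
4. Double-threshold hysteresis.

It is written for readability, not speed. The ControlNet pre-processor itself calls OpenCV's `cv2.Canny(image, low, high)`. The 5x5 Gaussian with sigma 1.4 and the default thresholds of 100/200 are assumed values for illustration.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T  # responds to intensity changes from top to bottom


def correlate2d(img, kernel):
    """Same-size 2-D correlation with edge padding (NumPy only)."""
    ph, pw = kernel.shape[0] // 2, kernel.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def gaussian_kernel(size=5, sigma=1.4):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def non_max_suppression(mag, angle):
    """Keep only pixels that are local maxima along their gradient direction (edge thinning)."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:   # gradient points left/right
                n1, n2 = mag[y, x - 1], mag[y, x + 1]
            elif a < 67.5:               # gradient points down-right/up-left
                n1, n2 = mag[y - 1, x - 1], mag[y + 1, x + 1]
            elif a < 112.5:              # gradient points up/down
                n1, n2 = mag[y - 1, x], mag[y + 1, x]
            else:                        # gradient points down-left/up-right
                n1, n2 = mag[y - 1, x + 1], mag[y + 1, x - 1]
            if mag[y, x] >= n1 and mag[y, x] >= n2:
                out[y, x] = mag[y, x]
    return out


def hysteresis(mag, low, high):
    """Strong pixels are edges; weak pixels survive only if 8-connected to an edge."""
    edges = mag >= high
    weak = (mag >= low) & ~edges
    h, w = mag.shape
    while True:
        padded = np.pad(edges, 1)
        near_edge = np.zeros_like(edges)
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                near_edge |= padded[dy:dy + h, dx:dx + w]
        grown = edges | (weak & near_edge)
        if np.array_equal(grown, edges):
            return edges
        edges = grown


def canny(gray, low=100, high=200):
    """Return a uint8 control map: 255 on edges, 0 elsewhere."""
    smoothed = correlate2d(gray.astype(float), gaussian_kernel())  # 1. noise reduction
    gx = correlate2d(smoothed, SOBEL_X)                             # 2. gradient detection
    gy = correlate2d(smoothed, SOBEL_Y)
    mag = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180
    thin = non_max_suppression(mag, angle)                          # 3. edge thinning
    edges = hysteresis(thin, low, high)                             # 4. thresholding
    return np.where(edges, 255, 0).astype(np.uint8)
```

On a sharp 0-to-255 step, this marks a thin line at the step. If the same step only goes from 0 to 20, the gradient never reaches the low threshold and the control map comes back empty. That is the low-contrast failure described above.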

2. MLSD

MLSD is a good pre-processor when the input image has a lot of straight lines and sharp edges. Typical use-cases are generating images of houses, architecturally rich buildings, room interiors, isometric projections of buildings, objects, etc.

Intermediate control map generated using the MLSD pre-processing step, and the final image generated using Stable Diffusion

3. Depth

When your desired output has a lot of depth variation, depth is the pre-processor of choice. Some sample use-case settings are the inside of a restaurant, or a long-shot view of a landscape with entities far apart.

4. OpenPose

When you want to replicate the posture/pose of the individual subjects in an image, but still want the model to be creative with the texture and surface of the subjects, opt for openpose.

Positions of the detected keypoints (image generated using ControlNet). Image source: stock image, Shutterstock

This is loosely based on, or similar to, the PoseNet model (human pose detection). See the image below for what the keypoints denote.

Positions of the detected keypoints on an actual image of a human body (image source: TensorFlow blog)
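An openpose-style control map is essentially a stick figure: detected keypoints joined by limb segments on a black background. The sketch below rasterizes such a figure with NumPy to show the data structure involved. It assumes PoseNet's 17 COCO keypoint names (matching the TensorFlow figure above) and an illustrative subset of limbs. The actual openpose pre-processor uses its own keypoint layout and draws colour-coded limbs, so treat `draw_pose`, `KEYPOINTS` and `SKELETON` as hypothetical.

```python
import numpy as np

# PoseNet's 17 keypoints (COCO order), as in the TensorFlow figure above.
KEYPOINTS = [
    "nose", "leftEye", "rightEye", "leftEar", "rightEar",
    "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
    "leftWrist", "rightWrist", "leftHip", "rightHip",
    "leftKnee", "rightKnee", "leftAnkle", "rightAnkle",
]

# Illustrative subset of limbs, each a pair of keypoint names.
SKELETON = [
    ("leftShoulder", "rightShoulder"), ("leftHip", "rightHip"),
    ("leftShoulder", "leftElbow"), ("leftElbow", "leftWrist"),
    ("rightShoulder", "rightElbow"), ("rightElbow", "rightWrist"),
    ("leftShoulder", "leftHip"), ("rightShoulder", "rightHip"),
    ("leftHip", "leftKnee"), ("leftKnee", "leftAnkle"),
    ("rightHip", "rightKnee"), ("rightKnee", "rightAnkle"),
]


def draw_pose(keypoints, height, width):
    """Rasterise {name: (x, y)} pixel keypoints into a white-on-black stick figure."""
    unknown = set(keypoints) - set(KEYPOINTS)
    if unknown:
        raise ValueError(f"Unknown keypoints: {sorted(unknown)}")
    canvas = np.zeros((height, width), dtype=np.uint8)
    for a, b in SKELETON:
        if a not in keypoints or b not in keypoints:
            continue  # keypoint not detected: skip the limbs that need it
        (x0, y0), (x1, y1) = keypoints[a], keypoints[b]
        steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        xs = np.clip(np.linspace(x0, x1, steps).round().astype(int), 0, width - 1)
        ys = np.clip(np.linspace(y0, y1, steps).round().astype(int), 0, height - 1)
        canvas[ys, xs] = 255  # sample points along the limb segment
    for x, y in keypoints.values():
        canvas[min(max(int(y), 0), height - 1), min(max(int(x), 0), width - 1)] = 255
    return canvas
```

Missing keypoints, such as a wrist hidden behind the body, simply drop the limbs that depend on them.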

Tips when using ControlNets

1. Choose the pre-processing algorithm/model based on the type of image you want to generate and what your control image is. Use this quick guide table to decide which one to use:

Image created by Author

2. When using the img2img option, set the output image dimensions to the width and height of the original image you are uploading. If you cannot match the exact dimensions, at least keep the same width:height ratio.
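The aspect-ratio tip is simple arithmetic: scale both sides by the same factor. This minimal sketch makes two assumptions:

* You want the longer side near 512 px, the resolution SD 1.x was trained at.
* Both sides should be multiples of 8, since Stable Diffusion works in a latent space 8x smaller than the image.

`fit_dimensions` is a hypothetical helper.

```python
def fit_dimensions(width, height, long_side=512, multiple=8):
    """Scale (width, height) so the longer side is about `long_side`,
    keeping width:height and snapping both sides to a multiple of `multiple`."""
    scale = long_side / max(width, height)

    def snap(side):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(side * scale / multiple) * multiple)

    return snap(width), snap(height)
```

For a 1920x1080 photo this gives 512x288, the same 16:9 ratio. For ratios that do not divide evenly, snapping to multiples of 8 can shift the ratio by a few pixels.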

Credits / References

1. The original paper on ControlNets: "Adding Conditional Control to Text-to-Image Diffusion Models" (Lvmin Zhang, Maneesh Agrawala)

https://arxiv.org/abs/2302.05543

2. The original GitHub repo for ControlNets (https://github.com/lllyasviel/ControlNet)

3. The pre-trained models, .pth version (https://huggingface.co/lllyasviel/ControlNet/tree/main/models). Note: these are very bulky files, almost 5 GB each.

4. Smaller versions of the pre-trained models. Yes, these work as well as the bigger files (for higher sampling rates): https://huggingface.co/webui/ControlNet-modules-safetensors/tree/main

5. Online GIF-making tool (https://ezgif.com/)

Related Links

1. If you want an online app that lets you create your desired human poses manually, PoseMy.art is a good starting point. Here's an article on how to use it: "How to create the desired custom body pose using Mid Journey and PoseMyArt".

2. If you'd like to start from the basics and want a guide to writing text prompts for Midjourney (also useful for Stable Diffusion), here is the link: "An advanced guide to writing prompts for Midjourney (text-to-image)".
