Summary

Stable Diffusion embeds an invisible watermark in generated images using a combination of DWT and DCT algorithms to ensure the integrity of the images and prevent their misuse in training other AI models.

Abstract

Stable Diffusion, an AI image generation tool, incorporates an invisible watermark into its output images using the invisible-watermark Python library. This watermark is imperceptible to the human eye and is embedded through a process involving Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT) algorithms. The watermark, containing the text "StableDiffusionV1," is strategically placed in middle-frequency sub-bands to avoid degrading image quality or being removed by compression. The embedded watermark is resilient to various attacks but can be compromised by image editing techniques such as cropping, resizing, or rotating. The primary purpose of this watermark is not for tracking but to potentially filter out AI-generated images in the future and maintain the authenticity of the original AI-generated content.

Opinions

The use of DWT and DCT algorithms in combination is considered effective for watermarking as it compensates for the individual drawbacks of each method.
The watermark is designed to be robust against multiple types of attacks, demonstrating the importance of protecting the integrity of AI-generated images.
The article suggests that the watermarking technique is not intended for user tracking but rather for the future identification and management of AI-generated content.
The author implies that the chosen watermarking method is a balance between effectiveness and the practicality of on-the-fly embedding, acknowledging that there are more robust watermarking algorithms available.

Stable Diffusion — The Invisible Watermark in Generated Images

· Invisible Watermark
· Code
· How Does It Work?
  ∘ Where Is The Watermark Being Inserted?
· How to Check?
· Is It Removable?
· Any Tracking?
· References

While everyone is using Stable Diffusion to generate artwork, have you ever noticed that there is a watermark in the generated images?

Invisible Watermark

The official Stable Diffusion code uses a Python library called invisible-watermark [4] to embed an invisible watermark in the generated images.

By “invisible”, I mean really invisible: imperceptible to the human eye.

Code

Here are the code segments that Stable Diffusion [1] uses to embed watermarks.

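import cv2
import numpy as np
from PIL import Image
from imwatermark import WatermarkEncoder

def put_watermark(img, wm_encoder=None):
    if wm_encoder is not None:
        # Convert the RGB PIL image to a BGR NumPy array before encoding
        img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
        img = wm_encoder.encode(img, 'dwtDct')
        # Reverse the channel axis (BGR back to RGB) and return a PIL image
        img = Image.fromarray(img[:, :, ::-1])
    return img
...

# The watermark payload is the UTF-8 bytes of this string
wm = "StableDiffusionV1"
wm_encoder = WatermarkEncoder()
wm_encoder.set_watermark('bytes', wm.encode('utf-8'))
img = put_watermark(img, wm_encoder)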

As you can see, a watermark “StableDiffusionV1” is being put into the generated image.

img = wm_encoder.encode(img, 'dwtDct')

Here, the code selects the combined DWT + DCT method for the watermarking process. For details on the two transforms, you can check out the following wiki pages.

Discrete Wavelet Transform (DWT)
Discrete Cosine Transform (DCT)

TL;DR

Both the DWT and the DCT are transforms that decompose an image signal into frequency components: the DWT into wavelet sub-bands, and the DCT into cosine components.

They can be used independently for watermarking, but applying both of them allows them to compensate for the drawbacks of each other, resulting in more effective watermarking.

How Does It Work?

The watermarking is done by altering the wavelet coefficients of carefully selected DWT sub-bands, followed by applying the DCT to these sub-bands. [2]

In other words, the DWT and DCT algorithms will decompose an image into different frequency bands (sub-bands).

These sub-bands are then used to identify the areas in the image where a watermark can be embedded effectively.

Finally, the watermark is inserted by altering the wavelet coefficients of the targeted sub-bands.
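To make the idea more concrete, here is a minimal pure-Python sketch on a 1D signal. It is not the code of the invisible-watermark library: the one-level Haar wavelet, the choice of coefficient (index 1), the quantisation step of 8, and the helper names embed_bit and extract_bit are all assumptions made for illustration. The sketch applies a DWT, runs a DCT on the low-pass sub-band, and hides one bit in the parity of a quantised DCT coefficient.

```python
import math

def haar_forward(x):
    """One-level orthonormal Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def _scale(k, n):
    # Normalisation factor that makes the DCT-II orthonormal
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of the orthonormal DCT-II."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n))
            for i in range(n)]

def embed_bit(samples, bit, step=8.0, index=1):
    """Hide one bit in a DCT coefficient of the DWT approximation band."""
    approx, detail = haar_forward(samples)
    coeffs = dct(approx)
    # Pick a quantisation bin whose parity equals the bit,
    # then move the coefficient to the centre of that bin.
    q = math.floor(coeffs[index] / step)
    if q % 2 != bit:
        q += 1
    coeffs[index] = (q + 0.5) * step
    return haar_inverse(idct(coeffs), detail)

def extract_bit(samples, step=8.0, index=1):
    """Read the bit back from the parity of the quantisation bin."""
    approx, _ = haar_forward(samples)
    coeffs = dct(approx)
    return math.floor(coeffs[index] / step) % 2

# Hypothetical 8-sample "image row" used only for this demo
row = [52, 55, 61, 66, 70, 61, 64, 73]
for bit in (0, 1):
    marked = embed_bit(row, bit)
    print(bit, extract_bit(marked), [round(v, 1) for v in marked])
```

A larger step makes the hidden bit more tolerant of small changes to the samples, at the cost of a bigger change to the signal. A real image watermark works on 2D sub-bands and spreads many bits over many blocks.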

Where Is The Watermark Being Inserted?

Terminology:

Low-frequency: coarse-grained / low-resolution features

High-frequency: fine-grained / high-resolution features

In general, most of the perceptible signal is concentrated in the lower-frequency sub-bands, so they are not a good place for embedding watermarks: changing them degrades the image quality significantly.

On the other hand, the high-frequency sub-bands include the edges and textures of the image. The human eye is not generally sensitive to changes in such sub-bands. However, high-frequency components of an image are usually removed by image compression and noise attacks.

Therefore, the middle-frequency sub-bands are a suitable place for embedding watermarks: changes there are hard for the human eye to perceive, and they are less likely to be removed by compression.
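As a rough illustration of how a DWT splits an image into sub-bands, the following pure-Python sketch applies a one-level 2D Haar transform to a tiny synthetic gradient and reports how much of the signal energy lands in each sub-band. The Haar wavelet, the 4x4 test image, and the helper names haar_dwt2 and energy are assumptions for this example. The LL/LH/HL/HH labels follow one common convention, which differs between libraries.

```python
def haar_dwt2(img):
    """One-level 2D Haar DWT of a grayscale image (list of equal-length rows).

    Height and width must be even. Each sub-band is half the size of the input.
    """
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, len(img), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(img[r]), 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            rows["LL"].append((a + b + d + e) / 2)   # local average (coarse content)
            rows["LH"].append((a + b - d - e) / 2)   # top minus bottom
            rows["HL"].append((a - b + d - e) / 2)   # left minus right
            rows["HH"].append((a - b - d + e) / 2)   # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands

def energy(band):
    """Sum of squared values in a 2D list."""
    return sum(v * v for row in band for v in row)

# Smooth synthetic gradient standing in for a low-detail image region
image = [[100 + 4 * r + 2 * c for c in range(4)] for r in range(4)]
bands = haar_dwt2(image)
total = sum(energy(band) for band in bands.values())
for name, band in bands.items():
    print(name, round(energy(band) / total, 5))
```

For a smooth input like this one, almost all of the energy ends up in the LL (low-frequency) sub-band, which is why edits there are the most visible.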

How to Check?

The watermark can be decoded using the same library.

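import cv2
from imwatermark import WatermarkDecoder

def testit(img_path):
    # OpenCV loads the image as a BGR array
    bgr = cv2.imread(img_path)
    # 136 is the number of watermark bits to read
    decoder = WatermarkDecoder('bytes', 136)
    watermark = decoder.decode(bgr, 'dwtDct')
    try:
        dec = watermark.decode('utf-8')
    except UnicodeDecodeError:
        # The recovered bytes are not valid UTF-8, so no readable watermark was found
        dec = "null"
    print(dec)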

The above code segment is excerpted from the test_watermark.py file in the official Stable Diffusion repository [1].

decoder = WatermarkDecoder('bytes', 136)

Note that the string “StableDiffusionV1” is 17 characters long, and each character takes 1 byte (8 bits). Therefore, the total number of bits to decode is 17 × 8 = 136.
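The arithmetic can be checked in a few lines of plain Python. The bit ordering used here (most significant bit first) and the helper name bits_to_bytes are assumptions for illustration and are not necessarily how the library packs bits internally.

```python
wm = "StableDiffusionV1"
payload = wm.encode("utf-8")

# Expand every byte into 8 bits, most significant bit first (assumed order)
bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
print(len(payload), len(bits))

def bits_to_bytes(bits):
    """Pack a list of bits (MSB first) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

print(bits_to_bytes(bits).decode("utf-8"))
```

Because every character in the string is plain ASCII, each one encodes to exactly one byte in UTF-8, which is why the character count and the byte count match.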

Is It Removable?

It is worth mentioning that the DWT + DCT method is able to withstand multiple attacks.

However, it is vulnerable to geometric edits such as cropping, resizing, and rotating.

So, it is possible to get rid of the embedded watermark by editing the image in the above ways.

There are other watermarking algorithms that are stronger than the DWT-DCT method, but this method is fast and suitable for on-the-fly embedding.

Any Tracking?

No, there is no tracking mechanism in this watermark.

As I mentioned above, it is just an algorithm that embeds watermark text into the image in a way that is imperceptible to the human eye.

The purpose of this watermark is probably to make AI-generated images easy to filter out in the future, so that they are not used to train new AI models.

References

[1] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with Latent Diffusion Models,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. GitHub. https://github.com/CompVis/stable-diffusion.

[2] A. Al-Haj, “Combined DWT-DCT digital image watermarking,” Journal of Computer Science, vol. 3, no. 9, pp. 740–746, 2007.

[3] “Discrete wavelet transform,” Wikipedia, 03-Jul-2022. https://en.wikipedia.org/wiki/Discrete_wavelet_transform.

[4] ShieldMnt, “ShieldMnt/invisible-watermark: Python library for invisible image watermark (blind image watermark),” GitHub. https://github.com/ShieldMnt/invisible-watermark.