Tired of being ripped off by AI companies, artists are booby trapping their work

Summary

Artists are employing strategies to corrupt the datasets used by AI image generation companies to protect their intellectual property and demand fair compensation.

Abstract

In response to AI companies using their work without permission or compensation, artists are taking matters into their own hands by introducing corrupted data into the AI training process. This retaliatory measure, known as "booby trapping," involves adding invisible alterations to images that confuse the AI algorithms, rendering them less reliable. The tactic is a form of protest against the unauthorized use of artists' work and highlights the legal and ethical challenges posed by web scraping for AI training. The practice of data poisoning forces AI companies to implement costly content monitoring systems and raises awareness about the quality of data fed into AI systems, emphasizing the principle of "garbage in, garbage out."

Opinions

Artists are frustrated with AI companies for using their images without consent to train algorithms like Dall-E, Midjourney, and Stable Diffusion.
The legal system is seen as inadequate or too slow to address the concerns of artists whose work is being misappropriated by AI companies.

Tired of being ripped off by AI companies, artists are booby trapping their work

IMAGE: Noorataijala — Pixabay

It wasn’t long after the appearance of Dall-E, followed by other generative image processing algorithms such as Midjourney or Stable Diffusion, that a big problem became apparent: the companies that had created them had accumulated huge collections of images labeled with descriptions, and then trained their algorithms with them.

Where did they acquire these huge collections of images? By scraping websites, mostly image repositories. Getty Images’s lawsuit against Stability AI, the company behind Stable Diffusion, made the origin of the training images plain: in many cases the generated images contained distorted versions of Getty’s watermark, because the algorithm had interpreted it as just another part of the image.

The legal problem was obvious: we have spent years saying that if something is public on the web, it can be scraped. There are legal precedents of all kinds that affirm the right of someone to go to a web page and take all of its content for whatever purposes they see fit. Because of its complexity, the Getty case could go on for years and end up in the Supreme Court. In the meantime, artists whose images have been used for algorithm training are seeing how easily their creations can be imitated, or how someone can simulate their style to make new images.

Accepting that the courts are unlikely to offer much help, some artists are booby trapping their work: they treat their images with software that introduces invisible alterations designed to confuse the algorithms, much as similar techniques invisibly modify people’s faces in photographs or video to prevent their use by facial recognition algorithms. The tool is named Nightshade in honor of Atropa belladonna, a plant that causes hallucinations. It allows users to publish altered images that the training algorithm associates with descriptions different from their real content, so that the trained model gets confused and offers images that are not what was asked for.

The result is equivalent to poisoning archives with images that still fulfill their function: it is still possible to view them and choose them based on the conditions set by the artists; but when they are sucked up by an algorithm, they create “hallucinations”. The more “poisoned” images, the more unpredictable the algorithm becomes, forcing companies to set up mechanisms to monitor the content they use for training, raising their costs considerably.
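To make the "more poison, more unpredictable" point concrete, here is a deliberately tiny sketch. It is not how Nightshade works internally; it only illustrates the general idea of data poisoning under loudly stated assumptions: each "image" is reduced to a single made-up number, dog-like images sit near 0 and cat-like images near 10, and the "model" simply remembers the average of everything captioned with a given word. All names and values are hypothetical.

```python
from statistics import mean

# Toy "images": one number stands in for an image's visual features.
# Assumption: dog-like images sit near 0.0 and cat-like images near 10.0.
CLEAN_DOGS = [-1.0, 0.0, 1.0, 0.5, -0.5]
CAT_FEATURE = 10.0


def train(captioned):
    """Learn one average feature value per caption from (caption, feature) pairs."""
    by_caption = {}
    for caption, feature in captioned:
        by_caption.setdefault(caption, []).append(feature)
    return {caption: mean(features) for caption, features in by_caption.items()}


def generate(model, prompt):
    """'Generate' an image for a prompt by returning the learned average."""
    return model[prompt]


def dog_output_with_poison(n_poisoned):
    """Train on clean dogs plus n poisoned samples: scraped with the caption
    'dog', but carrying features that look like a cat to the model."""
    data = [("dog", f) for f in CLEAN_DOGS]
    data += [("dog", CAT_FEATURE)] * n_poisoned
    return generate(train(data), "dog")


if __name__ == "__main__":
    # Print how far the output for the prompt "dog" drifts toward "cat".
    for n in (0, 1, 5, 20):
        print(n, round(dog_output_with_poison(n), 2))
```

In this toy setup, every extra poisoned sample pulls what the model returns for "dog" a little further toward "cat", which is the intuition behind the rising cost of cleaning scraped training data.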

This is a wake-up call for the companies that create these types of tools, and explains many of the problems they have been warned about: if you feed your algorithm with garbage, it will generate garbage. In many cases, we are talking about companies that are trying to run before they can walk, that need to deliver results too fast to justify themselves to their investors, and that end up using inadequate information that should never be the basis of any training, making their algorithms potentially less reliable. Basically, “garbage in, garbage out.” As with most educational processes, haste is not a good idea.

In practice, artists are free to do what they like with their work, in the same way that, until now, it was believed that nothing could prevent a company from scraping the entire contents of an archive to train an algorithm. Nothing is written in stone, and as some artists, particularly those who manage their own copyrights, have shown, it looks like some kind of deal will have to be worked out whereby they get adequate compensation when their images are used for algorithm training.

Watch this space.

