The day I fed my friends to an IBM algorithm

Summary

The author reflects on the ethical implications of IBM using his and his friends' Creative Commons-licensed Flickr photos to train facial recognition algorithms without explicit consent, raising concerns about privacy and the potential misuse of open licenses.

Abstract

The article discusses the ethical and legal dilemmas surrounding IBM's use of publicly available photographs from Flickr to develop facial recognition technology. The author, who has over 3,600 photos on Flickr under a Creative Commons license, discovered that IBM's database included images of him and his friends. While open to his photos being used, he questions the implications for his friends' privacy and whether the use of their likenesses in this way is lawful. The author ponders whether he should have been more selective with the licensing of his photographs, particularly those including people. IBM's defense is that they utilized a dataset made public by Yahoo! for research purposes, which they refined to improve facial recognition technology and reduce biases. However, the author argues that the potential for misuse of the database necessitates more stringent controls and explicit permissions. The incident prompts broader questions about the trust users place in open licenses and the responsibilities of companies in utilizing personal data for technology development.

Opinions

The author acknowledges the legality of his photographs' use under Creative Commons licenses but questions the ethics of using his friends' likenesses without their consent.
There is a concern that open licenses like Creative Commons may not adequately cover the use of images for developing controversial technologies like facial recognition.
The author suggests that a more nuanced approach to licensing, especially for photos containing people, might be necessary to protect privacy.
IBM's use of the database for potentially sensitive applications, such as law enforcement, without seeking additional permissions, is seen as problematic.
The author believes that the responsibility for the potential misuse of images lies not just with the individuals uploading content but also with the companies that use it, like Yahoo! and IBM.
There is a call for greater control and express permission for the use of such databases to prevent harmful applications of facial recognition technology.
The author muses on whether the incident is a result of misplaced trust in open licenses or an abuse of that trust by corporations.
The broader implications of the constant uploading of images and their subsequent use by third parties are raised, suggesting a need for societal adjustment to new privacy norms.

The day I fed my friends to an IBM algorithm

An NBC investigation, “Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent”, explores IBM’s use of photographs to train its facial recognition algorithms: the company used photographs taken from Flickr published under Creative Commons licenses to create a database — which it recently made available — and used it to develop its technology.

This is a subject of particular interest to me: I was an early Flickr user and have more than 3,600 photographs stored there, but have not used it for a while, and I also publish all my photos — like most of my professional output — with the least restrictive Creative Commons license model (CC BY or Attribution). Using a tool created by NBC to consult the database IBM has used to train its facial recognition algorithms, I see that the company has taken three images from my collection, some of them at an event in which I appear with friends. I’m sure they had no problem with the photos being published, catalogued or associated with an open license, but they now find their faces, and possibly some other metadata or information such as their names, have been used by a company to develop a controversial technology.

There are a number of aspects to all this: firstly, the legality of using photographs. I am completely used to mine being used for different purposes. I understand how open licenses work and in general I like seeing one of my photographs used in some publication: I would never have imagined that as an amateur photographer my work would appear in media of all kinds, such as Wired. However, there is another issue related to the question of whether IBM’s use of my photographs is legal: the faces of the people included in them, over which, logically, I have no rights, nor should I.

Was I mistaken to tag all my photographs as Creative Commons BY, and should I instead have kept a strict copyright over those that contained images of people? Instead of using a blanket license, perhaps each time I uploaded a photograph to Flickr I should have thought more about the type of license to use. I’m no lawyer, but even if I accept that responsibility, does that automatically give IBM the right to use my photographs with the faces of my friends in a database? One could argue that it has exceeded the terms of a license that was designed to regulate the public use of the images, and not other uses.

IBM says it merely used a 14GB file of one hundred million images that Yahoo!, then the owner of Flickr, published openly for use by researchers, which could shift the discussion about responsibility for any misuse of the license elsewhere. IBM reduced the size of the original database, converting it into a file of approximately one million faces, supplemented with about two hundred values ranging from measurements of certain facial dimensions to the type of pose, skin tone, gender or estimated age.

The database has been used to train all kinds of algorithms, including some for police use, as well as IBM’s own tool, IBM Watson Visual Recognition, which can estimate people’s age or gender and recognize specific individuals. Considering the controversy associated with facial recognition technologies, the company should at least have considered the possibility of requesting permission from the authors of the photographs, instead of assuming that a particular license that was not conceived with such uses in mind would cover them.

IBM says it has used the database to try to reduce biases in facial recognition and improve the quality of the technology. But the database is there, available to anyone who wants to download it and put it to potentially harmful use, which means that the time has come for greater control to be applied, and express permission requested for its use.

Where does the problem lie? Misguided trust on the part of the authors of the photographs, or misinterpretation of the potential of open licenses? Have companies abused that trust in using the contents for their own ends? Is it my mistake or Yahoo!’s, or IBM’s? Or are we all to blame? What is happening to all these pictures we are constantly uploading all over the place?

Or perhaps there is no problem here at all and it’s just that we’re going to have to get used to anything we upload being used by third parties for any purpose they want?

This article was previously published on Forbes.
