Anonymize facial data on video: blur people’s faces using OpenCV
There’s a lot of hype about object detection. Computer vision has become the new quest when someone talks about machine learning and AI. As you may know, I develop mostly for the (sustainable) fashion market and everything that’s image related in Fashion is gold.
Fashion is starting to discover the potential of machine learning, and today it’s possible to find Kaggle competitions and hackathons focused on this market alone.
Needless to say, computer vision is important in fashion because it helps brands detect trends and manage catalogs better, together with NLP and data mining.
But how is this market protecting people’s privacy while using ML?
In case you don’t know, in Europe, data protection is taken seriously. We have recently watched the birth of GDPR, the General Data Protection Regulation, which states how companies can use people’s personal data.
In Brazil, from 2020, the “Lei Geral de Proteção de Dados” (LGPD) will take effect. In Australia, the Privacy Amendment came into effect in 2018, and in the USA, the California Consumer Privacy Act (CCPA) overlaps with GDPR in many aspects.
Data protection laws and face recognition
While you may not agree that someone’s face is personal data, Europe thinks it is.
In August this year, Sweden’s Data Protection Authority issued its first fine for violations of the GDPR. The problem: a school launched a facial recognition pilot program, but forgot to request consent from the students.
The problem with the system is that the school was using biometric data and failed to notify Sweden’s Data Protection Authority about the program.
According to the board notes: “the school has processed sensitive biometric data unlawfully and failed to do an adequate impact assessment, including seeking prior consultation with the Swedish DPA”.
Here and there, some startups claim they are able to fetch and use social media pictures to feed fashion trend discovery algorithms.
While they are mainly interested in the clothes that people are wearing in the pictures, they may use personal features such as skin color and other biometric data to identify these trends and who mostly buys them.
In theory, it would be possible to know whether ethnic groups buy differently, which is discrimination.
Protecting people on pictures
While I am not sure about the legal implications of using social media photos in business projects without proper consent from the people in these pictures (sorry, you had better ask a lawyer about this one), you may want to be proactive and use data anonymization.
On images, this can be achieved by hiding features that could lead to identifying a person, such as the person’s face, tattoos, scars etc.
Finding faces is easy with the libraries we have today. Using Python and OpenCV, you may start to create a basic algorithm.
For today’s tutorial, we will see how to blur someone’s face on video using OpenCV and Python (you can use the same technique on still images, but since there are a lot of tutorials focused on images, I have decided to apply it on video).
Getting ready
Before we start, we have to download the face cascade. In OpenCV, cascades work like decision trees that evaluate the features of an image and decide whether or not they are part of what we are trying to identify.
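If the idea sounds abstract, the toy sketch below may help. It is not OpenCV code and not how the real XML file is evaluated internally: the feature names, the thresholds and the run_cascade function are all made up for illustration. It only shows the general principle of a cascade, where a candidate window has to pass every stage and is thrown away at the first one it fails.

```python
def run_cascade(window_features, stages):
    """Return True only if the window passes every stage, in order."""
    for stage in stages:
        if not stage(window_features):
            # Rejected early: the remaining stages are never evaluated
            return False
    return True


# Hypothetical stages, each checking one made-up feature of a window
stages = [
    lambda f: f["eye_band_contrast"] > 0.2,
    lambda f: f["nose_bridge_contrast"] > 0.1,
]
```

Because most windows of an image contain no face, rejecting them in the first stages is what keeps this kind of detector fast.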
You will have to download the haarcascade_frontalface_default.xml file from the OpenCV GitHub repository: https://github.com/opencv/opencv/tree/master/data/haarcascades.
Create a new folder on your computer and put this file in it. Then, create a Python file and name it whatever you want. You just have to put it in the same directory as the xml file.
If you don’t have it yet, install Anaconda (https://www.anaconda.com/) and open it. If you don’t want to mess up your current environment, create a new one and install OpenCV in it.
Inside Anaconda, open Spyder. We will use it to write and test the code, but you can use any other tool.
Let’s start to code. The first thing to do is to load OpenCV and the cascade file. The cascade can be loaded with the help of cv2.CascadeClassifier().