Wilame

Summary

This article discusses the importance of anonymizing facial data on videos using OpenCV for data protection and privacy, particularly in the context of fashion and machine learning.

Abstract

The article emphasizes the significance of data protection laws, such as GDPR, LGPD, and CCPA, in safeguarding personal data, including facial recognition. It highlights the potential misuse of facial data in fashion trend discovery algorithms and the need for anonymization to protect individuals' privacy. The author provides a step-by-step guide on how to blur people's faces using OpenCV and Python, focusing on video content. The tutorial covers the process of downloading the necessary files, setting up the environment, and coding the function to detect and blur faces in real-time.

Opinions

  • The author believes that data protection laws are crucial for protecting people's personal lives while using machine learning.
  • The author suggests that using social media photos for business projects without proper consent may have legal implications.
  • The author emphasizes the importance of being proactive in using data anonymization to protect individuals' identities.
  • The author argues that hiding features such as faces, tattoos, and scars can prevent the identification of individuals in images.
  • The author suggests that the use of biometric data without proper consent can lead to discrimination.
  • The author believes that OpenCV and Python are effective tools for blurring faces in videos to protect individuals' privacy.
  • The author stresses the importance of ethics in data science and the responsibility of data scientists to protect the data they handle.

Anonymize facial data on video: blur people’s faces using OpenCV

There’s a lot of hype about object detection. Computer vision has become the new quest when someone talks about machine learning and AI. As you may know, I develop mostly for the (sustainable) fashion market and everything that’s image related in Fashion is gold.

Fashion is starting to discover Machine Learning potential and it’s possible today to find Kaggle competitions and Hackathons focused on this market only.

No need to say that computer vision is so important in fashion because it helps brands to detect trends and to better manage catalogs, together with NLP and Data Mining.

But how is this market protecting people’s personal lives while using ML?

In case you don’t know, in Europe, data protection is taken seriously. We have recently watched the birth of GDPR, the General Data Protection Regulation, which states how companies can use people’s personal data.

In Brazil, from 2020, the “Lei Geral de Proteção de Dados” (LGPD) will take effect. In Australia, the Privacy Amendment came into effect in 2018, and in the USA, the California Consumer Privacy Act (CCPA) overlaps the GDPR in many aspects.

Data protection laws and face recognition

While you may not agree that someone’s face is personal data, Europe thinks it is.

In August this year, Sweden’s Data Protection Authority issued its first fine for violations of the GDPR. The problem: a school launched a facial recognition pilot program but forgot to request consent from the students.

To make things worse, the school was processing biometric data and failed to notify Sweden’s Data Protection Authority about the program.

According to the board notes: “the school has processed sensitive biometric data unlawfully and failed to do an adequate impact assessment, including seeking prior consultation with the Swedish DPA”.

Here and there, some startups claim they are able to fetch and use social media pictures to feed fashion trend discovery algorithms.

While they are mainly interested in the clothes people are wearing in the pictures, they may use personal features such as skin color and other biometric data to identify these trends and who mostly buys them.

In theory, it would be possible to know whether ethnic groups buy differently, which is discrimination.

Protecting people on pictures

While I am not sure about the legal implications of using social media photos in business projects without the proper consent of the people in these pictures (sorry, you’d better ask a lawyer about this one), you may want to be proactive and use data anonymization.

In images, this can be achieved by hiding features that could lead to a person’s identification, such as the face, tattoos, scars, etc.

Finding faces is easy with the libraries we have today. Using Python and OpenCV, you may start to create a basic algorithm.

For today’s tutorial, we will see how to blur someone’s face in a video using OpenCV and Python (you can use the same technique on still images, but since there are a lot of tutorials focused on images, I have decided to apply it to video).

Getting ready

Before we start, we have to download the face cascade. In OpenCV, cascades work as decision trees that evaluate the features of an image and decide whether or not they are part of what we are trying to identify.

You will have to download the haarcascade_frontalface_default.xml file from the OpenCV GitHub repository: https://github.com/opencv/opencv/tree/master/data/haarcascades.

Create a new folder on your computer and put this file in it. Then, create a Python file and name it whatever you want. You just have to put it in the same directory as the xml file.

If you don’t have it yet, install Anaconda (https://www.anaconda.com/) and open it. If you don’t want to mess up your current environment, create a new one and install OpenCV in it.

Inside Anaconda, open Spyder. We will use it to write and test the code, but you can use any other tool.

Let’s start coding. The first thing to do is to load OpenCV and the cascade file. The cascade can be loaded with the help of cv2.CascadeClassifier().
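
As a minimal sketch, assuming the xml file sits in the same folder as your script, the setup could look like this:

    import cv2

    # Load the cascade file we downloaded (it must be in the same directory as this script)
    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')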

Our goal is to detect faces in a video and blur them ‘live’. A way to do it is to create a function that will process each frame of the video, look for faces in it, blur them, and return the blurred frame.

We will create the function now.

We will create a function called find_and_blur. It will accept two arguments: a black and white version of the frame and the frame itself, with its original colors.

Then, we will use the detectMultiScale method of our cascade classifier to find all faces in the black and white image. This is why we need two versions of the frame: this method will work only with the bw version.

detectMultiScale takes as arguments the bw image, the scaleFactor, which tells how much the image is reduced in size at each image scale, and the minNeighbors, which specifies, according to the OpenCV documentation, “how many neighbors each candidate rectangle should have to retain it”.

You can play with these arguments. I found that the best values for me were 1.1 and 4.
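
For illustration, the detection call inside the function could look like this (bw being the black and white frame, and the values being the ones mentioned above):

    # Detect faces in the black and white frame
    # scaleFactor=1.1 and minNeighbors=4 are the values that worked for me
    faces = face_cascade.detectMultiScale(bw, scaleFactor=1.1, minNeighbors=4)
    # 'faces' holds one (x, y, w, h) rectangle per detected face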

Then we will iterate over the found faces to select their positions on the image. This position will be represented by what we call “Region of Interest” (ROI) on each frame.

Here, we can use the colored version of the frame. Since this will be the returned frame, we need to blur it instead of the black and white version. Remember: the black and white image is used for face detection. The color version is the one we will return.

Once we find the ROI, we can blur it using cv2.GaussianBlur. You just have to tell it which region of the image has to be blurred: the part that contains the face. Then, just assign the blurred portion of the image back to the complete colored frame.
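
Putting these steps together, the blurring loop could look like the sketch below (the kernel size and sigma passed to cv2.GaussianBlur are example values you can tune):

    # Blur each detected face on the colored frame
    for (x, y, w, h) in faces:
        roi = frame[y:y+h, x:x+w]                  # region of interest: the face
        roi = cv2.GaussianBlur(roi, (23, 23), 30)  # kernel size and sigma are example values
        frame[y:y+h, x:x+w] = roi                  # put the blurred face back into the frame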

That’s it for the function.

Now, it’s time to get our video. We’ll use the computer’s integrated camera. Use cv2.VideoCapture to turn the camera on.
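
For example (index 0 is usually the integrated camera; this is an assumption, so change it if you have more than one camera):

    # Open the integrated camera
    video = cv2.VideoCapture(0)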

Then, start a while loop that will continue to run until the 'q' key is pressed on the keyboard. As you may suppose, ‘q’ stands for ‘quit’, but you can choose any other key.

In the loop, we’ll get the last captured frame and create a grayscale version of it. Pass these two frames as arguments to the find_and_blur function.

The result will then be displayed.

When the ‘q’ key is pressed, we will quit the loop, close the connection with the camera and close the window.
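
A minimal sketch of this loop, assuming the function above is called find_and_blur (the window name is just an example):

    while True:
        # Get the last captured frame
        ret, frame = video.read()
        if not ret:
            break
        # Grayscale version for detection, colored frame for display
        bw = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame = find_and_blur(bw, frame)
        # Display the result
        cv2.imshow('Anonymized', frame)
        # Quit when 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Close the connection with the camera and close the window
    video.release()
    cv2.destroyAllWindows()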

That’s it. You should now have a live window showing your camera feed with your face blurred.

Conclusions

As I explained, I decided to use video instead of still images only because you can find a lot of tutorials explaining how to detect a face in an image, but not as many explaining how to do the same with video.

However, this code can be easily adapted to still images.
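
For instance, something along these lines should work (the file names are just placeholders):

    # Adaptation to a still image
    image = cv2.imread('photo.jpg')
    bw = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = find_and_blur(bw, image)
    cv2.imwrite('photo_blurred.jpg', image)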

OpenCV is a great starting point to detect objects and people in images, but it is not perfect. If the face is slightly turned, the algorithm may not work. This is not final, production-ready code.

There are a lot of other solutions that use deep learning to accomplish the same result. You should consider them.

The important thing to remember is that you, as a Data Scientist, are responsible for the data you have. Keep ethics above all.

Complete code:
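
Here is a minimal sketch assembling the steps described above (the blur strength, window name and camera index are example values):

    import cv2

    # Load the cascade file (it must be in the same folder as this script)
    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

    def find_and_blur(bw, frame):
        # Detect faces in the black and white frame
        faces = face_cascade.detectMultiScale(bw, scaleFactor=1.1, minNeighbors=4)
        # Blur each detected face on the colored frame
        for (x, y, w, h) in faces:
            roi = frame[y:y+h, x:x+w]
            roi = cv2.GaussianBlur(roi, (23, 23), 30)
            frame[y:y+h, x:x+w] = roi
        return frame

    # Turn the camera on
    video = cv2.VideoCapture(0)

    while True:
        ret, frame = video.read()
        if not ret:
            break
        # Grayscale version for detection, colored version for display
        bw = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame = find_and_blur(bw, frame)
        cv2.imshow('Anonymized', frame)
        # Quit when 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Close the connection with the camera and close the window
    video.release()
    cv2.destroyAllWindows()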

Do you want to connect? It will be a pleasure to discuss Machine Learning and Machine Learning for the Fashion industry with you. Message me on https://www.linkedin.com/in/limavallantin.

Machine Learning
Opencv
Blur
Anonymization
Gdpr