Summary

Google is leveraging reCAPTCHA to train their self-driving car AI by having users unknowingly label images while verifying their humanity online.

Abstract

Google's reCAPTCHA, originally designed to distinguish between humans and bots, has evolved beyond its initial purpose of digitizing books to serving as a tool for enhancing Google Maps and, more recently, training AI for self-driving cars. Users interacting with reCAPTCHA are now inadvertently tagging images of vehicles, road signs, and other street elements, providing valuable data for Google's autonomous vehicle technology. This crowdsourcing approach allows Google to harness millions of human hours for free, significantly advancing their machine learning models for computer vision tasks essential to self-driving car development.

Opinions

The author suggests that Google's use of reCAPTCHA for training self-driving car AI is a strategic move to gain an edge in the autonomous vehicle industry.
There is an implication that Google's reCAPTCHA system is a clever way to obtain free labor for data tagging, which is crucial for the development of AI systems.
The article speculates that Google may be using reCAPTCHA data to 'check' or validate the accuracy of their machine learning models rather than solely relying on it for direct image tagging.
Michael Cutter, director of computer vision at Tortuga, is quoted as saying he could not imagine Google wasting the human effort that goes into reCAPTCHA, because training data is too valuable to modern computer vision techniques to go unused.
The author accepts the use of reCAPTCHA data to improve Google Maps but is more wary of it being used, without any announcement from Google, to train AI for a commercial self-driving car program.
The article ends with a reflective question to the reader, asking for their opinion on Google using reCAPTCHA data for their self-driving car agenda, hinting at a potential ethical dilemma.

Google Is Using Us to Train Their Self Driving Cars

Why reCAPTCHA is doing more than you think

Photo by Samuele Errico Piccarini on Unsplash

If you use the internet, you have almost certainly solved a reCAPTCHA at some point. Developed as a project by Luis von Ahn and Ben Maurer, it’s a CAPTCHA system that lets websites and software distinguish between humans and bots. Google acquired it in 2009.

There have been a couple of versions since the original release, with earlier ones focusing on deciphering hard-to-read text. More recent versions have focused on identifying specific objects like cars and buses.

If you’ve forgotten what the older versions looked like, here’s an example:

It’s public knowledge that the original reCAPTCHA was used to help digitize books

Of course, the original intent was to keep spammers and bots away, but it also had a second purpose. The Internet Archive had over 200,000 scanned books, yet at the time there were no indexable, digitized versions of them (and some were printed in fancy lettering).

Indexing them by hand would have required millions of human hours. This is where CAPTCHAs came in.

While proving they weren’t spammers, people were also helping to decipher words and convert them into digital text. Each word was often shown to multiple people to get a more accurate result.

You might be wondering how solving two words per person could possibly help digitize hundreds of thousands of books. But at the time, the founders estimated that over 60 million CAPTCHAs were being solved every day. That adds up to around 160,000 human hours per day.
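To make the "multiple people per word" idea concrete, here is a minimal sketch of how several answers for the same scanned word could be combined by majority vote. This is only an illustration, not the actual reCAPTCHA algorithm: the function name `consensus_transcription` and the `min_votes` and `min_agreement` thresholds are assumptions made up for this example.

```python
from collections import Counter
from typing import List, Optional


def consensus_transcription(
    answers: List[str],
    min_votes: int = 3,           # assumed threshold, not from the article
    min_agreement: float = 0.6,   # assumed threshold, not from the article
) -> Optional[str]:
    """Return the agreed transcription, or None if there is no consensus yet."""
    # Normalize so that "Morning " and "morning" count as the same answer.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough people have seen this word yet

    # Pick the most common answer and check how many people agree with it.
    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return word
    return None  # answers are too scattered to trust


# Two of three people agree, so the word is accepted.
print(consensus_transcription(["morning", "Morning ", "moming"]))
```

A word that nobody agrees on simply stays unresolved (`None`) and can be shown to more people, which is roughly why testing a word on several users gives a more accurate result than trusting any single answer.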
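The two daily figures can be checked against each other with a little arithmetic. The sketch below works out how long each CAPTCHA would have to take for the numbers to line up. The two constants come from the paragraph above; the round figure of 10 seconds per CAPTCHA is an assumption used only for comparison.

```python
CAPTCHAS_PER_DAY = 60_000_000   # figure quoted above
HUMAN_HOURS_PER_DAY = 160_000   # figure quoted above
SECONDS_PER_HOUR = 3600


def implied_seconds_per_captcha(captchas: int, hours: float) -> float:
    """Seconds each CAPTCHA must take for the two daily figures to match."""
    return hours * SECONDS_PER_HOUR / captchas


def human_hours(captchas: int, seconds_each: float) -> float:
    """Total human hours spent if every CAPTCHA takes `seconds_each` seconds."""
    return captchas * seconds_each / SECONDS_PER_HOUR


# Roughly 9.6 seconds per CAPTCHA is implied by the quoted figures.
print(implied_seconds_per_captcha(CAPTCHAS_PER_DAY, HUMAN_HOURS_PER_DAY))

# Assuming a round 10 seconds each gives about 166,667 hours per day.
print(round(human_hours(CAPTCHAS_PER_DAY, 10)))
```

So the estimate holds together if each CAPTCHA takes a little under ten seconds to solve.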

That’s a lot of digitizing! Soon after, Google acquired this technology for their own projects.

After being acquired by Google, it moved on to building and street numbers

Source: https://security.googleblog.com/2014/04/street-view-and-recaptcha-technology.html

People started noticing that instead of transcribing scanned text, we had moved on to transcribing building and street numbers from photos. The change was subtle: we were so used to text-based CAPTCHAs that we might not have noticed the switch from scanned words to words and numbers in street images.

It was no surprise when, a few years later, Google came out with an announcement.

We’re currently running an experiment in which characters from Street View images are appearing in CAPTCHAs. We often extract data such as street names and traffic signs from Street View imagery to improve Google Maps with useful information like business addresses and locations.

Based on the data and results of these reCAPTCHA tests, we’ll determine if using imagery might also be an effective way to further refine our tools for fighting machine and bot-related abuse online — Google Spokesperson from 2012

Personally, I have no issue with this: I use Google Maps regularly, so using the data to improve it gets a tick from me. In recent years, however, you will have been asked to verify a new type of image.

More recently, you’ve been asked to verify buses, cars, road signs, and more

Sound familiar?

Waymo began as Google’s own self-driving car project and is now a subsidiary of its parent company, Alphabet. In the self-driving car race, the companies that can better perfect their technology will gain an advantage over competitors.

Perfecting this technology requires tagging millions of photos so that models can learn to recognize people, buses, other cars, traffic lights, and more on the street.

So what does Google do?

Change up their reCAPTCHAs, of course, so that ordinary citizens of the world start labeling what a bus looks like. This puts millions of hours of free labor at their fingertips to further the development of their self-driving car program.

Google hasn’t said that they are using reCAPTCHAs for this, but the evidence points heavily that way. The silence is probably because the self-driving race is still on, and companies are still competing to reach fully autonomous technology.

“I couldn’t imagine wasting human effort like that.

Training data is too valuable to modern computer vision techniques. You’d want to do something with it.” — Michael Cutter, director of computer vision at Tortuga talking about Google’s reCAPTCHA

The companies with the best data and resources are likely to win the self-driving race, so Google is a front-runner with excellent resources at hand.

Cutter thinks that Google is using reCAPTCHA to ‘check’ classifiers rather than relying on the answers as outright image tags. It’s all speculation, but it makes sense: training Google’s own machine learning models requires accurate tags, so using human answers to check what a model already predicts is more plausible than using them as the tags themselves.

However, it still suggests that Google is very likely using us as a human labeling farm for its AI and will continue to do so as reCAPTCHA’s agenda changes and computer vision and machine learning continue to progress.
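To picture what "checking" a classifier with human answers could mean, here is a purely hypothetical sketch. Nothing is publicly known about how Google actually does this, so the function `agreement_rate`, the data layout, and the image IDs are all invented for illustration. The idea is to compare what a model already predicts for each image against the majority of human answers, and measure how often the two agree.

```python
from typing import Dict, List


def agreement_rate(
    model_labels: Dict[str, bool],
    human_votes: Dict[str, List[bool]],
) -> float:
    """Fraction of checked images where the model matches the human majority.

    model_labels: image ID -> model's answer to "does this contain a bus?"
    human_votes:  image ID -> list of human answers to the same question
    (Both structures are hypothetical, made up for this sketch.)
    """
    checked = 0
    agreed = 0
    for image_id, prediction in model_labels.items():
        votes = human_votes.get(image_id, [])
        yes = sum(votes)
        if not votes or yes * 2 == len(votes):
            continue  # no human answers yet, or an exact tie: skip
        majority = yes * 2 > len(votes)
        checked += 1
        if prediction == majority:
            agreed += 1
    return agreed / checked if checked else 0.0


model = {"img_a": True, "img_b": False, "img_c": True}
humans = {
    "img_a": [True, True, False],   # majority says bus: model agrees
    "img_b": [True, True, True],    # majority says bus: model disagrees
    "img_c": [],                    # nobody has answered yet: skipped
}
print(agreement_rate(model, humans))
```

In a setup like this, the human answers never become training tags directly. They only flag the images where the model and people disagree, which fits the ‘check’ role Cutter describes.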

Who knows what’s next after this.

How do you feel about Google using your reCAPTCHA data for their own self-driving car agenda?