Google Is Using Us to Train Their Self-Driving Cars
Why reCAPTCHA is doing more than you think
If you use the internet, you have almost certainly completed a reCAPTCHA at some point. Developed as a project by Luis von Ahn and Ben Maurer, it’s a CAPTCHA system that lets websites and software distinguish between humans and bots. Google acquired it in 2009.
There have been several versions since the original release, with earlier ones focusing on deciphering hard-to-read text. More recent versions have focused on identifying specific objects like cars and buses.
If you’ve forgotten what the older versions looked like, here’s an example:

It’s public knowledge the original CAPTCHA was used to help digitize books
Of course, the original intent was to keep spammers and bots away, but it also had a second purpose. The Internet Archive had over 200 000 scanned copies of books, yet at the time there were no indexable digitized versions of them (and some were printed in fancy, hard-to-read type).
Indexing them by hand would have required millions of human hours. This is where CAPTCHAs came in.
By proving they weren’t bots, people were also helping to transcribe these scanned words. The same word was often shown to multiple people to get a more accurate result.
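To make the "multiple people" idea concrete, here is a minimal sketch of majority-vote agreement on one scanned word. It is not reCAPTCHA's actual algorithm: the function name `consensus_transcription` and the `min_votes` and `min_agreement` thresholds are made up for illustration.

```python
from collections import Counter


def consensus_transcription(answers, min_votes=3, min_agreement=0.6):
    """Return the word most people typed, or None if there is no clear winner.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values published by reCAPTCHA.
    """
    # Ignore case and stray whitespace so "Morning " and "morning" match.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough people have seen this word yet

    word, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_agreement:
        return word
    return None  # too much disagreement; show the word to more people


if __name__ == "__main__":
    print(consensus_transcription(["Morning", "morning ", "mornlng"]))
```

With three answers where two agree, the sketch accepts the majority spelling; with fewer than three answers, or no clear majority, it returns `None` so the word can be shown again.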
You might be wondering how on earth solving two words per person could help digitize hundreds of thousands of books. But at the time, the founders estimated that over 60 million CAPTCHAs were being solved every day. At roughly 10 seconds each, that comes to around 160 000 human hours per day.
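The human-hours figure can be checked with quick arithmetic. The sketch below assumes each CAPTCHA takes about 10 seconds to solve. That duration is an assumption, chosen because it reproduces the number quoted above.

```python
CAPTCHAS_PER_DAY = 60_000_000  # founders' estimate quoted in the article
SECONDS_PER_CAPTCHA = 10       # assumption: rough time to solve one CAPTCHA
SECONDS_PER_HOUR = 3600


def human_hours_per_day(captchas_per_day, seconds_each):
    """Convert a daily CAPTCHA count into human hours of effort per day."""
    return captchas_per_day * seconds_each / SECONDS_PER_HOUR


if __name__ == "__main__":
    hours = human_hours_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
    print(f"{hours:,.0f} human hours per day")
```

Under that assumption the result lands a little above 160 000 hours, which matches the rounded figure in the article.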
That’s a lot of digitizing! Soon after, Google acquired this technology for their own projects.
After being acquired by Google, it moved to building and street numbers

People started noticing that instead of typing out scanned text, we had moved on to typing out building and street numbers from photos. The change was subtle: we were so used to text-based CAPTCHAs that we might not have noticed the shift from scanned words and numbers to words and numbers in images.
It was no surprise that Google came out with an announcement after a few years.
We’re currently running an experiment in which characters from Street View images are appearing in CAPTCHAs. We often extract data such as street names and traffic signs from Street View imagery to improve Google Maps with useful information like business addresses and locations.
Based on the data and results of these reCAPTCHA tests, we’ll determine if using imagery might also be an effective way to further refine our tools for fighting machine and bot-related abuse online — Google Spokesperson from 2012
Personally, I have no issues since I use Google Maps regularly as a consumer, so using the data to further that agenda is a tick for me. However, in recent years, you would have been asked to verify a new type of image.
More recently, you’ve been asked to verify buses, cars, road signs, and more
Sound familiar?
With Waymo, which grew out of Google’s own self-driving car project, Google has joined the self-driving car race, where the companies that can better perfect their technology will gain an advantage over competitors.
Perfecting this technology requires tagging millions of photos so that models can be trained to recognize humans, buses, other cars, traffic lights, and more on the street.
So what does Google do?
They change up their reCAPTCHAs, of course, to get the ordinary citizens of the world to start labeling what a bus looks like. This means millions of hours of free labor at their fingertips to further the development of their self-driving car program.
Google hasn’t said that they are using reCAPTCHAs this way, but the evidence points heavily in that direction. The silence is probably because the self-driving race is still on, and companies are still competing to get to fully autonomous technology.
“I couldn’t imagine wasting human effort like that.
Training data is too valuable to modern computer vision techniques. You’d want to do something with it.” — Michael Cutter, director of computer vision at Tortuga talking about Google’s reCAPTCHA
Obviously, the companies with the best data and resources are the most likely to win the self-driving race, so Google is a strong contender with excellent resources at hand.
Cutter thinks that Google is using reCAPTCHA to ‘check’ its classifiers rather than relying on the answers as outright image tags. It’s all speculation, but it makes sense: Google’s own machine learning models need accurate tags to train on, so human answers are more useful as a check on those tags than as the tags themselves.
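As a rough illustration of what "checking" a classifier against human answers could look like, here is a small sketch. Everything in it is hypothetical, since Google has not described any such pipeline: the function name `check_classifier`, the tile IDs, and the simple majority rule are all assumptions.

```python
def check_classifier(model_labels, human_votes):
    """Compare a model's yes/no label per image tile with the human majority.

    model_labels: dict of tile_id -> bool (model says "contains a bus")
    human_votes:  dict of tile_id -> list of bool (one entry per person)
    Returns (agreement_rate, tiles_where_humans_disagree_with_model).
    Hypothetical sketch; not a description of any real Google system.
    """
    disagreements = []
    checked = 0
    for tile_id, predicted in model_labels.items():
        votes = human_votes.get(tile_id, [])
        if not votes:
            continue  # nobody has been shown this tile yet
        checked += 1
        # Strict majority of "yes" votes; a tie counts as "no".
        human_says_yes = sum(votes) > len(votes) / 2
        if human_says_yes != predicted:
            disagreements.append(tile_id)

    agreement = 1 - len(disagreements) / checked if checked else None
    return agreement, disagreements


if __name__ == "__main__":
    model = {"tile_1": True, "tile_2": False, "tile_3": True}
    votes = {"tile_1": [True, True, False], "tile_2": [True, True, True]}
    print(check_classifier(model, votes))
```

In this setup the human answers never become labels directly. They only flag tiles where the model and the crowd disagree, which is the kind of check Cutter describes.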
However, it still points to the fact that Google is very likely using us as a human-AI farm and will continue to do so as reCAPTCHA’s agenda changes and computer vision and machine learning continue to progress.
Who knows what’s next after this?
How do you feel about Google using your reCAPTCHA data for their own self-driving car agenda?