The 80/20 rule of learning to code
Only 20% of the world’s population speaks English… so why is most of our accumulated technical knowledge still English-only?
At Roboflow, we’re interested in exploring solutions to previously-intractable problems by leveraging recent advances in machine learning. Interested in learning more? Reach out!
Lately, I’ve been spending a lot of time on Stack Overflow. The question-and-answer site is a godsend for programmers. Most problems a coder encounters have already been asked about and answered by the experts there. Usually, all it takes is a simple search and your problem is solved.
Unfortunately, almost all of this accumulated knowledge is inaccessible to the vast majority of the world’s population! Stack Overflow is primarily an English-language site but only 20% of the world speaks English.
Stack Overflow has sites in a few other languages (Spanish, Russian, Japanese, and Portuguese) but, unfortunately, there is far less content from a far smaller community available in each of these languages.
How can we make it better?
Recent research from Google, Facebook, and others has produced groundbreaking advances in automatic translation powered by new machine learning techniques. These advances make it possible to provide high-quality translations of English content to users around the world, no matter what language they speak.
So we can just run the Stack Overflow archive through Google Translate and we’re done, right? Well, not quite! Although Google Translate is now powered by an advanced neural network, it is trained on general text, not technical language. But we can borrow the same techniques and train our own neural machine translation model on the Stack Overflow archives!
This is exactly what we’re planning on doing!
But wait
If we only have our content in English, how will we know whether our machine learning model does a good job of translating it? And this fancy machine learning stuff sounds like it’s going to take a lot of time and effort to get right; are we sure there’s even demand for this content in other languages?
Enter the “Smoke and Mirrors Test”
To answer those two questions, today we’re releasing PreguntaRepuesta.com: the 10 most popular questions from Stack Overflow, translated into Spanish by a professional human translator.
I first heard about the concept of a Smoke and Mirrors test from Tim Ferriss. The idea is simple: to validate demand for an idea, create a fake landing page for a product, try to get customers, and see how many people actually click “buy”. If nobody does, you’ve saved yourself a lot of time and effort creating a product that nobody wanted (or that you would have been unable to effectively market). If lots of people do, you know you have a hit on your hands before you’ve even made the product!
That’s what PreguntaRepuesta.com is; it’s a way to determine whether there are lots of Spanish-speaking people searching for programming help in their native language. If I get a lot of people going to the site, it’s probably worth pursuing the neural machine translation model I’m pretty sure I could build (given enough time). If not, I’ll move on to the next idea on my list!
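The go/no-go decision described above can be written down as a tiny check. This is only a sketch of the general smoke-test idea: the `SmokeTestResult` class, the `worth_pursuing` function, and both thresholds are hypothetical names and placeholder numbers chosen for illustration, not figures from this experiment.

```python
from dataclasses import dataclass


@dataclass
class SmokeTestResult:
    visitors: int     # people who reached the landing page
    conversions: int  # visitors who took the target action (e.g. clicked "buy")

    @property
    def conversion_rate(self) -> float:
        # Avoid dividing by zero when nobody has visited yet.
        if self.visitors == 0:
            return 0.0
        return self.conversions / self.visitors


def worth_pursuing(result: SmokeTestResult,
                   min_visitors: int = 500,
                   min_rate: float = 0.02) -> bool:
    """Return True if enough traffic arrived and enough of it converted.

    Both thresholds are placeholder assumptions; pick your own before
    launching the test so the result cannot bias the bar you set.
    """
    return (result.visitors >= min_visitors
            and result.conversion_rate >= min_rate)
```

For a content site like this one, where the signal is simply whether people show up, the "conversion" could be something as modest as a visitor reading past the first answer.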
Now we wait
I made sure to optimize the page for search engines as much as I could (I even added AMP). Hopefully this small subset of translated content will validate a need in the market!
In the meantime, we’re continuing to prototype other ideas at Roboflow. Our next release will be in the field of Augmented Reality. Follow me on Twitter to keep up to date on all of our latest developments!






