Brad Dwyer

Summary

Roboflow is exploring the creation of a multilingual knowledge base for coders by leveraging advanced machine learning techniques to translate technical content from Stack Overflow into various languages, starting with a test for Spanish-speaking audiences through PreguntaRepuesta.com.

Abstract

The article discusses Roboflow's initiative to address the language barrier in coding education and resources: the vast majority of technical knowledge, particularly on platforms like Stack Overflow, is available only in English. Stack Overflow does run sites in a few other languages, but they hold significantly less content. Roboflow plans to use recent advances in machine learning from Google and Facebook to provide high-quality translations of English content into other languages. Before committing to this extensive project, Roboflow is running a market validation test by launching PreguntaRepuesta.com, which features the top Stack Overflow questions professionally translated into Spanish. This "Smoke and Mirrors Test" will help determine whether there is sufficient demand for programming resources in languages other than English.

Opinions

  • The author believes that the majority of the world's population, which does not speak English, is missing out on valuable coding knowledge because of the language barrier.
  • There is an acknowledgment that while Google Translate has improved with neural network-based models, it is not yet optimized for technical language, which justifies the need for a specialized translation model.
  • The author is optimistic about the potential of training a neural machine translation model using Stack Overflow's archives to better serve non-English speaking coders.
  • The importance of market validation is emphasized through the "Smoke and Mirrors Test," which is seen as a crucial step before fully committing to the development of a machine learning model for translation.
  • Roboflow is portrayed as a forward-thinking company that is actively seeking solutions to previously unsolvable problems by applying the latest advancements in machine learning.

The 80/20 rule of learning to code

Only 20% of the world’s population speaks English… so why is most of our accumulated technical knowledge still English-only?

At Roboflow, we’re interested in exploring solutions to previously intractable problems by leveraging recent advances in machine learning. Interested in learning more? Reach out!

Lately, I’ve been spending a lot of time on Stack Overflow. The question-and-answer site is a godsend for programmers. Most questions a coder runs into have already been asked and answered by the experts on Stack Overflow. All it takes is a simple search and your problem is solved.

Unfortunately, almost all of this accumulated knowledge is inaccessible to the vast majority of the world’s population! Stack Overflow is primarily an English-language site, but only 20% of the world speaks English.

Stack Overflow has sites in a few other languages (Spanish, Russian, Japanese, and Portuguese) but, unfortunately, there is far less content from a far smaller community available in each of these languages.

How can we make it better?

Recent research from Google, Facebook, and others has produced groundbreaking advances in automatic translation, powered by new machine learning techniques. These advances make it possible for us to provide high-quality translations of English content for users around the world (no matter what language they speak).

So we can just run the Stack Overflow archive through Google Translate and we’re done, right? Well, not quite! Although Google Translate is now powered by an advanced neural network, it is trained on general text, not technical language. But we can borrow those techniques and train our own neural machine translation model on the Stack Overflow archives!

This is exactly what we’re planning on doing!
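One practical difference between general text and technical text is that posts mix prose with code, and the code should presumably pass through untranslated. Here is a minimal sketch of that idea, assuming the post body is HTML with code wrapped in `<code>` tags. The helper names `mask_code` and `unmask_code` are hypothetical, and this is an illustration rather than our actual pipeline.

```python
import re

# Assumption: code in a post body is wrapped in <code>...</code> tags.
CODE_RE = re.compile(r"<code>.*?</code>", re.DOTALL)


def mask_code(html):
    """Swap each <code> span for a numbered placeholder.

    Returns the masked text plus the list of original spans, so the
    translator only ever sees prose and opaque tokens.
    """
    snippets = []

    def stash(match):
        snippets.append(match.group(0))
        return f"__CODE_{len(snippets) - 1}__"

    return CODE_RE.sub(stash, html), snippets


def unmask_code(text, snippets):
    """Put the original <code> spans back after translation."""
    for i, snippet in enumerate(snippets):
        text = text.replace(f"__CODE_{i}__", snippet)
    return text


post = "<p>Use <code>git reset --soft HEAD~1</code> to undo the commit.</p>"
masked, snippets = mask_code(post)
# masked is what would be sent to the translation model.
```

Whatever model does the translating would run on `masked`, and `unmask_code` would restore the code afterwards. This only works if the placeholders come back from the model unchanged, which would need to be checked.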

But wait

If we only have our content in English, how will we know whether our machine learning model does a good job of translating it? And this fancy machine learning stuff sounds like it’s going to take a lot of time and effort to get right; are we sure there’s even demand for this content in other languages?

Enter the “Smoke and Mirrors Test”

To answer those two questions, today we’re releasing PreguntaRepuesta.com: the 10 most popular questions from Stack Overflow, translated into Spanish by a professional human translator.
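One way the human translations could help with the first question is as reference text: a machine translation of the same post can be scored by how much of it overlaps with the professional version. The sketch below computes clipped n-gram precision, the building block of the BLEU metric. It is a simplified stand-in, not full BLEU, and the function names and example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n=1):
    """Fraction of the candidate's n-grams that also appear in the reference.

    Each n-gram is credited at most as many times as it occurs in the
    reference, so repeating a correct word does not inflate the score.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical example: human reference vs. machine output.
human = "usa git reset para deshacer el commit"
machine = "usa git revert para deshacer el commit"
score = clipped_precision(machine, human)  # 6 of the 7 words match
```

A score like this says nothing about fluency, and with only ten translated questions it would be a rough sanity check at best.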

I first heard about the concept of a Smoke and Mirrors test from Tim Ferriss. The idea is simple: to validate demand for an idea, create a fake landing page for a product, try to get customers, and see how many people actually click “buy”. If nobody does, you’ve saved yourself a lot of time and effort creating a product that nobody wanted (or that you would have been unable to effectively market). If lots of people do, you know you have a hit on your hands before you’ve even made the product!

That’s what PreguntaRepuesta.com is; it’s a way to determine whether there are lots of Spanish-speaking people searching for programming help in their native language. If I get a lot of people going to the site, it’s probably worth pursuing the neural machine translation model I’m pretty sure I could build (given enough time). If not, I’ll move on to the next idea on my list!
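The decision rule behind a test like this is simple arithmetic: divide the people who took the action you care about by the people who saw the page, and compare the result with a bar you set in advance. A minimal sketch follows. The function name, the traffic numbers, and the 2% threshold are all hypothetical, not figures from this experiment.

```python
def smoke_test_verdict(visitors, conversions, min_rate):
    """Return (conversion_rate, passed) for a landing-page smoke test.

    visitors:    people who saw the page
    conversions: people who took the target action (e.g. clicked "buy")
    min_rate:    the rate decided beforehand as "worth pursuing"
    """
    if visitors == 0:
        return 0.0, False
    rate = conversions / visitors
    return rate, rate >= min_rate


# Hypothetical numbers, purely for illustration.
rate, passed = smoke_test_verdict(visitors=400, conversions=12, min_rate=0.02)
```

For a site like this one, where there is nothing to buy, the count of search visitors itself would play the role of `conversions`.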

Now we wait

I made sure to optimize the page for search engines as much as I could (I even added AMP). Hopefully this small subset of translated content will validate a need in the market!

In the meantime, we’re continuing to prototype other ideas at Roboflow. Our next release will be in the field of Augmented Reality. Follow me on Twitter to keep up to date on all of our latest developments!
