Praveen Pareek

Summary

The website content discusses the top three challenges in machine learning for industry applications: lack of training data, discrepancies between training and production data, and model scalability, along with current solutions to these issues.

Abstract

Machine learning has significantly impacted various industries, yet it faces substantial challenges that need to be addressed for further advancement. The primary challenge is the scarcity of labeled training data, which is essential for machine learning models. To mitigate this, methods such as transfer learning and self-supervised representation learning are employed, allowing models to leverage pre-existing knowledge and unlabelled data. Another critical issue is the discrepancy between training data and real-world production data, which can lead to models performing well in controlled environments but failing in actual use cases. This challenge necessitates careful data collection and continuous model updates to reflect target domains accurately. Lastly, model scalability is a concern, as models need to be both efficient and compact to be deployed effectively in industrial settings. Post-training quantization is suggested as a solution to reduce model size and improve latency without significantly compromising accuracy.

Opinions

  • The author believes that transfer learning is a key solution to the lack of training data, especially in tasks like image classification.
  • The author emphasizes the importance of ensuring that training data is representative of the production environment to avoid model generalization issues.
  • Continuous model updating is seen as crucial for maintaining performance in the face of changing real-world conditions.
  • The author suggests that post-training quantization is a viable method for addressing model scalability concerns, balancing size, speed, and accuracy.
  • The author values the sharing of knowledge and resources, providing links to further reading and learning materials on the topics discussed.

What Are the Top 3 Challenges for Machine Learning in Industry?

Photo by Olav Ahrens Røtne on Unsplash

Machine learning has transformed many industries.

It can help us recognize objects, rank results, and even understand speech. But the field is still at an early stage and faces many challenges.

Here I will list three major challenges that machine learning faces on the way from research to industry, and describe the current solutions to each of them.

Here are those key challenges:

  1. Lack of Training Data
  2. Discrepancies between the Training Data and the Production Data
  3. Model Scalability

So, we’ll go through each of these one by one.

At the end, I’ll list some resources you can use to gain a better understanding of the solutions.

Lack of Training Data

The first one is the lack of training data. Data is at the core of any machine learning project.

However, labeled data is usually hard and expensive to obtain. How to train a model without large amounts of labeled data is a very active research topic today.

Transfer learning is one method for solving this problem. It enables a model to take knowledge from previously learned tasks and apply it to new, related ones.

For example, in image classification, very few people train an entire convolutional neural network from scratch, which usually requires millions of labeled images. Instead, it is common to start from a pre-trained model and fine-tune it on the target domain.
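The idea of reusing pre-trained layers and training only a small new part can be sketched in plain Python. This is a toy illustration under stated assumptions, not how a real image model is fine-tuned: `PRETRAINED_W` is a hand-set stand-in for weights that would normally be loaded from a published checkpoint, and `target_data` is a made-up four-example dataset. The frozen extractor is never updated; only the new logistic-regression head is trained.

```python
import math

# ASSUMPTION: hand-set stand-in for layers pre-trained on a large source dataset.
PRETRAINED_W = [[1.0, -1.0], [0.5, 0.5]]  # 2 hidden units x 2 inputs


def extract_features(x):
    """Frozen feature extractor: these weights are never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]


def predict(head, feats):
    """New task-specific head: logistic regression on the frozen features."""
    z = head["b"] + sum(w * f for w, f in zip(head["w"], feats))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(head, data):
    """Average binary cross-entropy of the head over a labeled dataset."""
    total = 0.0
    for x, y in data:
        p = predict(head, extract_features(x))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def fine_tune_head(data, epochs=200, lr=0.5):
    """Train only the head with full-batch gradient descent."""
    head = {"w": [0.0] * len(PRETRAINED_W), "b": 0.0}
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(head["w"])
        grad_b = 0.0
        for x, y in data:
            feats = extract_features(x)
            err = predict(head, feats) - y  # gradient of the log loss w.r.t. z
            grad_w = [g + err * f for g, f in zip(grad_w, feats)]
            grad_b += err
        head["w"] = [w - lr * g / n for w, g in zip(head["w"], grad_w)]
        head["b"] -= lr * grad_b / n
    return head


# Small labeled target dataset (made up): the label is 1 when x[0] > x[1].
target_data = [([2.0, 0.0], 1), ([1.5, 0.5], 1), ([0.0, 2.0], 0), ([0.5, 1.5], 0)]

head = fine_tune_head(target_data)
loss_before = log_loss({"w": [0.0, 0.0], "b": 0.0}, target_data)
loss_after = log_loss(head, target_data)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Because the extractor already produces useful features, four labeled examples are enough to fit the head here. With a real pre-trained network the same split applies: keep most layers fixed and train a small task-specific part on the target data.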

Self-supervised representation learning is another way to address the lack of labeled data. It opens up a huge opportunity to make better use of large amounts of unlabelled data.

Here is a very famous example from language modeling. Suppose you are given a sentence with some words missing:

Patricia is ________ a French novel

How can you fill in the missing words? The idea is to predict them from the surrounding context, that is, the words that come before and after the gap. The text itself supplies the labels, so no manual annotation is needed.

A representation of the language is also learned during this process. Google has used this technique (in its BERT models) to understand searches better, and described the change as one of the biggest leaps forward in the history of search.
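The fill-in-the-blank idea can be sketched with simple counting. This is far simpler than a neural masked language model and only illustrates where the training signal comes from: each word in an unlabelled corpus is hidden in turn, and the word itself serves as the label for its left and right neighbors. The corpus and the function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Tiny unlabelled corpus (made up for illustration).
corpus = [
    "patricia is reading a french novel",
    "john is reading a long novel",
    "maria is writing a french essay",
    "patricia is reading a short story",
]


def build_context_counts(sentences):
    """Hide each word in turn and record which word fits its (left, right) context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            context = (tokens[i - 1], tokens[i + 1])
            counts[context][tokens[i]] += 1
    return counts


def fill_blank(counts, left, right):
    """Return the word seen most often between `left` and `right`, or None."""
    candidates = counts.get((left, right))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


counts = build_context_counts(corpus)
guess = fill_blank(counts, "is", "a")  # "Patricia is ____ a French novel"
print(guess)
```

A neural model replaces the lookup table with learned vector representations, which is what lets it generalize to contexts it has never seen verbatim.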

Discrepancies between the Training Data and the Production Data

The second challenge is that there are usually some discrepancies between your training data and your production data.

Sometimes a model works well in your prototyping environment but fails to generalize to real-world cases.

Let’s see a few examples here:

  • The model may work well in one country but fail in another due to geographical differences.
  • The model may work in winter but fail in summer due to seasonal differences.
  • The model may work well on mobile but fail on desktop due to differences in user behavior.

So when you are developing your model, you need to be very careful about how you collect your training data: make it as close to your target domain as possible, and keep updating your model whenever it becomes outdated.
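One way to notice such a discrepancy is to compare the distribution of a feature in the training data with the same feature in production. The sketch below computes the two-sample Kolmogorov-Smirnov statistic in plain Python. The temperature values and `DRIFT_THRESHOLD` are hypothetical, and a real monitoring setup would choose its own features, test, and threshold.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples. 0.0 means the CDFs coincide; 1.0 means
    every value of one sample lies below every value of the other."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past x so that i/len(a) and j/len(b)
        # are the two empirical CDFs evaluated at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


# Made-up feature values: temperatures seen in training (winter)
# versus temperatures seen in production (summer).
train_temps = [2, 4, 5, 7, 8, 10]
prod_temps = [8, 10, 12, 14, 15, 18]

DRIFT_THRESHOLD = 0.3  # hypothetical cut-off, to be tuned per feature

drift = ks_statistic(train_temps, prod_temps)
needs_retraining = drift > DRIFT_THRESHOLD
print(f"KS statistic: {drift:.2f}, retrain: {needs_retraining}")
```

Running a check like this on a schedule gives an early signal that the model is becoming outdated, before the drop shows up in business metrics.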

Model Scalability

Last but not least is model scalability. It is a big issue for many projects in industry.

As a machine learning scientist, you need to make sure that the model runs fast enough and that its size is small enough for the deployment target, which is a big challenge for many tasks.

One possible solution is post-training quantization.

Post-training quantization is a conversion technique that reduces model size, typically by storing the weights of an already trained model at lower numerical precision.

At the same time, it can also improve CPU and hardware accelerator latency, with little degradation in model accuracy.
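The arithmetic behind the size reduction can be sketched with a generic affine int8 scheme. This is an illustration of the general idea, not the exact algorithm of any particular toolkit, and the weight values are made up. Each 32-bit float is replaced by an 8-bit integer plus a shared `scale` and `zero_point`.

```python
from array import array


def quantize_int8(weights):
    """Map floats to int8 with an affine scheme: w ~ (q - zero_point) * scale."""
    lo = min(min(weights), 0.0)  # widen the range so 0.0 is exactly representable
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float weights from their int8 codes."""
    return [(v - zero_point) * scale for v in q]


weights = [-0.51, -0.2, 0.0, 0.13, 0.4, 0.76]  # made-up trained weights
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

float_bytes = len(array("f", weights)) * array("f").itemsize  # float32 storage
int8_bytes = len(array("b", q)) * array("b").itemsize  # int8 storage
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"{float_bytes} bytes -> {int8_bytes} bytes, max error {max_error:.4f}")
```

Storage drops by a factor of four, while each restored weight in this example stays within half a quantization step of the original. That small rounding error is why accuracy usually degrades only slightly.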

Resources:

