The Machine Learning Process

Summary

The machine learning process encompasses four key stages: problem framing, data analysis, model building, and application.

Abstract

The machine learning process is a systematic approach that begins with defining project goals and the problem at hand, such as classifying emails or predicting loan outcomes. The second stage involves data preparation, including visualization, handling missing data, feature engineering, and partitioning datasets. Model building, the third stage, requires selecting an appropriate machine learning tool, training the model with data, and fine-tuning to prevent overfitting. The final stage is application, where the model is deployed to improve customer experiences or productivity and is continuously evaluated and retrained to maintain its performance in a production setting.

Opinions

The article emphasizes the importance of clearly defining the goals of a machine learning project to align the problem with the right type of model.
It suggests that data refinement is critical, involving various techniques to ensure the quality and usability of the dataset.
The choice of machine learning model is presented as a crucial decision that should match the data and the desired outcome.
The article advocates for the use of hyperparameter tuning and cross-validation to optimize model performance and avoid overfitting.
It highlights the necessity of evaluating the model in a production environment and using real-world feedback to improve the model iteratively.
The article concludes that machine learning has the potential to optimize business processes and enhance customer experiences across various industries.

1. Problem Framing

Define your project goals. What do you want to find out? Do you have the data to analyze?

This is where you decide what kind of problem you are trying to solve. Examples include a model that classifies emails as spam or not spam; a model that classifies tumor cells as malignant or benign; a model that improves the customer experience by routing calls into categories so that each call is answered by personnel with the right expertise; a model that predicts whether a loan will be charged off; and a model that predicts the price of a house from a set of features, or predictors. The first four are classification problems and the last is a regression problem, and that distinction narrows down which models are suitable later on.

2. Data Analysis

Collect and refine your data. Prepare a repository to store your data.

This is where you handle the data available for building the model. It includes data visualization of features, handling missing data, handling categorical data, encoding class labels, normalization and standardization of features, feature engineering, dimensionality reduction, data partitioning into training, validation and testing sets, etc.
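A few of these preparation steps can be sketched in a handful of lines. The example below is a minimal illustration, not a recommended pipeline: it uses only the Python standard library, the toy data (`sizes`, `kinds`) is made up, and the helper names (`impute_mean`, `standardize`, `one_hot`, `split_indices`) are hypothetical. In practice a library such as pandas or scikit-learn would usually handle these steps.

```python
import random
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def one_hot(labels):
    """Encode a categorical column as 0/1 indicator columns (sorted category order)."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

def split_indices(n, train=0.6, val=0.2, seed=0):
    """Shuffle row indices and partition them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Hypothetical toy data: house sizes (one value missing) and a categorical feature.
sizes = [50.0, None, 70.0, 90.0, 110.0]
kinds = ["flat", "house", "flat", "house", "house"]

sizes_filled = impute_mean(sizes)          # handle missing data
sizes_scaled = standardize(sizes_filled)   # standardize a numeric feature
kinds_encoded = one_hot(kinds)             # columns: flat, house
train_idx, val_idx, test_idx = split_indices(len(sizes))
```

One design point worth noting: for brevity the sketch computes the mean and standard deviation on all rows. To avoid leaking information from the validation and test sets, these statistics would normally be computed on the training rows only and then applied to the other two sets.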

3. Model Building

Pick the machine learning tool that matches your data and desired outcome. Choose between an automated process, a graphical editor, and writing your own code. Train the model with available data.

This is where you select the model that you would like to use, e.g. linear regression, logistic regression, k-nearest neighbors (KNN), support vector machines (SVM), k-means, Monte Carlo simulation, time series analysis, etc. The data set has to be divided into training, validation, and test sets. Hyperparameter tuning is used to fine-tune the model in order to improve its performance and prevent overfitting. Cross-validation is used to estimate how well each candidate setting generalizes to data the model was not trained on, so that the choice does not hinge on a single validation split. After the hyperparameters are fixed, the model is applied once to the test data set. Because the test set played no part in training or tuning, the model's performance on it is approximately equal to what would be expected when the model is used for making predictions on unseen data.
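The tuning workflow described here can be illustrated with a small, self-contained sketch. It assumes a toy one-feature classification problem with synthetic data, a hand-written KNN classifier, and hypothetical helper names (`knn_predict`, `accuracy`, `cross_val_accuracy`); it is meant to show the order of the steps, not to stand in for a library implementation.

```python
import random

def knn_predict(train, query, k):
    """Predict a label by majority vote among the k nearest 1-D training points."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - query))[:k]
    votes = [label for _, label in neighbours]
    return max(set(votes), key=votes.count)

def accuracy(train, held_out, k):
    """Fraction of held-out points that KNN (fitted on train) labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)

def cross_val_accuracy(data, k, n_folds=5):
    """Mean accuracy over n_folds, each fold held out once for validation."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(train, held_out, k))
    return sum(scores) / n_folds

# Synthetic data (an assumption for illustration): label is 1 when the
# feature exceeds 5, with roughly 10% of labels flipped as noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    y = int(x > 5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

# The test set is put aside and not touched during tuning.
train_val, test = data[:160], data[160:]

# Tune the hyperparameter k using cross-validated accuracy only.
candidates = [1, 5, 15]
cv_scores = {k: cross_val_accuracy(train_val, k) for k in candidates}
best_k = max(cv_scores, key=cv_scores.get)

# One final evaluation on the untouched test set.
test_accuracy = accuracy(train_val, test, best_k)
```

The point of the structure is that `test` is consulted exactly once, after `best_k` has been chosen, so `test_accuracy` is not biased by the tuning itself.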

4. Application

Score your model to generate predictions. Make your model available for production. Retrain your model as needed.

In this stage, the final machine learning model is put into production, where it starts improving the customer experience, increasing productivity, deciding whether a bank should approve credit to a borrower, etc. The model is evaluated in the production setting in order to assess its performance. This can be done by comparing the performance of the machine learning solution against a baseline or control solution using methods such as A/B testing. Any discrepancies between the model's performance in the experimental setting and its actual performance in production have to be analyzed. The findings can then be used to fine-tune the original model.
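One common way to read an A/B comparison between a baseline and a model-based solution is a two-proportion z-test. The sketch below assumes that each variant is judged by a simple success rate (for example, calls resolved on first contact); the counts are invented and the function name `two_proportion_z_test` is hypothetical.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants share one success rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical counts: successes out of requests served by each variant.
z, p_value = two_proportion_z_test(
    success_a=400, n_a=1000,   # A: baseline (control) solution
    success_b=450, n_b=1000,   # B: machine learning solution
)
```

With these made-up counts, z is about 2.26 and the p-value is about 0.024, so the difference would be unlikely under the assumption that both variants perform equally. Whether that is enough to act on depends on the significance level and the business cost of a wrong decision, both of which should be fixed before the experiment starts.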

In summary, we have discussed the four main stages of a machine learning process: problem framing, data analysis, model building, and application. Businesses across many industries can harness the power of machine learning to optimize production or improve the customer experience.

Thanks for reading.

Diving Into the ML Process | Towards AI
