A Basic Understanding of ML Ops
Want to understand the basics of ML Ops? Then read on…
In this post, you will learn what ML Ops is, the difference between Dev Ops and ML Ops, the benefits of ML Ops, ML Ops components, and finally, the different maturity levels of ML Ops.
What is ML Ops?
ML Ops, or Machine Learning Operations, is an engineering practice and culture that brings together machine learning data management, model development, experimentation with different hyperparameters, and model deployment, and applies automation, traceability, and monitoring at every step by building a machine learning pipeline.
It is a collaboration between Data Scientists, Data Engineers, and Operations Professionals to ensure a repeatable, traceable, automated, and high-quality machine learning pipeline that is easy to create, maintain, deploy, and monitor.

Changing anything in the model or the data changes everything in a machine learning system. This is referred to as CACE: Changing Anything Changes Everything.

Machine learning has the following basic steps.
Data
- Data extraction from different sources like cloud storage, a database, HDFS, Spark, or a local hard disk, and in different formats such as CSV, PNG, or JPG files.
- Exploratory data analysis to profile the data, which helps identify relevant and important features.
- Data cleaning or preprocessing to identify outliers, handle missing data, and remove invalid or irrelevant data.
Model
- Model training with different architectures and hyperparameters
- Model evaluation based on performance criteria
- Model validation to check if the model meets the basic performance threshold based on the use case
- Model deployment/serving at different locations or different environments
- Monitoring the model's performance and retraining with new or additional data when that performance goes down.
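To make the steps above concrete, here is a minimal sketch of cleaning, training, evaluation, and validation in plain Python. Everything in it is an assumption made for illustration: the function names, the toy "model" (a single decision threshold placed between the class means), the tiny dataset, and the 0.8 accuracy threshold. A real pipeline would use a proper ML library and evaluate on held-out data rather than the training rows.

```python
from statistics import mean

def clean(rows):
    """Data cleaning: drop rows with a missing feature or an invalid label."""
    return [(x, y) for x, y in rows if x is not None and y in (0, 1)]

def train(rows):
    """Model training: place a decision threshold midway between the class means."""
    mean0 = mean(x for x, y in rows if y == 0)
    mean1 = mean(x for x, y in rows if y == 1)
    return (mean0 + mean1) / 2

def predict(threshold, x):
    """Predict class 1 when the feature is above the learned threshold."""
    return 1 if x > threshold else 0

def evaluate(threshold, rows):
    """Model evaluation: fraction of rows classified correctly."""
    correct = sum(predict(threshold, x) == y for x, y in rows)
    return correct / len(rows)

def validate(accuracy, minimum=0.8):
    """Model validation: compare against a use-case-specific threshold (assumed 0.8)."""
    return accuracy >= minimum

# Made-up toy data: (feature, label). Two rows are deliberately invalid.
raw = [(1.0, 0), (2.0, 0), (None, 1), (8.0, 1), (9.0, 1), (3.0, 7)]

rows = clean(raw)
model = train(rows)
# For brevity the model is scored on its own training rows; use held-out data in practice.
accuracy = evaluate(model, rows)
ready_to_deploy = validate(accuracy)
```

Each function maps to one bullet in the list, which is what makes it possible to automate and monitor the steps individually later on.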
ML Ops is built on top of Dev Ops, but there are a few key differences between them.
Difference between Dev Ops and ML Ops
Dev Ops is the practice of developing, operating, and maintaining software solutions with the aim of shortening the development lifecycle and deploying code to production faster, with more dependable releases. ML Ops additionally needs to automate the data flow, model development, evaluation, deployment, and monitoring.
Dev Ops only needs to handle code versioning, but ML Ops needs to version the data and the model as well as the code.
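One way to picture versioning data alongside code is to fingerprint each input with a content hash, so that any change to the data, the code, or the hyperparameters yields a new version identifier. The sketch below is only an illustration of that idea using Python's standard library. The `fingerprint` and `snapshot` helpers and the sample bytes are hypothetical, and dedicated version-control tooling would normally do this job.

```python
import hashlib
import json

def fingerprint(payload: bytes) -> str:
    """Return a short SHA-256 content hash that changes whenever the bytes change."""
    return hashlib.sha256(payload).hexdigest()[:12]

def snapshot(data: bytes, code: bytes, hyperparameters: dict) -> dict:
    """Record which data, code, and hyperparameters went into a training run."""
    # sort_keys makes the hash independent of dictionary ordering.
    params = json.dumps(hyperparameters, sort_keys=True).encode()
    return {
        "data_version": fingerprint(data),
        "code_version": fingerprint(code),
        "params_version": fingerprint(params),
    }

# Made-up inputs: the second snapshot has one extra data row, same code and parameters.
v1 = snapshot(b"x,y\n1,0\n2,1\n", b"def train(): ...", {"lr": 0.1})
v2 = snapshot(b"x,y\n1,0\n2,1\n3,1\n", b"def train(): ...", {"lr": 0.1})

# Only the data fingerprint differs between the two snapshots.
changed = [key for key in v1 if v1[key] != v2[key]]
```

Here `changed` contains only `"data_version"`, which is exactly the kind of signal a code-only versioning scheme would miss.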
Unlike Dev Ops, ML Ops has to deal with a lot of experimentation. It needs traceability across experiments so that the metrics and parameters of different runs can be compared.
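As a rough illustration of experiment traceability, the sketch below logs the parameters and metrics of each run and then picks the best run by a chosen metric. The `Run` and `ExperimentLog` classes are hypothetical, and the parameter and accuracy values are placeholders rather than real results. Dedicated experiment-tracking tools offer much richer versions of the same idea.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    name: str
    params: dict
    metrics: dict

@dataclass
class ExperimentLog:
    runs: list = field(default_factory=list)

    def log(self, name, params, metrics):
        """Store one experiment run with its parameters and resulting metrics."""
        self.runs.append(Run(name, params, metrics))

    def best(self, metric):
        """Return the run with the highest value of the given metric."""
        return max(self.runs, key=lambda run: run.metrics[metric])

log = ExperimentLog()
# Placeholder values, invented purely for illustration.
log.log("run-1", {"learning_rate": 0.1, "depth": 3}, {"accuracy": 0.81})
log.log("run-2", {"learning_rate": 0.01, "depth": 5}, {"accuracy": 0.86})
log.log("run-3", {"learning_rate": 0.01, "depth": 8}, {"accuracy": 0.84})

best = log.best("accuracy")
```

Because every run keeps its parameters next to its metrics, the winning configuration can be traced back and reproduced.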
ML Ops also needs to constantly monitor the model's performance in production. Whenever that performance degrades beyond a certain threshold, the model needs to be retrained with new data. The retrained model is then evaluated and deployed to production only if it performs better.
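The monitor, retrain, and promote logic described here can be sketched in a few lines. This is a minimal illustration under assumptions: accuracy is the monitored metric, a drop of more than 0.05 from the baseline counts as degradation, and the function names and model dictionaries are hypothetical.

```python
def needs_retraining(live_accuracy, baseline_accuracy, max_drop=0.05):
    """True when live accuracy has fallen more than max_drop below the baseline."""
    return (baseline_accuracy - live_accuracy) > max_drop

def choose_model(current, candidate):
    """Promote the candidate only if it scores strictly better than the current model."""
    return candidate if candidate["accuracy"] > current["accuracy"] else current

# Placeholder numbers for illustration only.
current = {"version": 1, "accuracy": 0.82}    # live accuracy of the deployed model
retrain = needs_retraining(live_accuracy=current["accuracy"], baseline_accuracy=0.90)

candidate = {"version": 2, "accuracy": 0.88}  # accuracy of the retrained model
in_production = choose_model(current, candidate)
```

Keeping the two decisions separate mirrors the text: degradation triggers retraining, but a retrained model still has to beat the current one before it is deployed.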
Dev Ops testing involves unit tests and integration tests to validate the functionality of the software. In contrast, ML Ops testing includes model training, evaluation, and validation against basic performance criteria in terms of accuracy, precision, and inference time.
ML Ops has Continuous Integration (CI) for data and models, Continuous Training (CT) of models, and then Continuous Deployment (CD) of the models to production at different locations. In contrast, Dev Ops only has CI/CD: Continuous Integration (CI) of the code, using unit tests and integration tests to verify the software's functionality, and Continuous Deployment (CD) of the software to production.
Benefits of ML Ops
ML Ops helps to
- Manage the full ML lifecycle effectively.
- Create a repeatable and reusable ML workflow for consistent model training, deployment, and maintenance.
- Innovate rapidly by building repeatable workflows to train, evaluate, deploy, and monitor different models.
- Track different versions of models and data to enable auditing.
- Deploy to production easily and with high confidence.
Key components of ML Ops
- Source Control: All model code, trained models, and visualizations are version controlled.
- Continuous Integration and Continuous Deployment pipeline: Any change in the model code, the hyperparameters, or the data triggers an automated build process for the model. Unit tests are executed, and Docker images are created and uploaded to the container registry.
- Continuous Training Pipeline: Triggered by any change in the data or features, the availability of a new implementation of the model, or a change in the model's hyperparameters.
- Feature store: A repository of features from different data sources. It helps with the reproducibility of datasets because the same features can be used for both training and inference.
- Model registry: Manages the model metadata. Key metadata includes the model architecture, the model hyperparameters, when and by whom the model was run, how long the run took, the model's accuracy and precision, and the required libraries.
- Prediction Services: A new model is served at the target locations whenever a new evaluated version of the trained model is available
- ML Pipeline Orchestrator: Connects the different components of ML Ops as a system. The orchestrator runs the entire ML pipeline in a sequence and automatically transitions from one step to another based on the defined conditions.
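To illustrate how an orchestrator might connect the components listed above, here is a minimal sketch that runs steps in sequence and only moves on when a step's condition holds. The `Step` class, the `orchestrate` function, the in-memory `registry` dictionary, and the 0.8 accuracy gate are all hypothetical, and training is simulated by reading a value from the context. Real orchestrators add scheduling, retries, and distributed execution.

```python
from dataclasses import dataclass
from typing import Callable

def always(ctx):
    """Default condition: the step is always allowed to run."""
    return True

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]                 # takes the context, returns updates to it
    condition: Callable[[dict], bool] = always  # gate checked before the step runs

def orchestrate(steps, context):
    """Run steps in order; stop at the first step whose condition fails."""
    executed = []
    for step in steps:
        if not step.condition(context):
            break
        context.update(step.run(context))
        executed.append(step.name)
    return executed

registry = {}  # hypothetical in-memory model registry: version -> metadata

def train(ctx):
    # Training is simulated: the accuracy is simply read from the context.
    return {"model": "threshold-model", "accuracy": ctx["simulated_accuracy"]}

def register(ctx):
    version = len(registry) + 1
    registry[version] = {"model": ctx["model"], "accuracy": ctx["accuracy"]}
    return {"version": version}

def deploy(ctx):
    return {"deployed_version": ctx["version"]}

steps = [
    Step("train", train),
    # Only register (and later deploy) models that pass the assumed accuracy gate.
    Step("register", register, condition=lambda ctx: ctx["accuracy"] >= 0.8),
    Step("deploy", deploy),
]

good = orchestrate(steps, {"simulated_accuracy": 0.9})
bad = orchestrate(steps, {"simulated_accuracy": 0.6})
```

In this sketch the first run passes the gate and goes through all three steps, while the second stops after training, so nothing is registered or deployed for it.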

Levels of ML Ops Maturity
The ML Ops maturity level is defined by the level of automation available in the data and machine learning model pipelines.
The objective of ML Ops is to build completely automated data and model pipelines for all of the machine learning workflow steps, without any manual intervention.
Maturity Level 0:
- All ML processes for building and deploying ML models are manual. Every step is executed manually, and the transition to the next process is also manual.
- As the process is manual, there are fewer deployments to production.
- No active performance monitoring of models deployed to production
Maturity Level 1:
- In Level 1 maturity, we perform continuous training of the model by automating the ML pipeline.
- Continuous delivery of the model for prediction is enabled whenever the data or the model changes.
- Code is modularized and version-controlled, enabling reproducibility between the development and production environments.
Maturity Level 2:
- Continuous training of the model by automating the ML pipeline
- An automated CI/CD system supports rapid innovation and experimentation around feature engineering, model architecture, and hyperparameters.
- Because building, testing, evaluating, and deploying the models and new pipeline components are fully automated, daily or even hourly deployments across multiple locations become possible.
- Live monitoring of the models in production can trigger retraining if a model's performance degrades.
Conclusion:
ML Ops is a discipline that interweaves Machine Learning, Data Engineering, and Dev Ops to build automated ML pipelines for Continuous Training and CI/CD to manage the full ML lifecycle effectively. ML Ops helps create repeatable and reusable ML workflows to ensure rapid innovation, consistent model training, maintenance, and deployment to production.