5 Levels of Machine Learning Maturity: MLOps Explained

Exploring the stages of maturity and industry best practices

A robust infrastructure is essential for developing and deploying machine learning (ML) applications in an organized, reliable way. However, infrastructure needs vary from company to company, depending on factors such as the number of ML applications, deployment speed requirements, and request handling capacity.

For instance, if a company has only one model in production, the deployment process can be managed manually. On the other hand, companies like Netflix or Uber, with numerous models in production, necessitate highly specialized infrastructure to support their operations. You might now wonder where your company fits within this spectrum.

To assist with this, Google and Microsoft have shared MLOps maturity levels, which describe the progression and sophistication of ML infrastructure based on industry best practices. This blog post aims to synthesize and combine the best elements from both frameworks. First, we will examine five maturity levels, illustrating the evolution from manual processes to advanced automated infrastructures. In the final section, we will argue that while some points presented by Microsoft and Google should not be followed blindly, they can be adjusted to meet your specific needs. This will enable you to gain a better understanding of your infrastructure’s current state and identify potential areas for improvement.

Now, let’s delve into the topic at hand!

What is MLOps?

MLOps refers to a collection of practices designed to establish a standardized and repeatable process for managing the entire ML lifecycle, encompassing data preparation, model training, deployment, and monitoring. It draws inspiration from the widely adopted DevOps practices in software engineering, which aim to provide teams with a rapid and continuously iterative approach to shipping software applications.

However, MLOps differs from DevOps in several ways:

Multidisciplinary Team: MLOps requires a team with diverse skill sets, including data engineers responsible for data collection and storage, data scientists who develop the models, machine learning engineers (MLEs) who deploy the models, and software engineers who integrate them with the product.
Experimental Nature of Data Science: Data science is inherently experimental, allowing ongoing improvement through the exploration of different models, data analysis, training techniques, and hyperparameter configurations. The MLOps infrastructure should include mechanisms for tracking and evaluating both successful and unsuccessful approaches.
Silent Model Failures: Even when a model is operational in production, it can still fail due to changes in incoming data. This phenomenon, known as silent model failure, is caused by data and concept drift. Therefore, ML infrastructure requires a monitoring system to continually assess the model’s performance and data to mitigate this issue.
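
To make the idea of data drift more concrete, here is a minimal sketch of one possible drift check: the Population Stability Index (PSI) between a reference sample of a feature and a recent production sample. The function name, the bin count, and the 0.25 cut-off used in the comment are assumptions for illustration (0.25 is a commonly quoted rule of thumb, not something taken from the Google or Microsoft frameworks).

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of one feature; a larger value means more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features

    def bucket_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge buckets.
            idx = int((v - lo) / width)
            counts[max(0, min(bins - 1, idx))] += 1
        # A small floor keeps the logarithm defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bucket_shares(reference)
    cur = bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical usage: compare last month's feature values with this week's.
training_sample = [i / 100 for i in range(1000)]
production_sample = [x + 5 for x in training_sample]
drifted = population_stability_index(training_sample, production_sample) > 0.25
```

A check like this only looks at the input data. Concept drift, where the relationship between inputs and target changes, still requires comparing predictions with actual outcomes.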

Now, let us explore the different maturity levels of MLOps infrastructures!

Manual

At this level, the processes of data processing, experimentation, and model deployment are entirely manual. Microsoft refers to this level as ‘No MLOps’ because the ML lifecycle lacks repeatability and automation.

The entire workflow heavily relies on skilled data scientists, with some support from a data engineer for data preparation and a software engineer for integration with the product or business processes, if necessary.

This approach proves effective in the following scenarios:

Early-stage start-ups and proof of concept projects: These situations prioritize experimentation, with limited resources. The main focus is on developing and deploying ML models before scaling up operations.
Small-scale ML applications: For ML applications with a narrow scope or a small user base, such as a small online fashion store, the manual approach can be sufficient. With minimal data dependencies and real-time requirements, data scientists can handle data processing, experimentation, and deployment manually.
Ad hoc ML tasks: In specific scenarios like marketing campaigns, one-time ML tasks or analyses may not require full implementation of MLOps.

However, according to Google and Microsoft, this approach also has several limitations, including:

Lack of a monitoring system: Without visibility into the model’s performance, any degradation can have a negative business impact. Additionally, the absence of post-deployment data science hinders understanding the model’s behavior in production.
Infrequent retraining of production models: Without periodic adaptation to the latest trends or patterns, the models become less effective over time.
Painful and infrequent releases: Since the process is manual, model releases occur only a few times per year, resulting in slower iteration cycles.
Lack of centralized tracking of model performance: This makes it challenging to compare the performance of different models, reproduce results, or update models efficiently.
Limited documentation and absence of versioning: These factors present challenges, including the risk of unintended changes to the code, limited ability to roll back to a working version, and difficulties in achieving repeatability.

By recognizing these limitations, organizations can identify the need for improvements and advancements in their ML infrastructure.

Repeatable

Next, we proceed to incorporate the DevOps aspect into the infrastructure by transforming experiments into source code and storing them in a source repository, utilizing a version control system like Git.

In addition, Microsoft suggests implementing the following changes to the data collection process:

Data pipeline: This enables the extraction of data from various sources and combines them through operations such as cleaning, aggregating, or filtering. The inclusion of a data pipeline enhances the scalability, efficiency, and accuracy of the infrastructure compared to manual processes.
Data catalog: By establishing a centralized repository, known as a data catalog, organizations can manage and maintain large volumes of data in a scalable and efficient manner. The data catalog contains essential information such as data source, data type, data format, owner, usage, and lineage, aiding in the organization and governance of data assets.
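
As a rough illustration of how these two pieces fit together, the sketch below chains cleaning, filtering, and aggregating steps and records the lineage in a catalog entry. Everything here is hypothetical: the `CatalogEntry` fields, the step functions, and the `amount` and `customer` keys are assumptions chosen for the example, not part of any specific tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record with a few of the fields listed above."""
    name: str
    source: str
    data_format: str
    owner: str
    lineage: list[str] = field(default_factory=list)


def clean(rows):
    # Drop records with a missing amount.
    return [r for r in rows if r.get("amount") is not None]


def filter_valid(rows):
    # Keep only positive amounts.
    return [r for r in rows if r["amount"] > 0]


def aggregate(rows):
    # Sum amounts per customer.
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def run_pipeline(rows, entry: CatalogEntry):
    for step in (clean, filter_valid, aggregate):
        rows = step(rows)
        entry.lineage.append(step.__name__)  # record which step touched the data
    return rows


raw = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": None},
    {"customer": "b", "amount": -5.0},
    {"customer": "a", "amount": 2.5},
]
entry = CatalogEntry("customer_totals", "orders_db", "json", "data-team")
totals = run_pipeline(raw, entry)
```

Because every run goes through the same ordered steps, the result can be reproduced, and the catalog entry documents where the dataset came from and how it was produced.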

To advance the infrastructure to the next level, it is essential to introduce automated testing alongside version control. This involves implementing practices like unit tests, integration tests, or regression tests. These testing methodologies facilitate faster deployments and enhance reliability by ensuring that code changes do not introduce errors or bugs.
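
A unit test at this level can be very small. The sketch below tests a hypothetical preprocessing helper, `min_max_scale`, which is an assumption for illustration. The tests are plain functions with `assert` statements, a style that a test runner such as pytest can discover and run on every commit.

```python
def min_max_scale(values):
    """Scale numbers to the range [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_scales_to_unit_range():
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


def test_constant_input_does_not_divide_by_zero():
    assert min_max_scale([3.0, 3.0]) == [0.0, 0.0]
```

The second test is the kind of regression test that pays off later: it pins down an edge case so that a future refactoring cannot silently reintroduce a division by zero.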

Once these changes are implemented, the data collection and deployment processes can be repeated. However, it is crucial to establish a robust monitoring system. Although Microsoft briefly mentions the need for monitoring and states there is “limited feedback on how well a model performs in production,” specific details are not provided regarding its implementation and functionality.

Reproducible

There are two crucial reasons why reproducibility plays a vital role: troubleshooting and collaboration. Imagine a scenario where the performance of a recently deployed model starts to deteriorate, leading to inaccurate predictions. In such cases, it becomes essential to maintain a record of previous data and model versions to roll back to a known working state until the root cause of the issue is identified.

Moreover, reproducibility facilitates better collaboration among team members by allowing them to understand each other’s work and build upon it. This collaborative approach and knowledge sharing can accelerate innovation and result in the development of better models.

To achieve reproducibility, the architecture needs to be elevated in the following four ways:

Automated training pipeline: An automated training pipeline manages the end-to-end process of training models, starting from data preparation to model evaluation.
Metadata store: A metadata store, in the form of a database, enables tracking and management of essential metadata such as data sources, model configurations, hyperparameters, training runs, evaluation metrics, and experimental data.
Model registry: A model registry serves as a repository for storing ML models, their versions, and the artifacts necessary for deployment. This facilitates easy retrieval of specific versions when needed.
Feature store: A feature store provides data scientists and machine learning engineers with a centralized location for storing, managing, and serving features. It enables more efficient development, testing, and deployment of machine learning models. Additionally, the feature store tracks the evolution of features over time and allows for preprocessing and transformation of features as required.
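
The following is a minimal in-memory sketch of what a model registry combined with a metadata store keeps track of. The class names, fields, and the `churn` example are hypothetical; a real setup would persist this information in a database or a dedicated tool.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ModelVersion:
    version: int
    artifact_path: str                 # where the serialized model would live
    hyperparameters: dict[str, Any]
    metrics: dict[str, float]


@dataclass
class ModelRegistry:
    """In-memory stand-in for a model registry and its metadata."""
    _versions: dict[str, list[ModelVersion]] = field(default_factory=dict)

    def register(self, name, artifact_path, hyperparameters, metrics) -> ModelVersion:
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(len(history) + 1, artifact_path, hyperparameters, metrics)
        history.append(entry)
        return entry

    def get(self, name: str, version: Optional[int] = None) -> ModelVersion:
        history = self._versions[name]
        if version is None:
            return history[-1]          # latest version
        return history[version - 1]     # versions are 1-based


registry = ModelRegistry()
registry.register("churn", "models/churn/1.pkl", {"depth": 3}, {"auc": 0.81})
registry.register("churn", "models/churn/2.pkl", {"depth": 5}, {"auc": 0.79})

# Roll back: the newest version scored worse, so fetch version 1 explicitly.
stable = registry.get("churn", version=1)
```

Storing hyperparameters and metrics next to each version is what makes the rollback scenario described earlier practical: you can see which version was the last known good one and retrieve exactly that artifact.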

At this stage, a monitoring service is available, providing real-time feedback on the model’s performance. However, neither Microsoft nor Google provides further details on how this monitoring service works.

Automated

At this automation level, data scientists can efficiently explore new ideas in feature engineering, model architecture, and hyperparameters through the automation of the machine learning pipeline, encompassing building, testing, and deployment. To accomplish this, Microsoft suggests the incorporation of two additional components:

CI/CD: Continuous Integration (CI) ensures the integration of code changes from different team members into a shared repository. On the other hand, Continuous Deployment (CD) automates the deployment of validated code to production environments. This CI/CD approach enables the rapid deployment of model updates, improvements, and bug fixes, streamlining the development process.
A/B testing of models: This model validation method involves comparing predictions and user feedback between an existing model and a candidate model to determine which one performs better. A/B testing allows for a systematic evaluation of different models, enabling data-driven decision-making when selecting the most effective model for deployment.
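
One simple way to decide an A/B test between two models is a two-proportion z-test on a success metric such as conversions. This is a sketch under the assumption that each user sees exactly one model and outcomes are independent; the function name and the example numbers are made up for illustration.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference in success rates."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical traffic split: existing model (A) vs. candidate model (B).
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
candidate_wins = z > 0 and p_value < 0.05
```

The significance level of 0.05 is a convention, not a requirement. For a business-critical model you may want a stricter threshold or a longer test period before switching.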

By incorporating CI/CD and A/B testing, organizations can enhance the efficiency and reliability of their ML infrastructure, accelerating the development and deployment of models while ensuring the continuous improvement of their ML systems.

Continuously improved

At this stage, the model undergoes automatic retraining triggered by the monitoring system. This retraining process, also known as continuous learning, serves several objectives:

Combat sudden data drifts: Continuous learning ensures the model remains effective even when faced with unexpected changes in the data. By adapting to these drifts, the model can maintain accuracy and reliability over time.
Adapt to rare events: Events like Black Friday, which exhibit unique patterns and trends, may deviate significantly from the norm. Continuous learning enables the model to adapt and capture these rare events, ensuring optimal performance during such periods.
Overcome the cold start problem: The cold start problem occurs when the model needs to make predictions for new users who lack historical data. Continuous learning helps the model overcome this challenge by incorporating new user data and effectively making predictions for previously unseen instances.
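
A monitoring-driven retraining trigger can be sketched as a rolling error check. The class below is a hypothetical example: the window size, the threshold, and the use of absolute error are assumptions, and a real system would start a training pipeline instead of just returning `True`.

```python
from collections import deque


class RetrainingTrigger:
    """Signal a retrain when the rolling mean absolute error exceeds a threshold."""

    def __init__(self, threshold: float, window: int = 50):
        self.threshold = threshold
        self.errors = deque(maxlen=window)  # keeps only the most recent errors

    def observe(self, prediction: float, actual: float) -> bool:
        self.errors.append(abs(prediction - actual))
        window_full = len(self.errors) == self.errors.maxlen
        rolling_mae = sum(self.errors) / len(self.errors)
        # Wait for a full window so a single early outlier does not fire the trigger.
        return window_full and rolling_mae > self.threshold


trigger = RetrainingTrigger(threshold=1.0, window=3)
signals = [
    trigger.observe(10, 10),
    trigger.observe(10, 10),
    trigger.observe(10, 10),
    trigger.observe(10, 15),  # rolling errors are now [0, 0, 5]
]
```

Note that this only works when actual values arrive soon after the predictions. With delayed labels, the trigger would react to a reality that is already out of date.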

According to Microsoft and Google, continuous model retraining enhances the model’s robustness and adaptability to changes in the data. This approach enables the model to consistently perform at its peak, ensuring its effectiveness and relevance over time.

Push for automation

In the cloud computing market, Microsoft and Google are prominent players, with Azure holding a 22% market share and Google at 10%. Their offerings encompass a wide range of services, including computing, storage, and development tools, all of which are essential components for constructing advanced ML infrastructure.

As with any business, their primary objective is to generate revenue by selling these services. Consequently, their blogs emphasize the importance of advancement and automation. However, it is crucial to note that a higher level of maturity does not guarantee superior results for every business. The optimal solution lies in aligning with your company’s specific needs and selecting the right tech stack.

While maturity levels can serve as a useful indicator of your current advancement, it is important not to blindly adhere to them. Microsoft and Google’s primary incentive is to promote and sell their services. For instance, the automated retraining they emphasize may not always be necessary or beneficial. Retraining should occur only when needed, and other factors, such as a reliable monitoring system and an effective root cause analysis process, are often more critical for your infrastructure.

Ultimately, the decision-making process should prioritize your business’s unique requirements and consider the larger picture beyond the push for automation advocated by Microsoft and Google.

Monitoring should start from the manual level

In the described maturity levels, a limited monitoring system is introduced at level 2. However, in reality, monitoring should be initiated as soon as business decisions are made based on the model’s output, irrespective of the maturity level. This early monitoring practice enables risk reduction and provides insights into how the model aligns with your business goals.

At the initial stages, monitoring can begin with a simple comparison of the model’s predictions to the actual values. This straightforward assessment provides a baseline for evaluating the model’s performance and a starting point for further analysis when the model fails.
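
Such a comparison needs very little code. The sketch below computes the mean absolute error of the model and, as an additional assumption not taken from the frameworks, compares it with a naive baseline that always predicts the mean of the actual values. The function names and numbers are illustrative only.

```python
from statistics import mean


def mean_absolute_error(predictions, actuals):
    return mean(abs(p - a) for p, a in zip(predictions, actuals))


def monitoring_report(predictions, actuals):
    """Compare the model's error with a naive 'always predict the mean' baseline."""
    naive = [mean(actuals)] * len(actuals)
    model_mae = mean_absolute_error(predictions, actuals)
    naive_mae = mean_absolute_error(naive, actuals)
    return {
        "model_mae": model_mae,
        "naive_mae": naive_mae,
        "beats_naive": model_mae < naive_mae,
    }


# Hypothetical weekly check on three predictions and their observed outcomes.
report = monitoring_report([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

If the model stops beating the naive baseline, that is a strong signal to start a root cause analysis, even at the manual maturity level.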

Additionally, it is essential to evaluate the impact of data science efforts, including their return on investment (ROI). Understanding how much business value these efforts generate empowers you to make informed decisions regarding resource allocation and future investments.

As the infrastructure evolves, the monitoring system can become more sophisticated, incorporating additional features and capabilities. However, it is important to acknowledge the significance of implementing a basic monitoring setup at the initial level of maturity. By prioritizing monitoring from the outset, you establish a strong foundation for ensuring model performance and aligning it with your business objectives.

Risk of retraining

While level 5 highlights the benefits of automatic retraining in production, it is crucial to consider the associated risks before incorporating it into your infrastructure. Here are some key points to ponder:

Retraining on delayed data: Certain real-world scenarios, such as loan-default prediction, may involve delayed labels that arrive months or even years later. Retraining the model using outdated data might not accurately reflect the current reality, raising concerns about its effectiveness.
Failure to determine the root cause of the problem: A decline in the model’s performance does not always signify a need for more data. Multiple factors can contribute to model failure, including changes in downstream business processes, training-serving skew, or data leakage. It is crucial to investigate and identify the underlying issue before deciding whether retraining is necessary.
Higher risk of failure: Retraining introduces an amplified risk of model failure. With increased update frequency, the complexity of the infrastructure grows, providing more opportunities for potential issues. Any undetected problems in data collection or preprocessing can propagate to the retrained model, leading to a model trained on flawed data.
Higher costs: Retraining is not a cost-free process. It involves expenses related to storing and validating the retraining data, as well as the computational resources required to execute the retraining process. Additionally, testing a new model to determine its performance compared to the existing one adds to the overall costs.
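
One way to reduce some of these risks is to put a gate between retraining and deployment. The sketch below is a hypothetical example: it validates the retraining data first and only promotes the retrained model if it beats the current one by a minimum margin. The function names, the required fields, and the `min_improvement` default are assumptions for illustration.

```python
def validate_training_rows(rows, required_fields):
    """Return a list of problems found; an empty list means the data looks usable."""
    problems = []
    if not rows:
        problems.append("no rows")
    for i, row in enumerate(rows):
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            problems.append(f"row {i}: missing {missing}")
    return problems


def should_promote(candidate_error, current_error, data_problems, min_improvement=0.01):
    """Promote only if the data passed validation and the candidate is clearly better."""
    if data_problems:
        return False
    return current_error - candidate_error >= min_improvement


rows = [
    {"income": 52000, "label": 0},
    {"income": 48000, "label": None},  # label has not arrived yet
]
problems = validate_training_rows(rows, ["income", "label"])
promote = should_promote(candidate_error=0.20, current_error=0.25, data_problems=problems)
```

A gate like this does not replace root cause analysis. It only prevents a model trained on obviously flawed data, or one that is not measurably better, from reaching production automatically.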

Considering these risks is essential when deciding whether to implement automatic retraining in your ML infrastructure. Careful evaluation and risk assessment will help you make informed decisions that balance the potential benefits and drawbacks of incorporating retraining into your workflow.

Summary

In this blog post, we have delved into the world of ML infrastructure and explored the five MLOps maturity levels based on industry best practices from Google and Microsoft. From manual deployment to advanced automation, each level presents its own set of benefits. However, it is important to approach these practices with caution and tailor them to your company’s unique needs and requirements.

Building a sustainable and repeatable ML infrastructure is no easy feat, considering the complexity of ML systems. By understanding the different maturity levels and their advantages, you can make informed decisions about the evolution of your own infrastructure. Remember, blindly adopting these practices may not yield the desired results. Instead, customization and adaptation based on your specific circumstances are key to success.

As you navigate the challenges of ML infrastructure, keep in mind that continuous improvement and alignment with your business goals are paramount. By striking the right balance between industry best practices and your organization’s needs, you can lay the foundation for a robust and efficient ML system that drives innovation and delivers tangible results.

If you made it this far, thank you for reading my story!
