Best Practices For Monitoring Machine Learning Models In Production

Originally published at https://www.alessandroai.com on January 3, 2022.

In A Comprehensive Guide on How to Monitor Your Models in Production, we saw that deploying a Machine Learning model isn’t the last step, and why monitoring is essential: it tells you whether your model is performing as expected, whether it needs to be retrained, or whether the entire application needs to be redefined. More on that in the article above.

We explored how to monitor the three most important aspects of a Machine Learning application: data, model and output.

We’ll now go over some of the best practices teams can adopt in order to monitor their models effectively.

General best practices

Before diving into the best practices for monitoring data, model and output, let’s go over some general tips.

Monitoring doesn’t start with deployment: start as soon as experimentation begins, and track your experiments, your logs, your ideas and your troubleshooting.

More is not better, for the tools: don’t use dozens of tools, it gets messy to work with. Use as few tools as you can to get the same job done.

More is better, for the people: don’t give all the power to one person. Decentralize your team: when everyone handles a single task, no one is overwhelmed.

Practices for model monitoring

Always set up a metadata store: make sure you always store the hyperparameters of every trained, versioned model. This helps with traceability, troubleshooting and rollbacks when needed.

Shadow deployment: before promoting a new model, deploy it in shadow mode so that it runs alongside the current model, and log its predictions and performance.

Check the performance: the model will inevitably degrade over time, so set up tools that help you detect it.
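To make the shadow deployment idea more concrete, here is a minimal sketch in Python. It is an illustration under assumptions rather than a reference implementation: `ShadowRouter`, `current_model` and `candidate_model` are hypothetical names, the models are plain functions, and the shadow model is called synchronously for simplicity.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("shadow")


class ShadowRouter:
    """Serve the current model while recording a candidate model's predictions."""

    def __init__(self, primary: Callable[[Any], Any], shadow: Callable[[Any], Any]):
        self.primary = primary
        self.shadow = shadow
        self.total = 0
        self.disagreements = 0
        self.shadow_errors = 0

    def predict(self, features: Any) -> Any:
        # The caller only ever receives the primary model's answer.
        result = self.primary(features)
        self.total += 1
        try:
            shadow_result = self.shadow(features)
        except Exception:
            # A failing shadow model must never break live traffic.
            self.shadow_errors += 1
            logger.exception("shadow model failed")
            return result
        if shadow_result != result:
            self.disagreements += 1
        # Log both predictions so they can be compared offline.
        logger.info("primary=%r shadow=%r", result, shadow_result)
        return result

    def disagreement_rate(self) -> float:
        # Share of successfully compared requests where the models differed.
        compared = self.total - self.shadow_errors
        return self.disagreements / compared if compared else 0.0


# Hypothetical stand-ins for the deployed model and the new candidate.
def current_model(x: float) -> int:
    return int(x > 0.5)


def candidate_model(x: float) -> int:
    return int(x > 0.4)


router = ShadowRouter(current_model, candidate_model)
served = [router.predict(x) for x in (0.1, 0.45, 0.6, 0.9)]
print(served, router.disagreement_rate())
```

In a real service you would probably run the shadow call off the request path (a queue or background worker), so that the candidate cannot add latency to live traffic.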

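Practices for output monitoring

Track absurd outputs: always track wrong predictions made with high confidence for a specific set of inputs.

Use every metric available to detect model and concept drift: more on this in the previous article.

Alerting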

This is another important aspect: if you’re not alerting in case something goes wrong, why are you even monitoring?

Agree with everyone on the medium: choose the channel alerts are sent through, such as Slack, Mattermost or email.

Divide et impera: make sure each kind of alert goes to the team or person responsible for it. For example, data alerts should go to the DevOps / Data Engineers.

Don’t send alerts for everything: choose what actually matters to the business and to the application. Too many alerts will create a hell of a lot of noise.
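The three alerting tips above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the routing table, the channel names, the cooldown value and the `AlertRouter` class are hypothetical, and the `sent` list stands in for a real delivery call (a Slack or Mattermost webhook, an email, and so on).

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical routing table: alert category -> channel of the team that owns it.
ROUTES: Dict[str, str] = {
    "data": "#data-engineering",
    "model": "#ml-team",
    "output": "#product-owners",
}
DEFAULT_CHANNEL = "#ml-ops"


@dataclass
class AlertRouter:
    cooldown_seconds: float = 300.0
    last_sent: Dict[Tuple[str, str], float] = field(default_factory=dict)
    sent: List[Tuple[str, str]] = field(default_factory=list)

    def notify(self, category: str, name: str, value: float,
               threshold: float, now: Optional[float] = None) -> bool:
        """Send an alert only if the threshold is crossed and it is not a repeat."""
        now = time.time() if now is None else now
        if value <= threshold:
            return False  # nothing that matters happened
        key = (category, name)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # the same alert fired recently: suppress the noise
        self.last_sent[key] = now
        channel = ROUTES.get(category, DEFAULT_CHANNEL)
        # Stand-in for the real delivery call.
        self.sent.append((channel, f"{name}={value:.3f} exceeded {threshold:.3f}"))
        return True


alerts = AlertRouter(cooldown_seconds=300)
alerts.notify("data", "null_rate", 0.12, 0.05, now=0)      # sent to the data team
alerts.notify("data", "null_rate", 0.15, 0.05, now=60)     # suppressed by the cooldown
alerts.notify("model", "latency_p95_s", 0.2, 0.5, now=60)  # below threshold, ignored
print(alerts.sent)
```

The cooldown is only one way to cut noise; grouping alerts or alerting on sustained breaches are reasonable alternatives.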

Logging

Log as much as possible: the sky (or your storage) is the limit. A good logging system is crucial if you want insight into anything that might go wrong.

Here are some things worth logging:

Data events during the pipeline: start and end timings, job failures, etc.

Production data: always useful when retraining.

Model metadata: versions and hyperparameters.

Every prediction result.

Performance: timings and hardware usage.
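As a rough illustration of the list above, the sketch below writes one structured log record per prediction, carrying the model metadata, the input, the result and the timing. The metadata values, the field names and the `predict_and_log` helper are hypothetical; only the standard library `logging`, `json` and `time` modules are used.

```python
import json
import logging
import time
from typing import Any, Callable, Dict

logger = logging.getLogger("predictions")

# Hypothetical metadata; in practice it would come from your metadata store.
MODEL_METADATA: Dict[str, Any] = {
    "name": "churn-classifier",
    "version": "1.4.0",
    "hyperparameters": {"max_depth": 6, "learning_rate": 0.1},
}


def build_record(features: Dict[str, Any], prediction: Any,
                 latency_ms: float) -> Dict[str, Any]:
    """Assemble everything worth keeping about a single prediction."""
    return {
        "timestamp": time.time(),
        "model": MODEL_METADATA,           # versions and hyperparameters
        "features": features,              # production data, useful for retraining
        "prediction": prediction,          # every prediction result
        "latency_ms": round(latency_ms, 3),  # performance timing
    }


def predict_and_log(model: Callable[[Dict[str, Any]], Any],
                    features: Dict[str, Any]) -> Any:
    start = time.perf_counter()
    prediction = model(features)
    latency_ms = (time.perf_counter() - start) * 1000
    # One JSON document per line is easy to ship to most log backends.
    logger.info(json.dumps(build_record(features, prediction, latency_ms),
                           sort_keys=True))
    return prediction


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    predict_and_log(lambda f: int(f["tenure_months"] < 6), {"tenure_months": 3})
```

Pipeline events (start and end times, job failures) and hardware usage would usually come from the orchestrator and the infrastructure metrics rather than from this code path.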

No, it doesn’t end with monitoring

Now that you have an entire infrastructure around your Machine Learning model, you are well placed to keep delivering real business value.

Other articles you might find useful:

Introduction To MLOps: What’s missing in most online courses
https://readmedium.com/introduction-to-mlops-d616c6ba8669

Strategies To Deploy Your Machine Learning Models: Blue-Green, Canary and A/B Testing deployments
https://readmedium.com/strategies-to-deploy-your-machine-learning-models-204e8664032

A Comprehensive Guide on How to Monitor Your Models in Production: An overview of what could go wrong, and how to fix it.
https://readmedium.com/a-comprehensive-guide-on-how-to-monitor-your-models-in-production-c069a8431723