How would you monitor the performance of a model in production?


How would you advise monitoring the performance of a model in production over time, given that it could be subject to data drift? Is it possible, for example, to see the confidence levels degrade over time, or are there better ways to do this?



Hi @Nasnl,

Data drift is a common issue for machine learning models in production and is closely connected to concept drift. Data drift occurs when the data a model receives in production starts to differ from the data it was trained on, and it is one of the most frequent reasons why model performance decreases over time.

  • For instance, training a model before Covid to predict whether a person works remotely, and then using it during the pandemic, would be a case of data drift.
  • If a model is trained to predict the weather and its sensors degrade, that is also data drift. The same applies to a camera-based model when the lighting conditions change.
  • It can also happen if one trains on data that is too narrow, e.g. a model meant to recognize all cats and dogs that was trained on only one breed of cat and one breed of dog.

Addressing the problem:
Since data drift happens when the data changes from the data the model was trained on, there are a couple of ways to measure and address this:

  1. Define suitable statistics for your data and problem. Decide how you want to compare the data you trained your model on with the data that is sent to your model in production.
  2. Define thresholds above which you consider data drift to have occurred, and save information about what type of data triggered it.
  3. Set up alerts so that your system notifies you when data drift occurs.
  4. Inspect what type of data triggered the alerts and see if you can collect similar data or augment your original training data.
  5. Retrain your model at regular intervals on new data.
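As a rough sketch of steps 1 to 3, the Python below uses the Population Stability Index (PSI), one common choice of statistic for comparing a numeric feature's training distribution with its production distribution (a two-sample Kolmogorov-Smirnov test is another). The bin edges come from quantiles of the training data, and the 0.25 alert threshold is a frequently quoted rule of thumb, not a universal constant. The names `check_feature`, `PSI_ALERT_THRESHOLD` and the `"temperature"` feature are hypothetical, and "alerting" is reduced to a logged warning standing in for whatever alerting system you actually use.

```python
import bisect
import logging
import math
import random
from typing import Sequence

logger = logging.getLogger("drift_monitor")

# Assumption: PSI > 0.25 is a commonly quoted rule of thumb for a significant
# shift. Tune this threshold for your own data and problem (step 2).
PSI_ALERT_THRESHOLD = 0.25


def quantile_edges(reference: Sequence[float], n_bins: int = 10) -> list[float]:
    """Inner bin edges taken from quantiles of the training (reference) data."""
    ordered = sorted(reference)
    return [ordered[int(len(ordered) * i / n_bins)] for i in range(1, n_bins)]


def bin_fractions(values: Sequence[float], edges: list[float], eps: float = 1e-6) -> list[float]:
    """Fraction of values in each bin; eps avoids log(0) for empty bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(reference: Sequence[float], production: Sequence[float], n_bins: int = 10):
    """Population Stability Index plus the per-bin contributions (step 1)."""
    edges = quantile_edges(reference, n_bins)
    ref = bin_fractions(reference, edges)
    prod = bin_fractions(production, edges)
    contributions = [(p - r) * math.log(p / r) for r, p in zip(ref, prod)]
    return sum(contributions), contributions


def check_feature(name: str, reference: Sequence[float], production: Sequence[float]) -> dict:
    """Compare one feature, log an alert on drift (step 3), return a report to store (step 2)."""
    score, contributions = psi(reference, production)
    drifted = score > PSI_ALERT_THRESHOLD
    if drifted:
        worst_bin = max(range(len(contributions)), key=contributions.__getitem__)
        logger.warning("Drift on %r: PSI=%.3f (largest shift in bin %d)", name, score, worst_bin)
    return {"feature": name, "psi": score, "drifted": drifted, "bin_contributions": contributions}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    random.seed(0)
    train = [random.gauss(0, 1) for _ in range(5000)]
    same_dist = [random.gauss(0, 1) for _ in range(5000)]
    shifted = [random.gauss(1, 1) for _ in range(5000)]  # simulated drift: mean moved by 1
    print(check_feature("temperature", train, same_dist)["drifted"])
    print(check_feature("temperature", train, shifted)["drifted"])
```

Running a check like this per feature on a schedule (for example, on each day's batch of production inputs) builds up the history you need for step 4; in practice you would persist the returned reports rather than only logging them.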

You could also monitor a model’s confidence, but there are some drawbacks. Machine learning models are commonly described as "overconfident" in their predictions, so a model’s raw confidence is not a reliable way to measure data drift. If you mitigate this, for example with model calibration, confidence can definitely be used as an additional measurement, but it should not be your primary metric.
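If you do use confidence as a secondary signal, a minimal sketch like the one below (Python, standard library only) first checks how well calibrated the model is using the expected calibration error (ECE) on a labeled held-out set, and then compares the mean confidence of a production window against a baseline. The 10 equal-width bins and the 0.05 tolerance are illustrative assumptions, and `confidence_drop` is a hypothetical helper. A drop only tells you that something changed, so it complements the input-data statistics described in the steps above rather than replacing them.

```python
from statistics import mean
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between mean confidence and accuracy per confidence bin.

    Requires labels, so compute it on a held-out set (or on labeled production samples).
    """
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes into the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = mean(c for c, _ in b)
            accuracy = mean(1.0 if ok else 0.0 for _, ok in b)
            ece += len(b) / total * abs(avg_conf - accuracy)
    return ece


def confidence_drop(
    baseline_confidences: Sequence[float],
    window_confidences: Sequence[float],
    tolerance: float = 0.05,  # assumption: pick a tolerance that suits your model
) -> tuple[float, bool]:
    """Label-free check: did mean confidence in a production window fall below baseline?"""
    drop = mean(baseline_confidences) - mean(window_confidences)
    return drop, drop > tolerance
```

A low ECE on held-out data is what makes the label-free `confidence_drop` check worth trusting; if the ECE is high, calibrate the model first.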


Thanks @markussagen! Your feedback added some new ideas to our approach. Really helpful!