A model registry is arguably the most overlooked component of most MLOps architectures. You may not formally track every model you produce, along with versions and metadata, even if your team has set up automated pipelines, a feature store, and a model serving platform.
We’ve included a model registry in our Open MLOps architecture. Here’s why you should use one too.
#1 Easier automation
Without a model registry, your model serving tool can’t automatically make requests like “Give me the latest version of the object-detection model.” When you release an update, you’ll need a partially manual workflow to get that model into production.
But with a model registry, it’s as simple as adding the correct tag. The other components in your MLOps framework can rely on a model registry as a structured and integrated source of truth.
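To make the tag-based lookup concrete, here is a minimal in-memory sketch. It is not the API of any particular registry product: the `ModelRegistry` class, its `register`, `move_tag`, and `get` methods, the `latest` tag, and the `models/...` paths are all names we made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    version: str
    uri: str  # where the serialized model lives (hypothetical path)
    tags: set = field(default_factory=set)


class ModelRegistry:
    """Toy in-memory registry; a real one would persist its entries."""

    def __init__(self):
        self._versions = []

    def register(self, name, version, uri, tags=()):
        entry = ModelVersion(name, version, uri, set(tags))
        self._versions.append(entry)
        return entry

    def move_tag(self, name, version, tag):
        """Attach `tag` to one version of `name` and remove it from the rest."""
        for entry in self._versions:
            if entry.name == name:
                entry.tags.discard(tag)
                if entry.version == version:
                    entry.tags.add(tag)

    def get(self, name, tag):
        """Return the version of `name` that currently carries `tag`."""
        for entry in self._versions:
            if entry.name == name and tag in entry.tags:
                return entry
        raise LookupError(f"no version of {name!r} tagged {tag!r}")


registry = ModelRegistry()
registry.register("object-detection", "3.1.1", "models/object-detection/3.1.1", tags=["latest"])
registry.register("object-detection", "3.1.2", "models/object-detection/3.1.2")

# Releasing the update is a single tag move.
registry.move_tag("object-detection", "3.1.2", "latest")

# What a serving tool might ask for on startup.
serving_model = registry.get("object-detection", "latest")
print(serving_model.version, serving_model.uri)
```

The serving side never needs to know a version number. It asks for a name and a tag, and the registry decides what that resolves to.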
#2 An up-to-date overview of all models
Model registries include dashboards for you to filter and view models across your organization. You can see what models exist, whether they are running in production, and recently released versions. This bird’s-eye view is invaluable if you’re managing multiple projects.
#3 Tracking model versions
The nightmare scenario is not knowing which model produced the results you’re viewing. This easily happens if your team changes a model without updating a version number in a centralized model registry. Is it the “model.pickle” file, the “model-new.pickle” file, or something else entirely? File names rarely carry enough information to track down the relevant model, and sources like your internal documentation might be out of date or inaccurate.
A model registry lets you track a specific version number that is updated with each change, which makes it easy to see where each version has been used. You’ll know that it was object-detection-model v3.1.2 that created a specific prediction; you’ll know when that model was last changed, what features it was trained on, and any associated change in results.
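One way to stop relying on file names is to identify each model by a hash of its contents and map that hash to a version record. The sketch below assumes this content-hash approach, which is our choice for illustration and not something every registry does. The function names and the placeholder byte strings are hypothetical.

```python
import hashlib

# In-memory stand-in for the registry's version table: fingerprint -> record.
_versions_by_fingerprint = {}


def fingerprint(model_bytes: bytes) -> str:
    """Identify a model by its content, not by its file name."""
    return hashlib.sha256(model_bytes).hexdigest()


def register_version(name: str, version: str, model_bytes: bytes) -> str:
    """Store a version record under the content hash and return the hash."""
    digest = fingerprint(model_bytes)
    _versions_by_fingerprint[digest] = {"name": name, "version": version}
    return digest


def identify(model_bytes: bytes):
    """Return the registered record for these bytes, or None if unknown."""
    return _versions_by_fingerprint.get(fingerprint(model_bytes))


# Placeholder bytes stand in for the contents of real serialized models.
old_model = b"weights-before-retraining"
new_model = b"weights-after-retraining"

register_version("object-detection-model", "3.1.1", old_model)
register_version("object-detection-model", "3.1.2", new_model)

# Whatever the file was called on disk, its contents map to one version.
print(identify(new_model))
print(identify(b"some-unregistered-file"))
```

In practice you would read the bytes from the model file itself, so a renamed or copied file still resolves to the same version.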
#4 Tracking model stages
As with versions, you need to know which models are used at which stage: development, staging, or production, for example. Depending on your workflow, you might deploy slightly different models to different environments. It’s important that you track these in order to calculate reliable results and audit predictions.
With a model registry, you can tag each model to define the environment it's running in.
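As a rough sketch of stage tagging, assume three stages and a simple mapping from each model version to its stage. The `Stage` enum and the `set_stage` and `versions_in_stage` helpers are hypothetical names, and a real registry would persist this mapping.

```python
from enum import Enum


class Stage(Enum):
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"


# (model name, version) -> stage; a stand-in for the registry's storage.
stage_tags = {}


def set_stage(name: str, version: str, stage: Stage) -> None:
    """Record which stage a given model version currently belongs to."""
    stage_tags[(name, version)] = stage


def versions_in_stage(name: str, stage: Stage) -> list:
    """List the versions of `name` currently tagged with `stage`."""
    return [v for (n, v), s in stage_tags.items() if n == name and s is stage]


set_stage("object-detection-model", "3.1.1", Stage.PRODUCTION)
set_stage("object-detection-model", "3.1.2", Stage.STAGING)

# Promote 3.1.2 to production and move 3.1.1 back to development.
set_stage("object-detection-model", "3.1.2", Stage.PRODUCTION)
set_stage("object-detection-model", "3.1.1", Stage.DEVELOPMENT)

print(versions_in_stage("object-detection-model", Stage.PRODUCTION))
```

Because the stage is data in the registry and not a convention in someone's head, you can answer "what was in production?" with a query.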
#5 Understanding and comparing your models
There is probably context related to your models that doesn’t fit neatly into your existing tools.
Without a model registry, you lack a clear place to record this data. A model registry allows you to associate structured and unstructured metadata with each model.
You can help your team keep track by annotating your models with descriptions of what each model is for, and what each is related to. For example, a simple text note such as “This model was trained based on [this paper] using our 2018-2020 dataset” can save countless hours for people trying to figure this out months later.
Other metadata, including evaluation metrics, can help you compare your models head-to-head using a built-in dashboard.
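Here is a small sketch of what that metadata might look like and how a head-to-head comparison could work without a dashboard. The record layout and the `rank_by` helper are assumptions for illustration, and the metric values are invented placeholders, not real results.

```python
# Each record mixes unstructured notes with structured metrics.
# All values below are made-up placeholders.
models = [
    {
        "name": "object-detection-model",
        "version": "3.1.1",
        "description": "Baseline trained on our 2018-2020 dataset.",
        "metrics": {"precision": 0.81, "recall": 0.74},
    },
    {
        "name": "object-detection-model",
        "version": "3.1.2",
        "description": "Retrained with extra augmentation.",
        "metrics": {"precision": 0.79, "recall": 0.83},
    },
]


def rank_by(records, metric):
    """Sort records from best to worst on one metric (higher is better)."""
    return sorted(records, key=lambda r: r["metrics"][metric], reverse=True)


for record in rank_by(models, "recall"):
    print(record["version"], record["metrics"], record["description"])
```

Keeping the free-text description next to the numbers means whoever reads the comparison also sees why the versions differ.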
#6 Managing dependencies for all models
Often, your models will have different dependencies. You might have built one model on PyTorch and another on TensorFlow, for example. If you track these dependencies in your model registry, you can ensure you deploy your models to the correct environments.
You can also use your model registry to link your models back to the preprocessing code that you used to create them. Where is that script you used for cleaning the data before training the model? What feature engineering steps did you use?
A model registry provides a link between the model file itself and the code you used to create it.
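A registry entry that carries both pieces of information might look like the sketch below. The field names and the `dependency_mismatches` helper are hypothetical, the pinned versions are only examples, and the preprocessing fields are left as placeholders for you to fill in.

```python
# One registry entry linking a model to its dependencies and its code.
record = {
    "name": "object-detection-model",
    "version": "3.1.2",
    "dependencies": {"torch": "2.1.0", "numpy": "1.26.4"},  # example pins
    "preprocessing": {
        "repository": "<repository URL>",
        "commit": "<commit hash>",
        "script": "<path to cleaning script>",
    },
}


def dependency_mismatches(required: dict, installed: dict) -> dict:
    """Return {package: (required, installed)} for every package that differs.

    A package missing from `installed` is reported with None.
    """
    return {
        package: (wanted, installed.get(package))
        for package, wanted in required.items()
        if installed.get(package) != wanted
    }


# Hypothetical description of the environment we are about to deploy to.
target_environment = {"torch": "2.1.0", "numpy": "1.24.0"}

mismatches = dependency_mismatches(record["dependencies"], target_environment)
print(mismatches)
```

A deployment step could refuse to continue when `mismatches` is not empty. In a live environment, the installed versions could be read with `importlib.metadata.version` from the standard library.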
When something goes wrong, following the trail is much harder without a model registry. With a model registry, you can track exactly what went into every prediction your solution generates: which model created the prediction; what version it was on when the prediction was generated; what environment it was running in; and how that model was trained. This data is all vital to an accurate audit trail – whether to help you fix issues or to comply with legal requirements.
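A minimal sketch of such an audit trail follows, assuming you log one record per prediction. The `PredictionRecord` fields, the `log_prediction` and `trace` helpers, and the identifiers in the example are hypothetical. A real system would write to durable storage, not a Python list.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PredictionRecord:
    prediction_id: str
    model_name: str
    model_version: str
    environment: str
    training_run: str  # hypothetical ID pointing at how the model was trained
    created_at: str


# Append-only list of JSON lines; a stand-in for durable storage.
audit_log = []


def log_prediction(prediction_id, model_name, model_version, environment, training_run):
    """Record which model, version, and environment produced a prediction."""
    record = PredictionRecord(
        prediction_id=prediction_id,
        model_name=model_name,
        model_version=model_version,
        environment=environment,
        training_run=training_run,
        created_at=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(json.dumps(asdict(record)))
    return record


def trace(prediction_id):
    """Return the audit entry for a prediction, or None if it was never logged."""
    for line in audit_log:
        entry = json.loads(line)
        if entry["prediction_id"] == prediction_id:
            return entry
    return None


log_prediction("pred-001", "object-detection-model", "3.1.2", "production", "run-example")
print(trace("pred-001"))
```

When a prediction is questioned later, `trace` answers the four questions above in one lookup.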
It could be that different teams make changes to the same model. A model registry means anyone can see when new versions are released and what changes have been made. Everyone on the team or in the company has access to exactly the same model and the same version.
If two colleagues are confident that they have exactly the same model file, they can work together more easily and be confident they are getting repeatable results. They can also avoid working on conflicting changes, as they can see an overview of different versions of a model and the purpose of each.
If several people want to work on a better version of the same model, they can each use the model registry to see what their colleagues have already tried and see the results of different variations. This gives everyone the chance to improve on existing work instead of repeating work that others have already attempted.
Do you need help keeping track of your models?
We are a machine learning agency that knows the challenges of building machine learning solutions collaboratively. Contact us any time if you’d like to discuss your MLOps architecture and tooling.