Model Registry – Managing and Versioning ML Models
Learn to use a Model Registry to systematically manage, version, and track ML models across their lifecycle, ensuring reproducibility and smooth transitions between development, staging, and production. It enables better collaboration, auditability, and automated model promotion in real-world ML workflows.
📚 Key Learnings
- Understand the role of a Model Registry in production ML systems
- Learn how to track, store, and version models throughout the ML lifecycle
- Discover how to promote models between stages (e.g., Staging → Production)
- Use tools like MLflow Model Registry, SageMaker Model Registry, or DVC + Git
🧠 Learn here

What Is a Model Registry?
A Model Registry is a centralized hub for managing the lifecycle of machine learning models. It plays a crucial role in tracking, versioning, and deploying models reliably in production environments. Think of it as the equivalent of a source control system for ML models.
Why Use a Model Registry?
- Track multiple versions of a model
- Store metadata (e.g., accuracy, parameters, training dataset)
- Promote models to staging or production
- Enable reproducibility and auditing
- Integrate with CI/CD pipelines for ML
Core Components
Component | Description |
---|---|
Model | Serialized model artifact (e.g., .pkl, .onnx) |
Version | Each model can have multiple versions (v1, v2, etc.) |
Stage | Lifecycle stage like Staging, Production, Archived |
Metadata | Info such as metrics, parameters, training date, dataset ID |
Lineage | Links between training code, data, and model version |
Popular Model Registries
Tool | Description |
---|---|
MLflow | Open-source registry, developed by Databricks; self-hostable |
SageMaker Model Registry | AWS-native managed registry |
Weights & Biases | Cloud-hosted MLOps platform with model tracking |
DVC | Git-based model and data versioning |
Example Workflow
- Train a model
- Save the model artifact and metrics
- Register the model to a registry
- Tag it as Staging
- After testing, promote to Production
- Serve it using an inference engine (e.g., FastAPI, TorchServe); a serving sketch follows the registry snippet below
Example Snippet: Using MLflow Registry
```python
import mlflow
from mlflow.tracking import MlflowClient

# Log the trained model and its metrics to an MLflow run
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_metric("rmse", 0.85)

# Register the logged model under a name in the registry
mlflow.register_model(
    "runs:/<run_id>/model",  # replace <run_id> with the actual run ID
    "ChurnPredictionModel",
)

# Update the model's lifecycle stage
client = MlflowClient()
client.transition_model_version_stage(
    name="ChurnPredictionModel",
    version=1,
    stage="Production",
)
```
💡 Pro Tips
- Use naming conventions for models (e.g., project_modelname_vX)
- Automate registry steps via CI/CD pipelines
- Monitor model usage and performance post-deployment
- Archive obsolete or underperforming models
Tracking, Storing, and Versioning Models in the ML Lifecycle
ML Model Lifecycle Stages
- Data Collection & Preprocessing
- Model Training
- Evaluation & Validation
- Versioning & Storage
- Deployment
- Monitoring & Retraining
Model Tracking
Tools
- MLflow Tracking: Logs parameters, metrics, and artifacts
- Weights & Biases: Tracks experiments, datasets, and outputs
- Neptune.ai: Focused on research experiments
What to Track
- Model hyperparameters
- Dataset and feature versions
- Training scripts and environment
- Evaluation metrics
```python
import mlflow

# Log hyperparameters, metrics, and the model artifact for one experiment run
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.94)
    mlflow.log_artifact("model.pkl")
```
Model Storage
Formats
- Pickle (.pkl): Python native format (beware of security risks)
- Joblib: Efficient for models built on NumPy arrays (see the sketch after this list)
- ONNX: Cross-platform model format
- SavedModel (TensorFlow), TorchScript (PyTorch)
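As a quick illustration of the Joblib option, here is a minimal save/reload round trip; the toy model and file name are placeholders.

```python
import joblib
from sklearn.linear_model import LogisticRegression

# Train a toy model (stand-in for your real estimator)
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

joblib.dump(model, "model.joblib")      # serialize to disk
restored = joblib.load("model.joblib")  # deserialize later, e.g. at serving time
print(restored.predict([[0.5]]))
```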
Storage Options
- Cloud Buckets: S3, GCS, Azure Blob (upload example below)
- Model Registries: MLflow, SageMaker, WandB
- Artifact Stores: DVC, Git LFS
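For the cloud-bucket option, a sketch of uploading an artifact with boto3; the bucket name and key layout are assumptions.

```python
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="model.joblib",      # local artifact from the previous step
    Bucket="ml-models-bucket",    # assumed bucket name
    Key="churn/v1/model.joblib",  # encode the version in the key
)
```

Encoding the version in the object key keeps old artifacts retrievable even without a full registry in front of the bucket.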
Model Versioning
Why Version?
- Ensure reproducibility
- Trace which model was used for a decision
- Compare performance between iterations
- Enable rollbacks
Methods
Tool | Versioning Strategy |
---|---|
MLflow | Auto-versioned runs & models |
DVC | Git-like version control for models/data |
SageMaker | Incremental model versions in registry |
Custom GitOps | Use Git tags/branches for model tracking |
Example: DVC for Model Versioning
```bash
# Initialize DVC
$ dvc init

# Track the model file
$ dvc add models/model.pkl

# Commit the DVC pointer file to Git
$ git add models/model.pkl.dvc .gitignore
$ git commit -m "Track model v1 with DVC"

# Configure a remote and push the artifact
$ dvc remote add -d myremote s3://ml-models-bucket
$ dvc push
```
🧠 Best Practices
- Always log experiments (code, metrics, data snapshot)
- Store models in reproducible formats
- Automate versioning using pipelines
- Attach metadata such as accuracy, dataset ID, and Git commit hash (sketch below)
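A sketch of that last practice, attaching a Git commit hash and dataset ID to a registered version via MLflow tags; the model name, version, and dataset ID are illustrative.

```python
import subprocess
from mlflow.tracking import MlflowClient

# Capture the commit that produced this model
commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

client = MlflowClient()
client.set_model_version_tag(
    name="ChurnPredictionModel", version="1",
    key="git_commit", value=commit,
)
client.set_model_version_tag(
    name="ChurnPredictionModel", version="1",
    key="dataset_id", value="customers_2024_q1",  # hypothetical dataset ID
)
```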
Promoting Models Between Stages (e.g., Staging → Production)
Common Stages in Model Lifecycle
Stage | Purpose |
---|---|
None | Initial unassigned stage after registration |
Staging | Ready for testing and validation by QA or pre-prod environment |
Production | Approved and live for inference in production environments |
Archived | Deprecated or superseded models no longer in use |
Workflow: MLflow Model Registry Example
Registering and Promoting a Model
```python
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register a model version
model_uri = "runs:/<run-id>/model"
model_name = "CreditScoringModel"
mlflow.register_model(model_uri, model_name)

# Promote to Staging
client.transition_model_version_stage(
    name=model_name,
    version=1,
    stage="Staging",
)

# Promote to Production after testing
client.transition_model_version_stage(
    name=model_name,
    version=1,
    stage="Production",
    archive_existing_versions=True,  # Optional: archive older Production versions
)
```
Governance Considerations
- Enforce approval gates before production promotion
- Maintain audit logs for transitions
- Use RBAC to control who can promote models
- Define manual vs automated promotion processes
Automation with CI/CD Pipelines
Example CI Step in GitHub Actions
```yaml
- name: Promote model to Production
  run: |
    python promote_model.py  # Contains the MLflow promotion logic
```
Trigger this step after successful staging tests.
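A hypothetical sketch of what `promote_model.py` might contain, combining the promotion call with a simple approval gate; the model name and accuracy threshold are assumptions.

```python
# promote_model.py
from mlflow.tracking import MlflowClient

MODEL_NAME = "CreditScoringModel"
MIN_ACCURACY = 0.90  # approval gate: only promote above this threshold

client = MlflowClient()

# Find the latest version sitting in Staging
versions = client.get_latest_versions(MODEL_NAME, stages=["Staging"])
if not versions:
    raise SystemExit("No model version in Staging")
staging = versions[0]

# Read the accuracy logged on the run that produced this version
accuracy = client.get_run(staging.run_id).data.metrics["accuracy"]

if accuracy >= MIN_ACCURACY:
    client.transition_model_version_stage(
        name=MODEL_NAME,
        version=staging.version,
        stage="Production",
        archive_existing_versions=True,
    )
    print(f"Promoted version {staging.version} (accuracy={accuracy:.3f})")
else:
    raise SystemExit(f"Gate failed: accuracy {accuracy:.3f} < {MIN_ACCURACY}")
```

Failing the script fails the CI job, which doubles as an audit trail for why a promotion did or did not happen.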
Using MLflow, SageMaker, and DVC + Git for Model Registry
MLflow Model Registry
Register and Promote Model
```python
import mlflow
from mlflow.tracking import MlflowClient

# Start an MLflow run and log the model
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_metric("accuracy", 0.92)

result = mlflow.register_model(
    "runs:/<run_id>/model", "CustomerChurnModel"
)

# Promote the model to Production
client = MlflowClient()
client.transition_model_version_stage(
    name="CustomerChurnModel",
    version=1,
    stage="Production",
)
```
SageMaker Model Registry
Register and Deploy Model
```python
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.model import Model

role = get_execution_role()
model = Model(
    image_uri="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-custom-image:latest",
    model_data="s3://my-bucket/model.tar.gz",
    role=role,
)

# Register the model in a model package group
model_package_group_name = "CreditScoringModel"
model_package = model.register(
    content_types=["application/json"],
    response_types=["application/json"],
    model_package_group_name=model_package_group_name,
    approval_status="PendingManualApproval",
)

# Approve for production manually or via the API
sm_client = boto3.client("sagemaker")
sm_client.update_model_package(
    ModelPackageArn=model_package.model_package_arn,
    ModelApprovalStatus="Approved",
)
```
DVC + Git for Model Versioning
Track and Push Model Artifacts
```bash
# Initialize DVC
$ dvc init

# Add the model artifact to DVC tracking
$ dvc add models/model.pkl

# Commit the pointer file to Git
$ git add models/model.pkl.dvc .gitignore
$ git commit -m "Track model v1 with DVC"

# Add a remote and push the model
$ dvc remote add -d myremote s3://my-model-bucket
$ dvc push
```
🔃 Switch Between Versions
```bash
# Check out the branch (or commit) whose .dvc pointer matches the version you want
$ git checkout feature/new-model-v2
$ dvc pull  # Downloads the associated model.pkl from the remote
```
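To pin a specific historical version rather than switching branches, you can restore just the pointer file; the tag name here is hypothetical.

```bash
# Restore only the .dvc pointer file from a tagged commit
$ git checkout model-v1 -- models/model.pkl.dvc

# Fetch and restore the exact model.pkl that pointer references
$ dvc pull models/model.pkl.dvc
```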
🧠 In Short
Feature | MLflow | SageMaker | DVC + Git |
---|---|---|---|
Model Versioning | ✅ | ✅ | ✅ |
Promotion Between Stages | ✅ | ✅ | 🚫 (manual via Git) |
CI/CD Integration | ✅ | ✅ | ✅ |
Hosted/Cloud Native | Optional | AWS-native | Self-hosted/Git-based |
Each tool has its strengths:
- Use MLflow for flexibility and OSS workflows
- Use SageMaker Registry if you're all-in on AWS
- Use DVC + Git for lightweight, GitOps-style versioning
Pick based on your infra stack, team maturity, and governance needs.
🔥 Challenges
Model Versioning & Tracking
- Register at least 2 versions of the same model (e.g., v1, v2) with different metrics
- Add metadata: model performance, training timestamp, dataset version, and notes
- Compare versions and identify which should go to production
Stage Transition & Promotion
- Promote the best model version from "Staging" → "Production"
- Log the reason for promotion (who approved it, based on what criteria)
Automation Integration
- Add model registration logic to your training script
- Update your serving logic to always fetch the latest "Production" model
Advanced Tracking
- Link each model version with a Git commit hash
- Implement a rollback script that reverts to the last good model version
- Integrate with Slack/Email to notify on model stage changes