30 Days of MLOps Challenge · Day 20

Model Registry – Managing and Versioning ML Models

By Aviraj Kawade · September 16, 2025 · 5 min read

Learn how a Model Registry lets you systematically manage, version, and track ML models across their lifecycle, ensuring reproducibility and smooth transitions between development, staging, and production. It enables better collaboration, auditability, and automated model promotion in real-world ML workflows.

💡 Hey — It's Aviraj Kawade 👋


📚 Key Learnings

  • Understand the role of a Model Registry in production ML systems
  • Learn how to track, store, and version models throughout the ML lifecycle
  • Discover how to promote models between stages (e.g., Staging → Production)
  • Use tools like MLflow Model Registry, SageMaker Model Registry, or DVC + Git

🧠 Learn here

Model Registry overview diagram

What is a Model Registry?

A Model Registry is a centralized hub for managing the lifecycle of machine learning models. It plays a crucial role in tracking, versioning, and deploying models reliably in production environments. Think of it as the equivalent of a source control system for ML models.

Why Use a Model Registry?

  • Track multiple versions of a model
  • Store metadata (e.g., accuracy, parameters, training dataset)
  • Promote models to staging or production
  • Enable reproducibility and auditing
  • Integrate with CI/CD pipelines for ML

Core Components

| Component | Description |
| --- | --- |
| Model | Serialized version of the ML model (e.g., .pkl, .onnx) |
| Version | Each model can have multiple versions (v1, v2, etc.) |
| Stage | Lifecycle stage such as Staging, Production, or Archived |
| Metadata | Info such as metrics, parameters, training date, dataset ID |
| Lineage | Links between training code, data, and model version |
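To make these components concrete, here is a minimal sketch of what one registry record might hold. The class and field names are illustrative, not any particular registry's schema:

```python
from dataclasses import dataclass, field

@dataclass
class RegistryEntry:
    """One record in a hypothetical model registry."""
    name: str                    # e.g. "ChurnPredictionModel"
    version: int                 # auto-incremented per model name
    artifact_path: str           # serialized model, e.g. an S3 URI
    stage: str = "None"          # "None" | "Staging" | "Production" | "Archived"
    metadata: dict = field(default_factory=dict)  # metrics, params, dataset ID
    lineage: dict = field(default_factory=dict)   # training code commit, data version

entry = RegistryEntry(
    name="ChurnPredictionModel",
    version=1,
    artifact_path="s3://ml-models/churn/v1/model.pkl",
    metadata={"rmse": 0.85, "dataset_id": "churn-2025-09"},
    lineage={"git_commit": "abc123"},
)
print(entry.stage)  # newly registered models start in "None"
```

Real registries (MLflow, SageMaker) store the same five pieces of information; the tools below differ mainly in how they expose them.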

Popular Model Registries

| Tool | Description |
| --- | --- |
| MLflow | Open-source; part of the Databricks ecosystem |
| SageMaker Model Registry | AWS-native managed registry |
| Weights & Biases | Cloud-hosted MLOps platform with model tracking |
| DVC | Git-based model and data versioning |

Example Workflow

  1. Train a model
  2. Save the model artifact and metrics
  3. Register the model to a registry
  4. Tag it as Staging
  5. After testing, promote it to Production
  6. Serve it using an inference engine (e.g., FastAPI, TorchServe)

Example Snippet: Using MLflow Registry

import mlflow

# Log model
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_metric("rmse", 0.85)
    mlflow.register_model(
        f"runs:/{run.info.run_id}/model",
        "ChurnPredictionModel"
    )

# Update model stage
from mlflow.tracking import MlflowClient
client = MlflowClient()
client.transition_model_version_stage(
    name="ChurnPredictionModel",
    version=1,
    stage="Production"
)

💡 Pro Tips

  • Use naming conventions for models (e.g., project_modelname_vX)
  • Automate registry steps via CI/CD pipelines
  • Monitor model usage and performance post-deployment
  • Archive obsolete or underperforming models

Tracking, Storing, and Versioning Models in the ML Lifecycle

ML Model Lifecycle Stages

  1. Data Collection & Preprocessing
  2. Model Training
  3. Evaluation & Validation
  4. Versioning & Storage
  5. Deployment
  6. Monitoring & Retraining

Model Tracking

Tools

  • MLflow Tracking: Logs parameters, metrics, and artifacts
  • Weights & Biases: Tracks experiments, datasets, and outputs
  • Neptune.ai: Focused on research experiments

What to Track

  • Model hyperparameters
  • Dataset and feature versions
  • Training scripts and environment
  • Evaluation metrics

import mlflow
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.94)
    mlflow.log_artifact("model.pkl")

Model Storage

Formats

  • Pickle (.pkl): Python native format (beware of security risks)
  • Joblib (.joblib): Efficient for models with large NumPy arrays
  • ONNX: Cross-platform model format
  • SavedModel (TensorFlow), TorchScript (PyTorch)
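As a minimal illustration of the first (and simplest) option, the snippet below round-trips a stand-in "model" through pickle. The dict is a placeholder for any trained estimator, and note the security caveat from the list above:

```python
import os
import pickle
import tempfile

# Hypothetical "model": any picklable Python object stands in for a trained estimator.
model = {"weights": [0.1, 0.2], "bias": 0.5}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Save — pickle is Python-native; never unpickle files from untrusted sources.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```

Joblib follows the same `dump`/`load` pattern; ONNX and the framework-native formats require their own export and runtime libraries.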

Storage Options

  • Cloud Buckets: S3, GCS, Azure Blob
  • Model Registries: MLflow, SageMaker, WandB
  • Artifact Stores: DVC, Git LFS

Model Versioning

Why Version?

  • Ensure reproducibility
  • Trace which model was used for a decision
  • Compare performance between iterations
  • Enable rollbacks

Methods

| Tool | Versioning Strategy |
| --- | --- |
| MLflow | Auto-versioned runs & models |
| DVC | Git-like version control for models/data |
| SageMaker | Incremental model versions in registry |
| Custom GitOps | Git tags/branches for model tracking |

Example: DVC for Model Versioning

# Initialize DVC
$ dvc init

# Track model file
$ dvc add models/model.pkl

# Commit to Git
$ git add models/model.pkl.dvc .gitignore
$ git commit -m "Track model v1 with DVC"

# Push to remote
$ dvc remote add -d myremote s3://ml-models-bucket
$ dvc push

🧠 Best Practices

  • Always log experiments (code, metrics, data snapshot)
  • Store models in reproducible formats
  • Automate versioning using pipelines
  • Attach metadata like accuracy, dataset ID, git commit hash
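The last practice above can be sketched as a small helper that assembles the metadata before registration. The field names and the best-effort `git rev-parse` lookup are illustrative assumptions, not a fixed schema:

```python
import subprocess

def current_git_commit(default="unknown"):
    """Best-effort short commit hash of the current repo; falls back if git is unavailable."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            text=True, stderr=subprocess.DEVNULL,
        ).strip()
    except (subprocess.CalledProcessError, OSError):
        return default

def build_model_metadata(accuracy, dataset_id, extra=None):
    """Assemble metadata to attach to a registered model version."""
    meta = {
        "accuracy": accuracy,
        "dataset_id": dataset_id,
        "git_commit": current_git_commit(),
    }
    if extra:
        meta.update(extra)
    return meta

meta = build_model_metadata(0.94, "churn-2025-09", extra={"framework": "sklearn"})
```

In MLflow, such a dict maps naturally onto run params/metrics and model version tags; in DVC, it can live alongside the `.dvc` file in Git.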

Promoting Models Between Stages (e.g., Staging → Production)

Common Stages in Model Lifecycle

| Stage | Purpose |
| --- | --- |
| None | Initial unassigned stage after registration |
| Staging | Ready for testing and validation in a QA or pre-prod environment |
| Production | Approved and live for inference in production environments |
| Archived | Deprecated or superseded models no longer in use |
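One way to enforce an orderly flow through these stages is a small transition table. The policy below is an illustrative assumption (most registries, including MLflow, will happily allow any transition unless you add a check like this):

```python
# Allowed transitions between lifecycle stages — a common but hypothetical policy.
ALLOWED_TRANSITIONS = {
    "None": {"Staging", "Archived"},
    "Staging": {"Production", "Archived", "None"},
    "Production": {"Archived", "Staging"},
    "Archived": set(),  # archived models stay archived under this policy
}

def can_transition(current: str, target: str) -> bool:
    """Return True if moving a model from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS.get(current, set())

print(can_transition("Staging", "Production"))  # True
print(can_transition("None", "Production"))     # False under this policy
```

A check like this would typically run inside your promotion script, before the registry API call.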

Workflow: MLflow Model Registry Example

Registering and Promoting a Model

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register a model version
model_uri = "runs:/<run-id>/model"
model_name = "CreditScoringModel"
mlflow.register_model(model_uri, model_name)

# Promote to Staging
client.transition_model_version_stage(
    name=model_name,
    version=1,
    stage="Staging",
)

# Promote to Production after testing
client.transition_model_version_stage(
    name=model_name,
    version=1,
    stage="Production",
    archive_existing_versions=True  # Optional: archive older prod models
)

Governance Considerations

  • Enforce approval gates before production promotion
  • Maintain audit logs for transitions
  • Use RBAC to control who can promote models
  • Define manual vs automated promotion processes
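An approval gate can be as simple as a function that refuses to promote without a recorded approver and a measurable improvement. The criteria, names, and return shape below are illustrative assumptions:

```python
def approve_promotion(candidate_metrics, production_metrics,
                      min_gain=0.0, approver=None):
    """Gate a Staging -> Production transition (illustrative criteria).

    Requires a named approver (for the audit log) and at least `min_gain`
    accuracy improvement over the current production model.
    Returns (approved, reason) so the reason can be logged either way.
    """
    if approver is None:
        return False, "rejected: no approver recorded"
    gain = candidate_metrics["accuracy"] - production_metrics["accuracy"]
    if gain < min_gain:
        return False, f"rejected: accuracy gain {gain:.3f} below {min_gain}"
    return True, f"approved by {approver}: accuracy gain {gain:.3f}"

ok, reason = approve_promotion(
    {"accuracy": 0.94}, {"accuracy": 0.91},
    min_gain=0.01, approver="aviraj",
)
print(ok, reason)
```

Persisting the returned reason (who approved, on what basis) gives you the audit log mentioned above for free.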

Automation with CI/CD Pipelines

Example CI Step in GitHub Actions

- name: Promote model to Production
  run: |
    python promote_model.py  # Contains MLflow promotion logic

Trigger this step after successful staging tests.
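One possible shape for `promote_model.py` is sketched below. The model name and metric are assumptions; the MLflow calls assume a reachable tracking server, so they are kept inside the function while the selection logic stays a plain, testable helper:

```python
# promote_model.py — hypothetical promotion logic for the CI step above.

def pick_best(versions):
    """Pick the version with the highest metric value.

    `versions` is a list of (version_number, metric_value) pairs.
    """
    version, _ = max(versions, key=lambda v: v[1])
    return version

def promote(model_name="ChurnPredictionModel", metric="accuracy"):
    # Imported lazily so pick_best stays usable without an MLflow server.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    candidates = []
    for mv in client.get_latest_versions(model_name, stages=["Staging"]):
        run = client.get_run(mv.run_id)
        candidates.append((int(mv.version),
                           run.data.metrics.get(metric, float("-inf"))))
    best = pick_best(candidates)
    client.transition_model_version_stage(
        name=model_name, version=best, stage="Production",
        archive_existing_versions=True,
    )

if __name__ == "__main__":
    promote()
```

Keeping the selection rule separate from the registry calls makes the promotion criteria unit-testable in CI before any stage transition happens.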

Using MLflow, SageMaker, and DVC+Git for Model Registry

MLflow Model Registry

Register and Promote Model

import mlflow
from mlflow.tracking import MlflowClient

# Start an MLflow run and log model
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_metric("accuracy", 0.92)
    result = mlflow.register_model(
        f"runs:/{run.info.run_id}/model", "CustomerChurnModel"
    )

# Promote model to Production
client = MlflowClient()
client.transition_model_version_stage(
    name="CustomerChurnModel",
    version=1,
    stage="Production",
)

SageMaker Model Registry

Register and Deploy Model

import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.model import Model

role = get_execution_role()
model = Model(
    image_uri="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-custom-image:latest",
    model_data="s3://my-bucket/model.tar.gz",
    role=role
)

# Register Model
model_package_group_name = "CreditScoringModel"
model_package = model.register(
    content_types=["application/json"],
    response_types=["application/json"],
    model_package_group_name=model_package_group_name,
    approval_status="PendingManualApproval",
)
# Approve to Production manually or via API
sm_client = boto3.client("sagemaker")
sm_client.update_model_package(
    ModelPackageArn=model_package.model_package_arn,
    ModelApprovalStatus="Approved"
)

DVC + Git for Model Versioning

Track and Push Model Artifacts

# Initialize DVC
$ dvc init

# Add model artifact to DVC tracking
$ dvc add models/model.pkl

# Commit and push to Git
$ git add models/model.pkl.dvc .gitignore
$ git commit -m "Track model v1 with DVC"

# Add remote and push model
$ dvc remote add -d myremote s3://my-model-bucket
$ dvc push

🔃 Switch Between Versions

# Check out the Git commit or branch tied to the desired model version
$ git checkout feature/new-model-v2
$ dvc pull  # Downloads the model.pkl associated with that revision

🧠 In-Short

| Feature | MLflow | SageMaker | DVC + Git |
| --- | --- | --- | --- |
| Model Versioning | ✅ | ✅ | ✅ |
| Promotion Between Stages | ✅ | ✅ | 🚫 (manual via Git) |
| CI/CD Integration | ✅ | ✅ | ✅ |
| Hosted/Cloud Native | Optional | AWS-native | Self-hosted/Git-based |

Each tool has its strengths:

  • Use MLflow for flexibility and OSS workflows
  • Use SageMaker Registry if you're all-in on AWS
  • Use DVC + Git for lightweight, GitOps-style versioning

Pick based on your infra stack, team maturity, and governance needs.

🔥 Challenges

Model Versioning & Tracking

  • Register at least 2 versions of the same model (e.g., v1, v2) with different metrics
  • Add metadata: model performance, training timestamp, dataset version, and notes
  • Compare versions and identify which should go to production

Stage Transition & Promotion

  • Promote the best model version from "Staging" → "Production"
  • Log the reason for promotion (who approved it, based on what criteria)

Automation Integration

  • Add model registration logic to your training script
  • Update your serving logic to always fetch the latest "Production" model

Advanced Tracking

  • Link each model version with a Git commit hash
  • Implement a rollback script that reverts to the last good model version
  • Integrate with Slack/Email to notify on model stage changes