30 Days of MLOps Challenge · Day 4
Reproducible ML environments using Conda & Docker
Use Conda for package‑level reproducibility and Docker for system‑level consistency to eliminate “works on my machine” problems.
💡 Hey — It's Aviraj Kawade 👋
🧠 New to DevOps? Start 60 Days of DevOps
Key Learnings
- Why environment reproducibility matters in ML.
- Conda for managing Python environments and dependencies.
- Docker for portable, consistent runtime environments.
- How Conda and Docker complement each other.
Environment Reproducibility in ML
Recreate the exact same environment—software versions, dependencies, and system libraries—so models behave identically across machines and time.
- Consistent results across train/test/prod
- Reliable experimentation and comparisons
- Team collaboration without setup issues
- Fewer "works on my machine" bugs
- Streamlined CI/CD and debugging
Common tools: Conda/Virtualenv, Docker, pip + requirements.txt, MLflow/DVC.
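The "recreate exactly" idea can be made concrete by logging the interpreter, OS, and package versions alongside every experiment. A minimal sketch in Python (function and key names here are illustrative, not a specific library's API):

```python
# Record the environment an experiment ran in, so results can later be
# traced back to exact versions.
import platform
import sys
from importlib import metadata

def environment_snapshot(packages):
    """Collect interpreter, OS, and package versions for an experiment log."""
    snapshot = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for pkg in packages:
        try:
            snapshot[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            snapshot[pkg] = None  # flag missing dependencies explicitly
    return snapshot

if __name__ == "__main__":
    print(environment_snapshot(["numpy", "pandas", "scikit-learn"]))
```

Saving this dict next to each run's metrics makes "which versions produced this result?" answerable later.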
Conda for Managing Python Environments
Conda isolates per‑project dependencies, supports non‑Python packages and CUDA stacks, and enables easy export/import of entire environments.
Install Conda
Linux/macOS (x86)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# macOS ARM
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
bash Miniconda3-latest-MacOSX-arm64.sh
Windows (PowerShell)
Invoke-WebRequest "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -OutFile ".\Downloads\Miniconda3-latest-Windows-x86_64.exe"
Core workflow
# Create env
conda create -n ml-env python=3.10 numpy pandas scikit-learn jupyter
# Activate
conda activate ml-env
# Install deps
conda install matplotlib seaborn jupyterlab
conda install -c conda-forge xgboost
# Deep learning
conda install -c pytorch pytorch torchvision torchaudio
conda install -c conda-forge tensorflow
# Jupyter kernel
pip install ipykernel
python -m ipykernel install --user --name ml-env --display-name "Python (ml-env)"
# Export / Re-create
conda env export -n ml-env > environment.yml
conda env create -f environment.yml
# Remove
conda remove -n ml-env --all
Example Project Structure
my-ml-project/
├── data/
├── notebooks/
├── src/
├── environment.yml
└── README.md
Best Practices
- Commit environment.yml to Git.
- Prefer the conda-forge channel for ML stacks.
- Use conda-lock or Docker for tighter reproducibility.
- Install pip packages last if needed.
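Putting those practices together, a committed environment.yml might look like this (versions and the pip-only package name are illustrative placeholders):

```yaml
name: ml-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy=1.26
  - pandas=2.1
  - scikit-learn=1.3
  - pip
  - pip:
      - some-pip-only-package==1.0  # pip deps go last, under the pip key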
Docker for Portable ML Environments
Why Docker?
- Portability: package code + environment as a single image.
- Consistency: immutable builds prevent drift.
- Reproducibility: identical across dev/test/prod.
1) Dockerfile
# Base image with Python
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# System deps
RUN apt-get update && apt-get install -y \
build-essential \
git \
wget \
&& rm -rf /var/lib/apt/lists/*
# Copy project
COPY . .
# Python deps
RUN pip install --no-cache-dir -r requirements.txt
# Default command
CMD ["python", "train.py"]
2) requirements.txt
numpy
pandas
scikit-learn
matplotlib
jupyterlab
tensorflow
3) Build & Run
# Build
docker build -t ml-env:latest .
# Interactive dev
docker run -it --rm -v $(pwd):/app ml-env:latest
# Jupyter Lab
docker run -it -p 8888:8888 -v $(pwd):/app ml-env:latest jupyter lab --ip=0.0.0.0 --allow-root
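To check that the image really contains the expected stack, a small script can be copied into the image and run in the container, e.g. `docker run --rm ml-env:latest python check_env.py` (check_env.py is a hypothetical helper, a minimal sketch):

```python
# check_env.py -- report which expected packages are present in this environment.
from importlib import metadata

EXPECTED = ["numpy", "pandas", "scikit-learn"]

def installed_version(pkg):
    """Return the installed version of pkg, or None if it is not installed."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    for pkg in EXPECTED:
        print(f"{pkg}: {installed_version(pkg) or 'MISSING'}")
```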
4) .dockerignore
__pycache__/
*.pyc
.env
data/
models/
5) docker-compose (optional)
version: '3'
services:
  ml:
    build: .
    volumes:
      - .:/app
    ports:
      - "8888:8888"
  mongo:
    image: mongo:latest
    ports:
      - "27017:27017"
Pro Tips
- Pin versions in requirements.txt
- Use lightweight base images
- Mount data via volumes; keep images lean
- Use env vars for secrets/config
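The last tip can be sketched in code: read configuration from environment variables at runtime, with safe defaults for non-secrets and no default for secrets (the variable names here are illustrative):

```python
# Load runtime configuration from environment variables instead of
# hard-coding values (or secrets) into the image.
import os

def load_config():
    return {
        # non-secret settings get sensible defaults
        "data_dir": os.environ.get("ML_DATA_DIR", "/app/data"),
        "epochs": int(os.environ.get("ML_EPOCHS", "10")),
        # secrets get no default: inject them, e.g. docker run -e ML_API_KEY=...
        "api_key": os.environ.get("ML_API_KEY"),
    }

if __name__ == "__main__":
    print(load_config())
```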
Conda vs Docker
| Feature | Conda | Docker |
| --- | --- | --- |
| Scope | Python/R envs and packages | Full OS‑level env and deps |
| Speed | Fast local setup | Slower to build images |
| Isolation | Language/package level | System‑level isolation |
| Portability | OS‑dependent quirks | Highly portable |
| Best use | Notebooks & prototyping | Deployment & CI/CD |
Docker + Conda Together
Combine Conda for dependency management with Docker for system reproducibility.
Dockerfile
FROM continuumio/miniconda3
# Copy and create Conda environment
COPY environment.yml .
RUN conda env create -f environment.yml
# Run subsequent RUN instructions inside the environment
SHELL ["conda", "run", "-n", "mlenv", "/bin/bash", "-c"]
# Set working directory
WORKDIR /app
COPY . .
# Exec-form CMD bypasses SHELL, so activate the env explicitly
CMD ["conda", "run", "-n", "mlenv", "python", "train.py"]
environment.yml
name: mlenv
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.9
  - pandas
  - numpy
  - scikit-learn
Tips
- Version Dockerfile and environment.yml in Git.
- Use docker-compose for multi‑service setups.
- Prefer mamba (bundled with Miniforge) for faster dependency solves.
Challenges
- Create a Conda environment, install 3 ML packages, and export it as environment.yml.
- Install a Jupyter kernel for your Conda environment and test it in JupyterLab.
- Write a Dockerfile to build a basic ML image with Pandas and Scikit‑learn.
- Run the Docker container and verify dependencies inside.
- Combine Conda + Docker: build an image using your exported environment.yml.
- Document your setup in README.md so others can reproduce it.
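For the last challenge, a starter README.md for this setup might look like this (project and image names are illustrative):

```markdown
# my-ml-project

## Setup with Conda
conda env create -f environment.yml
conda activate ml-env

## Setup with Docker
docker build -t my-ml-project .
docker run -it --rm -v $(pwd):/app my-ml-project

## Train
python train.py
```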