Conda Environment Management: The Complete Guide for Data Scientists

Master conda environments, package management, channels, and reproducible workflows. From basic commands to advanced tips for Python and R projects.

Conda is the package and environment manager that powers modern data science. Whether you're juggling Python versions, installing GPU-accelerated libraries, or sharing reproducible environments with your team, conda handles it all — without the dependency nightmares that plague pip-only workflows.

Why conda over pip + venv?

Pip installs Python packages. Conda installs packages written in any language: Python, R, C libraries, CUDA toolkits, and other native dependencies. This matters because data science libraries like NumPy, SciPy, and PyTorch depend on compiled C/Fortran code. Pip can install prebuilt wheels for these, but it cannot manage the non-Python libraries they link against.

Feature                      pip + venv          conda
Python packages              ✅                  ✅
Non-Python deps (C, CUDA)    ❌                  ✅
Python version management    ❌ (need pyenv)     ✅
Environment isolation        ✅                  ✅
Dependency solver            Basic               SAT solver
Cross-platform binaries      Wheels only         Full support

Getting started: Miniconda vs Anaconda

Miniconda gives you conda + Python in ~80 MB. You install only what you need. Anaconda ships 250+ pre-installed packages (~3 GB) — convenient but bloated for most workflows.

For most developers, Miniconda is the better choice:

# Install Miniconda (Linux x86_64; pick the matching installer for macOS or ARM)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Verify installation
conda --version

Environment management essentials

Creating environments

Always create a dedicated environment for each project. Never install packages into base:

# Create with specific Python version
conda create -n myproject python=3.11

# Create with packages pre-installed
conda create -n ml-project python=3.11 numpy pandas scikit-learn

# Create from an environment file
conda env create -f environment.yml
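
In setup scripts it helps if environment creation is safe to re-run. The sketch below is one way to do that, assuming bash and that conda env list prints the environment name in the first column of each non-comment row. The helper name ensure_env is hypothetical, not part of conda.

```shell
# ensure_env NAME PYTHON_VERSION  (hypothetical helper)
# Creates the environment only when no environment with that name exists.
ensure_env() {
    local name="$1" pyver="$2"
    # Keep the first column of each non-comment row, then look for an exact match.
    if conda env list | awk '!/^#/ {print $1}' | grep -qxF -- "$name"; then
        echo "Environment '$name' already exists; skipping."
    else
        conda create -y -n "$name" "python=$pyver"
    fi
}

# Example: ensure_env myproject 3.11
```

The -y flag skips the confirmation prompt, which is usually what you want in a non-interactive script.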

Activating and switching

conda activate myproject    # Switch to environment
conda deactivate            # Return to the previously active environment
conda env list              # See all environments
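
In non-interactive shell scripts, conda activate often fails because conda's shell integration has not been loaded. A minimal sketch of one workaround, assuming bash; the helper name activate_in_script is hypothetical.

```shell
# activate_in_script ENV_NAME  (hypothetical helper, bash only)
activate_in_script() {
    # Load conda's shell integration into the current non-interactive shell.
    eval "$(conda shell.bash hook)"
    conda activate "$1"
}

# Alternative that needs no activation: run one command inside an environment.
# conda run -n myproject python --version
```

For a single command, conda run is usually the simpler choice; the hook approach fits scripts that run many commands in the same environment.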

Installing packages

# From the defaults channel
conda install numpy pandas matplotlib

# From conda-forge (community packages)
conda install -c conda-forge polars duckdb

# Specific version
conda install pytorch=2.2 -c pytorch

# Install pip packages when conda doesn't have them
pip install some-niche-package

Pro tip: Always try conda install first. Fall back to pip install only when a package isn't available on any conda channel. Mixing the two can cause dependency conflicts if not managed carefully.
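
One common way conda and pip end up in conflict is running a pip that belongs to a different Python than the active environment. The guard below is a sketch under the assumption that conda sets CONDA_PREFIX on activation; python_in_active_env is a hypothetical helper name.

```shell
# python_in_active_env  (hypothetical helper)
# Succeeds only if the `python` found on PATH lives inside the active environment.
python_in_active_env() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "No conda environment is active." >&2
        return 1
    fi
    local py_path
    py_path="$(command -v python)" || { echo "python not found on PATH" >&2; return 1; }
    case "$py_path" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) echo "python resolves to $py_path, outside $CONDA_PREFIX" >&2
           return 1 ;;
    esac
}

# Example: python_in_active_env && python -m pip install some-niche-package
```

Calling python -m pip rather than bare pip ties the install to the interpreter that was just checked.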

Removing and cleaning

# Remove a package
conda remove scipy

# Delete an entire environment
conda env remove -n old-project

# Clean cached packages (free disk space)
conda clean --all

Channels and priorities

Channels are where conda downloads packages from. The order matters:

# Add conda-forge as highest priority
conda config --add channels conda-forge
conda config --set channel_priority strict

# View current channels
conda config --show channels

Recommended setup for data science:

channels:
  - conda-forge
  - defaults

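With channel_priority set to strict, conda takes each package from the highest-priority channel that provides it (here conda-forge) and ignores same-named packages in lower-priority channels. This avoids mixed-channel dependency issues.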
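
To confirm that a machine actually has this setup, a script can read the settings back. This is a rough sketch, assuming conda config --show prints "channel_priority: strict" and a "channels:" list with "  - name" rows in priority order. The helper name check_channel_setup is hypothetical.

```shell
# check_channel_setup  (hypothetical helper)
# Reports whether conda-forge is the top channel and priority is strict.
check_channel_setup() {
    local priority top_channel
    # Assumed output shape: "channel_priority: strict"
    priority="$(conda config --show channel_priority | awk '{print $2}')"
    # Assumed output shape: "channels:" then "  - name" rows, highest priority first.
    top_channel="$(conda config --show channels | awk '$1 == "-" {print $2; exit}')"
    if [ "$priority" = "strict" ] && [ "$top_channel" = "conda-forge" ]; then
        echo "OK: conda-forge first, strict priority"
    else
        echo "Check config: top channel '$top_channel', priority '$priority'"
        return 1
    fi
}
```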

Reproducible environments with environment.yml

The environment.yml file is how you share environments with teammates:

name: ml-pipeline
channels:
  - conda-forge
  - pytorch
dependencies:
  - python=3.11
  - numpy=1.26
  - pandas=2.2
  - scikit-learn=1.4
  - pytorch=2.2
  - jupyter
  - pip  # list pip itself so conda installs it before the pip: section runs
  - pip:
    - wandb
    - some-pip-only-package

Export and recreate

# Export current environment (cross-platform)
conda env export --from-history > environment.yml

# Export with exact versions (same OS only)
conda env export > environment-lock.yml

# Recreate on another machine
conda env create -f environment.yml

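The --from-history flag exports only the packages you explicitly requested, without build strings or transitive dependencies, which makes the file portable across operating systems. Versions are recorded only where you pinned them, and pip-installed packages are not included, so add those to the file by hand.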
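
Exported files typically end with a prefix: line holding the absolute path of the environment on your machine. A small sketch of stripping it before committing the file; export_portable_env is a hypothetical helper name.

```shell
# export_portable_env  (hypothetical helper)
# Prints a --from-history export with the machine-specific "prefix:" line removed.
export_portable_env() {
    conda env export --from-history | grep -v '^prefix:'
}

# Example: export_portable_env > environment.yml
```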

Advanced tips for power users

Speed up conda with libmamba

The classic conda solver can be slow, especially in large environments. Switching to libmamba often makes resolves many times faster:

conda install -n base conda-libmamba-solver
conda config --set solver libmamba

Since conda 23.10, libmamba is the default solver, so these commands are only needed on older versions.
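
A script can check the installed version before deciding whether to switch solvers. The sketch below assumes conda --version prints "conda X.Y.Z"; version_ge and libmamba_is_default are hypothetical helper names. The comparison is numeric, because a plain string comparison would rank 23.9 above 23.10.

```shell
# version_ge A B: succeeds if major.minor of A is >= major.minor of B.
version_ge() {
    local a_major="${1%%.*}" a_rest="${1#*.}"
    local b_major="${2%%.*}" b_rest="${2#*.}"
    local a_minor="${a_rest%%.*}" b_minor="${b_rest%%.*}"
    [ "$a_major" -gt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

# libmamba_is_default  (hypothetical helper)
libmamba_is_default() {
    local ver
    ver="$(conda --version | awk '{print $2}')"   # "conda 23.7.4" -> "23.7.4"
    version_ge "$ver" 23.10
}

# Example:
# libmamba_is_default || conda install -n base conda-libmamba-solver
```

This only checks the version. A user-level setting can still override the default, so conda config --show solver remains the authoritative check.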

Stacking environments

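You can stack one environment on top of another so that command-line tools from both are available:

conda activate base-ml
conda activate --stack experiment-42

Stacking keeps the first environment's directories on PATH, so its command-line tools stay available after the second activation. It does not merge the environments: each Python interpreter still imports only from its own environment's site-packages, so this does not let experiment-42 import libraries installed in base-ml.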

Using conda in Docker

FROM continuumio/miniconda3:latest

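WORKDIR /app
COPY environment.yml .
# The environment name used below (myenv) must match the name: field in environment.yml
RUN conda env create -f environment.yml && conda clean -afy

COPY app.py .
# Exec-form CMD bypasses SHELL, so call conda run explicitly to use the environment's Python
CMD ["conda", "run", "--no-capture-output", "-n", "myenv", "python", "app.py"]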

Using conda in CI/CD

# GitHub Actions example (later run steps need shell: bash -el {0} to see the activated environment)
- uses: conda-incubator/setup-miniconda@v3
  with:
    environment-file: environment.yml
    python-version: "3.11"
    activate-environment: myproject

Common pitfalls and solutions

1. "Solving environment" takes forever

Switch to the libmamba solver (see above) or pin fewer package versions.

2. Mixing conda and pip breaks things

Install all conda packages first, then pip packages. If you need more conda packages later, add them to environment.yml and recreate the environment rather than installing on top of pip-installed packages.
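
Recreating an environment can be scripted. This is a sketch rather than a definitive recipe: it assumes the file has a top-level name: field and that you run it from a different environment, since conda refuses to remove the active one. The helper name recreate_env is hypothetical.

```shell
# recreate_env [FILE]  (hypothetical helper; FILE defaults to environment.yml)
recreate_env() {
    local env_file="${1:-environment.yml}" env_name
    # Read the "name:" field so remove and create target the same environment.
    env_name="$(awk '$1 == "name:" {print $2; exit}' "$env_file")"
    if [ -z "$env_name" ]; then
        echo "No name: field found in $env_file" >&2
        return 1
    fi
    conda env remove -y -n "$env_name"
    conda env create -f "$env_file"
}

# Example: recreate_env environment.yml
```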

3. Environment is too large

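Running conda clean --all frees disk space in the shared package cache, but it does not shrink an environment itself. To keep environments small, use Miniconda instead of Anaconda, install only what you need, and remove environments you no longer use.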

4. "Package not found" errors

Search for the package: conda search package-name. Try conda-forge: conda install -c conda-forge package-name. If it's Python-only, use pip as a fallback.
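
The search steps above can be combined into one lookup that tries several channels in order. A minimal sketch, assuming conda search exits with a non-zero status when nothing matches; find_package_channel is a hypothetical helper name.

```shell
# find_package_channel PACKAGE CHANNEL...  (hypothetical helper)
# Prints the first listed channel that carries PACKAGE; fails if none does.
find_package_channel() {
    local pkg="$1" channel
    shift
    for channel in "$@"; do
        # --override-channels restricts the search to the channel given with -c.
        if conda search --override-channels -c "$channel" "$pkg" >/dev/null 2>&1; then
            echo "$channel"
            return 0
        fi
    done
    echo "$pkg not found on: $*" >&2
    return 1
}

# Example: find_package_channel polars defaults conda-forge
```

If the function fails for every channel you care about, that is the point at which falling back to pip makes sense.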

Quick reference

Need a quick lookup for any conda command? Use our Conda Cheat Sheet — it has 90+ commands organized by category with one-click copy, search, and filtering.

Wrapping up

Conda environments are essential for reproducible data science. The key practices are:

  1. One environment per project — never pollute base
  2. Use environment.yml — version-control your dependencies
  3. Prefer conda-forge — largest community channel
  4. Export with --from-history — portable across platforms
  5. Use libmamba solver — dramatically faster installs

Master these patterns and you'll never hear "but it works on my machine" again.