Conda Environment Management: The Complete Guide for Data Scientists
Master conda environments, package management, channels, and reproducible workflows. From basic commands to advanced tips for Python and R projects.
Conda is the package and environment manager that powers modern data science. Whether you're juggling Python versions, installing GPU-accelerated libraries, or sharing reproducible environments with your team, conda handles it all — without the dependency nightmares that plague pip-only workflows.
Why conda over pip + venv?
Pip installs Python packages. Conda installs anything — Python, R, C libraries, CUDA toolkits, and system-level dependencies. This matters because data science libraries like NumPy, SciPy, and PyTorch depend on compiled C/Fortran code that pip can't always resolve correctly.
| Feature | pip + venv | conda |
|---|---|---|
| Python packages | ✅ | ✅ |
| Non-Python deps (C, CUDA) | ❌ | ✅ |
| Python version management | ❌ (need pyenv) | ✅ |
| Environment isolation | ✅ | ✅ |
| Dependency solver | Basic | SAT solver |
| Cross-platform binaries | Wheels only | Full support |
Getting started: Miniconda vs Anaconda
Miniconda gives you conda + Python in ~80 MB. You install only what you need. Anaconda ships 250+ pre-installed packages (~3 GB) — convenient but bloated for most workflows.
For most developers, Miniconda is the better choice:
# Install Miniconda (Linux/macOS)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Verify installation
conda --version
Environment management essentials
Creating environments
Always create a dedicated environment for each project. Never install packages into base:
# Create with specific Python version
conda create -n myproject python=3.11
# Create with packages pre-installed
conda create -n ml-project python=3.11 numpy pandas scikit-learn
# Create from an environment file
conda env create -f environment.yml
Activating and switching
conda activate myproject # Switch to environment
conda deactivate # Return to base
conda env list # See all environments
Installing packages
# From default channel
conda install numpy pandas matplotlib
# From conda-forge (community packages)
conda install -c conda-forge polars duckdb
# Specific version
conda install pytorch=2.2 -c pytorch
# Install pip packages when conda doesn't have them
pip install some-niche-package
Pro tip: Always try conda install first. Fall back to pip install only when a package isn't available on any conda channel. Mixing the two can cause dependency conflicts if not managed carefully.
Removing and cleaning
# Remove a package
conda remove scipy
# Delete an entire environment
conda env remove -n old-project
# Clean cached packages (free disk space)
conda clean --all
Channels and priorities
Channels are the repositories conda downloads packages from, and their priority order determines which build wins when a package exists in more than one:
# Add conda-forge as highest priority
conda config --add channels conda-forge
conda config --set channel_priority strict
# View current channels
conda config --show channels
Recommended setup for data science:
channels:
  - conda-forge
  - defaults
channel_priority: strict
Setting channel_priority: strict means conda only falls back to lower-priority channels for packages that conda-forge doesn't provide, which avoids the mixed-channel dependency conflicts common in non-strict setups.
Reproducible environments with environment.yml
The environment.yml file is how you share environments with teammates:
name: ml-pipeline
channels:
  - conda-forge
  - pytorch
dependencies:
  - python=3.11
  - numpy=1.26
  - pandas=2.2
  - scikit-learn=1.4
  - pytorch=2.2
  - jupyter
  - pip:
    - wandb
    - some-pip-only-package
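To sanity-check a file like this before sharing it, you can list the declared conda dependency names with a few lines of stdlib Python. This is a minimal sketch assuming the simple layout shown above (the `conda_dependencies` helper is our own naming); for anything fancier, parse the file properly with PyYAML's `yaml.safe_load`:

```python
def conda_dependencies(text):
    """Return top-level conda dependency names from an environment.yml string."""
    deps, in_deps = [], False
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("dependencies:"):
            in_deps = True
            continue
        if in_deps:
            stripped = line.strip()
            if stripped.startswith("- pip:"):
                break  # pip sub-list is a separate section
            if stripped.startswith("- "):
                # "numpy=1.26" -> "numpy"
                deps.append(stripped[2:].split("=")[0])
            elif stripped and not line.startswith(" "):
                break  # left the dependencies block
    return deps

example = """\
name: ml-pipeline
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy=1.26
  - jupyter
  - pip:
    - wandb
"""
print(conda_dependencies(example))  # ['python', 'numpy', 'jupyter']
```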
Export and recreate
# Export current environment (cross-platform)
conda env export --from-history > environment.yml
# Export with exact versions (same OS only)
conda env export > environment-lock.yml
# Recreate on another machine
conda env create -f environment.yml
The --from-history flag exports only packages you explicitly installed, making the file portable across operating systems.
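For the environment above, the two export styles differ roughly like this (illustrative output only; exact pins and build strings vary by platform and install date):

```yaml
# conda env export --from-history  ->  portable spec
name: ml-pipeline
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy=1.26

# conda env export  ->  full lock: every transitive dependency,
# pinned with build strings, e.g.:
#   - numpy=1.26.4=py311h64a7726_0
```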
Advanced tips for power users
Speed up conda with libmamba
The default conda solver can be slow. Switch to libmamba for 10-50x faster resolves:
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
As of conda 23.10, libmamba is the default solver, so this step is only needed if you're running an older version.
Stacking environments
You can stack environments for shared base dependencies:
conda activate base-ml
conda activate --stack experiment-42
This gives you access to packages from both environments without duplicating large installs.
Using conda in Docker
FROM continuumio/miniconda3:latest
COPY environment.yml .
RUN conda env create -f environment.yml && conda clean -afy
COPY app.py .
# Exec-form CMD bypasses any SHELL directive, so launch through `conda run`
# directly ("myenv" must match the name: field in environment.yml)
CMD ["conda", "run", "--no-capture-output", "-n", "myenv", "python", "app.py"]
Using conda in CI/CD
# GitHub Actions example
- uses: conda-incubator/setup-miniconda@v3
  with:
    environment-file: environment.yml
    python-version: "3.11"
    activate-environment: myproject
Common pitfalls and solutions
1. "Solving environment" takes forever
Switch to the libmamba solver (see above) or pin fewer package versions.
2. Mixing conda and pip breaks things
Install all conda packages first, then pip packages. If you need to add more conda packages later, recreate the environment from environment.yml.
3. Environment is too large
Use conda clean --all regularly. Consider using Miniconda instead of Anaconda, and install only what you need.
4. "Package not found" errors
Search for the package: conda search package-name. Try conda-forge: conda install -c conda-forge package-name. If it's Python-only, use pip as a fallback.
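A quick way to see what's actually missing is to probe imports from inside the active environment. This stdlib-only sketch uses an illustrative package list; note that import names can differ from package names (scikit-learn imports as sklearn):

```python
import importlib.util

def missing(module_names):
    """Return the modules from the list that aren't importable here."""
    return [m for m in module_names if importlib.util.find_spec(m) is None]

# Illustrative project requirements (import names, not package names)
need = ["numpy", "pandas", "sklearn"]
for mod in missing(need):
    print(f"missing: {mod} -> try `conda install -c conda-forge {mod}`")
```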
Quick reference
Need a quick lookup for any conda command? Use our Conda Cheat Sheet — it has 90+ commands organized by category with one-click copy, search, and filtering.
Wrapping up
Conda environments are essential for reproducible data science. The key practices are:
- One environment per project — never pollute base
- Use environment.yml — version-control your dependencies
- Prefer conda-forge — largest community channel
- Export with --from-history — portable across platforms
- Use libmamba solver — dramatically faster installs
Master these patterns and you'll never hear "but it works on my machine" again.