**NOTE: This repository is no longer maintained; new environments are now created in the containerised repository.
Containerised environments can use Conda environments in addition to other package management systems, and they offer a number of advantages over locally run Conda environments, including:
- Isolation: Containers provide isolation between applications, preventing dependency conflicts.
- Scalability: Containers are lightweight and can be scaled up or down to meet demands, providing flexibility in resource allocation.
- Dependency Management: Dependencies are encapsulated within the container, reducing conflicts.
- Environment Consistency: Ensures the same environment is used in development, testing, and production.
- Reproducibility and Portability: Containers can be easily transferred between machines or environments, making them portable. **
This repository contains conda environment configuration files for Data Engineering, Data Science (including Machine Learning), and related projects. The goal is to establish standardised environments that can be easily shared across multiple projects, reducing the number of virtual environments on each computer and facilitating collaboration among team members.
New environments are created when packages are added, removed, or upgraded. Once an environment specification is defined and published in a YAML file, we consider it immutable. Any changes require creating a new environment to ensure environments are reproducible and maintainable.
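As a rough illustration of this rule, the sketch below prints the conda commands that would derive a new environment from an existing one instead of modifying the original. The function name `plan_new_env` and the exported file name are illustrative assumptions, not part of this repository; the function only echoes the commands so they can be reviewed before being run.

```shell
# Hypothetical helper: print (do not run) the commands for creating a new
# environment from an existing one, leaving the original untouched.
plan_new_env() {
  old_env="$1"
  new_env="$2"
  shift 2
  # Clone the published environment under the next name in the sequence.
  echo "conda create --name $new_env --clone $old_env"
  # Install any extra packages into the clone only.
  if [ "$#" -gt 0 ]; then
    echo "conda install -n $new_env $*"
  fi
  # Export the result; the file name here is an assumed placeholder.
  echo "conda env export -n $new_env > $new_env.yml"
}

plan_new_env E041 E042 wandb
```

Printing the plan rather than executing it keeps the published environment safe from accidental changes while the new specification is being prepared.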
The naming convention for environments follows the format E (for Environment) followed by a three-digit number in sequence, continuing from the most recent environment. For example, E001 is followed by E002.
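The numbering scheme can be sketched as a small shell function. The name `next_env_name` is a hypothetical helper introduced here for illustration; it assumes the input is always `E` followed by digits.

```shell
# Hypothetical helper: given the most recent environment name, print the next one.
next_env_name() {
  number="${1#E}"                 # drop the leading "E", e.g. "041"
  next="$(expr "$number" + 1)"    # expr reads "041" as decimal 41
  printf 'E%03d\n' "$next"        # zero-pad back to three digits
}

next_env_name E001   # prints E002
```

Using `expr` avoids the octal interpretation that shell arithmetic can apply to zero-padded numbers such as `008`.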
If you are part of a team, select an environment that suits your needs.
The answer depends on what you want to achieve and which packages you need. Below are some examples and ideas of where to start. Most environments contain Jupyter, pandas and NumPy.
Data collection and handling: this environment contains various libraries to obtain data from a range of sources (make sure to check the terms of third-party sites before using them).
If you are setting up the environment in a Docker container you might find these Docker install instructions helpful.
The yml folder contains Anaconda environment YAML files and a README that describes their contents at a high level.
Once created and published, we consider environments to be immutable. When making any alterations or additions, please submit the new environment via a pull request, numbered sequentially from the highest-numbered environment below.
[E041] provides pytorch
Each environment is numbered sequentially to allow for versioning and easy tracking.
e.g. Environment 001 is called E001.
Most of the environments are created by importing a previous environment, updating its packages and adding new ones.
The YAML files are named after the environment, followed by the operating system (e.g. windows) or "generic" if OS-specific packages have been removed and the packages should work on both Windows and Linux.
conda create --name [env]
conda env create -f [filename].yml
conda create --name <new_env> --clone <existing_env>
conda update --all
conda env export -n [venv] > [filename].yml
conda env remove -n [venv]
conda info --envs
python -m ipykernel install --user
https://conda.io/docs/user-guide/tasks/manage-environments.html
conda config --add channels [channel]
conda install mamba -n base -c conda-forge
mamba install [package_name] -c conda-forge
=[A-Z]+.*$
Removing the hashes from a yml file aids the imports into Linux where the compiled hash values are different.
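A minimal sketch of this clean-up with `sed` follows. It uses its own stricter pattern rather than the regular expression shown above, and the function name `strip_builds` is an illustrative assumption. It expects lines in the `- name=version=build` form written by `conda env export` and leaves pip-style `name==version` pins untouched.

```shell
# Hypothetical helper: remove the trailing "=build" part from conda
# dependency lines read on stdin, keeping "name=version".
strip_builds() {
  sed -E 's/^([[:space:]]*-[[:space:]]+[^=[:space:]]+=[^=]+)=[^=]+$/\1/'
}

# Example: the conda line loses its build string, the pip pin is unchanged.
printf '%s\n' '  - numpy=1.19.5=py39h6635163_0' '    - tokenizers==0.13' | strip_builds
```

To apply it to a file, something like `strip_builds < [filename].yml > [filename]_generic.yml` would work (the output name is only an example). Alternatively, `conda env export --no-builds` omits the build strings at export time.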
- msys2-conda-epoch
- m2w64* including:
- m2w64-gcc-libgfortran
- m2w64-libwinpthread-git
- m2w64-gcc-libs
- m2w64-gmp
- m2w64-gcc-libs-core
- vc
- wincertstore
- winpty
- win_inet_pton
- pyreadline
- pywinpty
- icc_rt - Intel(R) C++ | Visual Fortran Compiler for Windows
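Assuming the packages listed above are the Windows-specific entries that are stripped to produce a "generic" file, the filtering could be sketched with `grep` as below. The function name `drop_windows_packages` is hypothetical, and the pattern only covers the names in this list.

```shell
# Hypothetical helper: drop dependency lines for the Windows-only packages
# named in the list above; all other lines pass through unchanged.
drop_windows_packages() {
  grep -v -E '^[[:space:]]*-[[:space:]]+(msys2-conda-epoch|m2w64-[^=[:space:]]*|vc|wincertstore|winpty|win_inet_pton|pyreadline|pywinpty|icc_rt)(=.*)?[[:space:]]*$'
}

# Example: only the numpy line survives.
printf '%s\n' '  - numpy=1.19.5' '  - vc=14.1' '  - m2w64-gmp=6.1.0' | drop_windows_packages
```

The pattern matches whole package names, so a package such as `vc14_runtime` is not removed by the `vc` entry.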
- torchvision 0.2.2 does not support pillow 7+ due to removal of PILLOW_VERSION. See Github Issue
MinkowskiEngine pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"
Environment designed to support machine learning research, including data exploration, pre-processing, model development, training, and deployment. It includes libraries such as PyTorch, MLflow (for experiment tracking and reproducibility), NumPy, pandas and Dask, as well as Matplotlib for visualisation, among others.
Channels:
- pytorch
- nvidia
- huggingface
- conda-forge
- anaconda
- defaults
To provide:
- python
- pytorch 2.3.0
- mlflow
- numpy
- pillow
- pandas
- dask
- pyarrow
- fastparquet
- pandas-profiling
- xlrd
- sqlite
- matplotlib
- jupyterlab
- jupyter_contrib_nbextensions
- ipywidgets
- widgetsnbextension
- nodejs
- graphviz
- accelerate
- kornia
- wandb
- tqdm
- webdataset
- munch
- onnxruntime
- einops
- onnx2torch
Copy of E041 with updated packages and HuggingFace transformers added, for use with gpt-neo. To provide:
- pytorch=1.11
- python=3.9
- cuda=11.5
- cudnn=8.3.2
- pip:
- tokenizers==0.13
- transformers==4.23
Created from scratch using channels:
- conda-forge
- anaconda
To provide:
- mamba
- python
- openjdk
- ipykernel
- newspaper3k
- numpy
- alpha_vantage
- yfinance
- pandas-datareader
- pandas
- jupyterlab
- matplotlib
- seaborn
- fastparquet
- pandas-profiling
- graphviz
- dask
- nodejs
- sqlite
- plotly
- quandl
- scipy
- xlrd
- h5py
- scikit-image
- scikit-learn
- pillow
- requests
- youtube-dl
- mlflow
- pyarrow
- beautifulsoup4
- indexed_gzip
- urllib3
- pytrends
- pyautogui
- black
- pyspark=3.2
Created from scratch using channels:
- pytorch
- conda-forge
- anaconda
To provide:
- pytorch
- torchaudio
- torchvision
- python
- opencv
- mlflow
- ax-platform
- botorch
- gpytorch
- pillow
- tensorboard
- tensorboardx
- databricks-cli
- pycocotools
- indexed_gzip
- numpy
- pandas
- dask
- pyarrow
- fastparquet
- h5py
- pandas-profiling
- xlrd
- sqlite
- matplotlib
- jupyterlab
- scipy
- scikit-learn
- scikit-image
- nodejs
- graphviz
- seaborn
- jupyter_contrib_nbextensions
- ipywidgets
- widgetsnbextension
- openblas-devel
Created from scratch using channels:
- pytorch
- conda-forge
- anaconda
- acellera
To provide:
- pytorch=1.10
- torchaudio
- torchvision=0.11
- python=3.9
- mlflow=1.20
- ax-platform
- botorch
- gpytorch
- pillow
- tensorboard
- tensorboardx
- databricks-cli
- pycocotools
- indexed_gzip
- numpy
- pandas
- dask
- pyarrow
- fastparquet
- h5py
- pandas-profiling
- xlrd
- sqlite
- matplotlib
- jupyterlab
- scipy
- scikit-learn
- scikit-image
- nodejs
- graphviz
- seaborn
- jupyter_contrib_nbextensions
- ipywidgets
- widgetsnbextension
Created from scratch using channels:
- pytorch
- conda-forge
- anaconda
To provide:
- pytorch=1.10
- torchaudio
- torchvision=0.11
- python=3.9
- mlflow=1.20
- pillow
- tensorboard
- tensorboardx
- databricks-cli
- pycocotools
- indexed_gzip
- numpy
- pandas
- dask
- pyarrow
- fastparquet
- h5py
- pandas-profiling
- xlrd
- sqlite
- matplotlib
- jupyterlab
- scipy
- scikit-learn
- scikit-image
- nodejs
- graphviz
- seaborn
- jupyter_contrib_nbextensions
- ipywidgets
- widgetsnbextension
Created from scratch to provide fastai 2.4 and fastbook, using channels:
- pytorch
- conda-forge
- anaconda
- mamba
- python=3.9
- mlflow=1.18
- pillow
- tensorboardx
- databricks-cli
- indexed_gzip
- dask
- numpy
- pandas
- matplotlib
- jupyterlab
- scikit-learn
- sqlite
- xlrd
- plotly
- nodejs
- graphviz
- seaborn
Created from scratch to provide pytorch 1.8.1, using channels:
- pytorch
- conda-forge
- anaconda
Packages:
- mamba
- python=3.9
- pytorch=1.8.1
- torchvision=0.9.1
- mlflow=1.16
- pillow
- tensorboard
- tensorboardx
- pillow
- databricks-cli
- pycocotools
- indexed_gzip
- dask
- pillow
- pandas-profiling
- pyarrow
- numpy
- pandas
- matplotlib
- jupyterlab
- scikit-learn
- scikit-image
- sqlite
- xlrd
- plotly
- nodejs
- graphviz
- fastparquet
- seaborn
- dask
- pillow
- scipy
- h5py
- matplotlib
- jupyter_contrib_nbextensions
Created from scratch to create QR codes, using conda-forge:
- segno
- qrcode-artistic
- pillow
- jupyterlab
- nodejs
Created from scratch as a data handler:
- mamba (conda-forge)
- newspaper3k (conda-forge)
- pyautogui (conda-forge)
- numpy (conda-forge)
- pandas (conda-forge)
- jupyterlab (conda-forge)
- matplotlib (conda-forge)
- h5py (conda-forge)
- scikit-image (conda-forge)
- scikit-learn (conda-forge)
- pillow (conda-forge)
- requests (conda-forge)
- youtube-dl (conda-forge)
- mlflow=1.14 (conda-forge)
- pyarrow (conda-forge)
- beautifulsoup4 (conda-forge)
- indexed_gzip (conda-forge)
- xlrd (conda-forge)
- quandl (conda-forge)
- urllib3 (conda-forge)
- scipy (conda-forge)
- pytrends (conda-forge)
- plotly (conda-forge)
- sqlite (conda-forge)
- databricks-cli (conda-forge)
- nodejs (conda-forge)
- pandas-profiling (conda-forge)
- graphviz (anaconda)
- dask (anaconda)
- fastparquet (anaconda)
- seaborn
Created from scratch using mamba to get TF 2.4.1 for GPU.
- mamba
- tensorflow-gpu=2.4 (anaconda) (available for linux only as of 2021-03-10)
- tensorboard=2.4.1 (conda-forge)
- mlflow=1.14 (conda-forge)
- pyarrow (conda-forge)
- indexed_gzip (conda-forge)
- xlrd (conda-forge)
- scipy (conda-forge)
- plotly (conda-forge)
- sqlite (conda-forge)
- databricks-cli (conda-forge)
- nodejs (anaconda)
- numpy=1.19.5 (anaconda) (version needed by TF)
- pandas (anaconda)
- jupyterlab (anaconda)
- matplotlib (anaconda)
- h5py (anaconda)
- scikit-image (anaconda)
- scikit-learn (anaconda)
- pillow (anaconda)
New env from scratch to provide data wrangling and collection tools
- graphviz (anaconda)
- dask (anaconda)
- pandas-profiling (anaconda)
- fastparquet (anaconda)
- seaborn (anaconda)
- conda-forge (conda-forge)
- nodejs (conda-forge)
- newspaper3k (conda-forge)
- numpy (conda-forge)
- pandas (conda-forge)
- jupyterlab (conda-forge)
- matplotlib (conda-forge)
- h5py (conda-forge)
- scikit-image (conda-forge)
- scikit-learn (conda-forge)
- pillow (conda-forge)
- requests (conda-forge)
- youtube-dl (conda-forge)
- mlflow=1.13 (conda-forge)
- pyarrow (conda-forge)
- beautifulsoup4 (conda-forge)
- indexed_gzip (conda-forge)
- xlrd (conda-forge)
- quandl (conda-forge)
- urllib3 (conda-forge)
- scipy (conda-forge)
- pytrends (conda-forge)
- quandl (conda-forge)
- plotly (conda-forge)
- sqlite (conda-forge)
- databricks-cli (conda-forge)
- alpha-vantage (pip)
- tinysegmenter (pip)
Built from scratch (similar to E028) to provide:
- pytorch=1.7
- torchvision=0.8
- pytorch-lightning
- mlflow=1.13
- pillow=6.2.1
- pandas-profiling
- dask
- pyarrow
- numpy
- pandas
- matplotlib
- jupyterlab
- tensorboardx
- scikit-learn
- scikit-image
- scipy
- h5py
- sqlite
- databricks-cli
- pycocotools (as of 2021-01-15 only available for linux, can be used for computing the evaluation IOU metrics)
Built from scratch for use with labelme
Built from scratch to provide:
- tensorflow-gpu=2.3
- tensorboard=2.3
- tensorflow-gpu=2.3 (as of 2021-01-07 tensorflow-gpu=2.3 only available for windows)
- keras=2.4.3
- tensorboard
- scikit-image
- scikit-learn
- scipy
- jupyterlab
- h5py=2.10
- dask=2.3
- pillow=8
- pandas-profiling
- pyarrow
- numpy
- pandas
- mlflow=1.13
- Databricks-cli=0.9
- matplotlib
Clone of E026 with xlrd added
Built from scratch to provide:
- pytorch=1.7
- torchvision=0.8
- pytorch-lightning
- mlflow=1.12
- pillow=6.2.1
- pandas-profiling
- dask
- pyarrow
- numpy
- pandas
- matplotlib
- jupyterlab
- tensorboardx
- scikit-learn
- scikit-image
- scipy
- h5py
- sqlite
- databricks-cli
Intended for using fastai and MLflow together
- fastai=1.0
- pytorch
- mlflow=1.12
- pillow=8
- pandas-profiling
- dask
- pyarrow
- numpy
- pandas
- matplotlib
- jupyterlab
- tensorboardx
- scikit-learn
- scikit-image
- scipy
- h5py
- databricks-cli
- sqlite
Built from scratch as data handling env to work with Delta Lake and Apache Spark (Pyspark)
- Pyarrow
- Pandas
- Numpy
- Jupyterlab
- Matplotlib
- H5py
- Scikit-image
- Scikit-learn
- MLFlow=1.12
- Pillow=8.0
- youtube-dl
- pandas-profiling
- dask
- beautifulsoup4
- indexed_gzip
- urllib3
- Pyarrow
- pytrends
- alpha-vantage
- quandl
- plotly
- scipy
- scikit-image
- scikit-learn
- seaborn
- newspaper3k
Built from scratch to provide:
- Tensorflow CPU=2.3 (tensorflow-gpu 2.2 is highest version available on anaconda channel at time of env creation)
- Tensorboard=2.3
- MLFlow=1.11
- Databricks-cli=0.9
- Dask=2.3
- Pillow=8
- Pandas Profiling
- Pyarrow
- Numpy
- Pandas
- Jupyterlab
- Matplotlib
- H5py
- Scikit-image
- Scikit-learn
- Scipy
Built from scratch to provide:
- FastAI=2.1
- Pytorch=1.6
- Torchvision
- TensorboardX=2.1
- MLFlow
- Databricks-cli
- Dask
- Pillow
- Pandas Profiling
- Pyarrow
- Numpy
- Pandas
- Jupyterlab
- Matplotlib
- H5py
- Scikit-image
- Scikit-learn
- Scipy
Built from scratch to provide:
- Tensorflow GPU 2.1
- Tensorboard 2.2
- Keras GPU 2.3
- MLFlow 1.11
- Databricks-cli
- Jupyterlab 2.2
- Numpy 1.19
- Pandas 1.1
- Matplotlib 3.3
- Pillow 8.0
Built from scratch to provide:
- Pytorch GPU 1.6
- MLFlow 1.11
- TensorboardX 2.1
- Jupyterlab 2.2
- Databricks-cli
- Numpy 1.19
- Pandas 1.1
- Torchvision 0.7
- Matplotlib 3.3
- Pillow 8.0
Rebuilt based on E020 to provide:
- Tensorflow GPU 2.1
- Pytorch GPU 1.3
- MLFlow
Based on a clone of E019 with alpha-vantage and quandl packages added, plus tensorflow 2.1.0 GPU and pytorch CPU.
Clone of E018 with wandb added via pip
Clone of E017 with all packages updated and pytorch forced to version 1.4, which moves the CUDA tools to 10.1
Clone of E015 with packages to aid data acquisition and handling:
- datapackage see https://github.com/frictionlessdata/datapackage-py
- urllib3
- indexed_gzip
- beautifulsoup4
New, freshly created environment to allow TensorFlow 2 GPU to be used. It supplies:
- TensorFlow 2
- Pillow
- pyarrow
- pandas-profiling
- dask
- scikit-image
- scikit-learn
- scipy
Brand new environment (not cloned from another) with unconstrained versions. It expands on E014 and is currently working with Deoldify.
- Use pip install ffmpeg-python, as the conda version was not working at the time of writing
- jupyterlab
- pytorch
- matplotlib
- xlrd
- seaborn
- xlrd
- plotly
- numpy
- pandas-profiling
- pandas
- scikit-image
- scikit-learn
- scipy
- pyarrow
- pillow
- dask
- fastai
- pydotplus
- py-xgboost
- cufflinks-py
- keras-gpu
- nvidia-ml-py3
- ffmpeg
- opencv
- youtube-dl
- opencv
Brand new environment (not cloned from another) with unconstrained versions
- jupyter
- jupyterlab
- pytorch
- matplotlib
- xlrd
- seaborn
- xlrd
- plotly
- numpy
- pandas-profiling
- pandas
- scikit-image
- scikit-learn
- scipy
- pyarrow
- pillow
- dask
- fastai
- pydotplus
- py-xgboost
- pyarrow
- cufflinks-py
- keras-gpu
- Clone from E012
- pyarrow
- cufflinks-py
- Clone from E010
- pydotplus
- py-xgboost
- fastai
- update:
- dask 2.2 --> 2.3
Intended as a generic environment for basic Analytics and Data Science
- plotly
- numpy
- pandas
- seaborn
- matplotlib
- jupyterlab
- xlrd
- scikit-image
- scikit-learn
- pillow
- scipy
- dask
- tensorflow (cpu)
- pandas-profiling
- Clone from E009
- pandas-profiling 2.3
- update:
- tensorflow 1.13.1 --> 1.14.0
- bokeh 1.2.0 --> 1.3.4
- cudnn 7.3.1 --> 7.6.0
- dask 2.0 --> 2.2
- jupyterlab 0.35.6 --> 1.0.5
- numpy 1.16.3 --> 1.16.4
- pandas 0.24.2 --> 0.25.0
- scikit-learn 0.21.2 --> 0.21.3
- scipy 1.2.1 --> 1.3.0
- pillow 6.0.0 --> 6.1.0
- All from E008
- plotly
- jupyterlab
- All from E007
- dask
- All from E002
- xlrd
- scikit-image
- Python 3.6
- Seaborn
- xlrd
- Jupyter
- tensorflow=1.6
- pip imports are not working, so import E006 and then run pip install opencv-contrib-python
- All from E002
- xlrd
- pip imports are not working, so import E005 and then run pip install opencv-contrib-python
- All from E001
- xlrd
- All from E002
- pyro-ppl
- pip imports are not working, so import E003 and then run pip install pyro-ppl
- All from E001
- Pytorch
- Seaborn
- Jupyter
- TF GPU