This is a collection of scripts for setting up an environment for RAPIDS development and notebooks starting from a clean install of Ubuntu 20.04 or 22.04. It automates the following steps:
- Installation of dependencies:
- Clangd - a language server used to add smart features to an IDE, e.g. VSCode.
- Visual Studio Code - an IDE.
- Docker - A container technology for packaging up code with all its dependencies.
docker-compose
- A tool for building, configuring, and running groups of docker containers- NVIDIA Container Toolkit - For building GPU-accelerated docker containers.
- Creation of a
docker-compose
environment:- Contains suitable settings of compiler and Python versions and other build parameters.
- Creation of a VSCode workspace:
- Sets up syntax highlighting, intellisense for C++, CUDA and Python
- Set up some Git conveniences.
- Forking and cloning of the RAPIDS repositories:
- Creation of containers for development work:
dind
: The Docker-in-Docker container. This container is built first to provide a clean environment for runningdocker-compose
in to build the other containers. This allows us to mount your local filesystem as docker volumes while selectively ignoring folders that bloat the docker build cache and slow down container builds.rapids
: Contains builds of all the RAPIDS repositories listed above and tooling to support development with Visual Studio Code.notebooks
: Serves Jupyter notebooks using the RAPIDS libraries built by therapids
container. This allows you to easily test out new features or fixes to C++ or Python in any of our example demo notebooks.
Starting with the assumption that this repository has not yet been cloned, and no other dependencies are installed, these steps:
- Install all dependencies,
- Fork and clone the RAPIDS repositories,
- Set up the VSCode environment and Intellisense.
During execution, you will be asked if you would like to install various dependencies - it is generally a sensible and safe to answer "yes" to these questions.
For the forking of Github repositories, you will also be asked for Github login details so that the script can use the Github CLI to fork from the RAPIDS Github account to your account.
# 1. Create a directory for all the RAPIDS workspace (any directory is OK)
mkdir -p ~/dev/rapids && cd ~/dev/rapids
# 2. Install Makefile and script dependencies:
sudo apt install curl jq make
# 3. Check out the rapids-compose repo into ./compose
git clone https://github.com/trxcllnt/rapids-compose.git compose
cd compose
# 4. Initialize the local compose environment:
make init
You will be asked to reboot if make init
installed docker
for you.
This is a living repo and will evolve along with the RAPIDS projects. It's a good idea to periodically pull the latest version of the repository to ensure your dev workflow is kept up-to-date with any changes to the RAPIDS projects' environments:
make init
is idempotent, so it's safe to run it from scratch as many times as
necessary. It will ask permission before installing any programs or modifying
any local files, so don't worry about losing any local modifications to your
VSCode workspace or .env
files.
# `cd` to the directory where you checked out this repo
cd ~/dev/rapids/compose
# pull the latest changes
git pull origin master
# re-run `make init` to ensure any new settings are applied
make init
To build the dev and notebook containers:
# 1. Build the RAPIDS dev and notebook containers
# a. Only build the rapids container: `make rapids.build`
# b. Only build the notebook container: `make notebooks.build`
make rapids.build
make notebooks.build
This only needs to be done once to create the containers. During normal development, RAPIDS components can be rebuilt inside the dev container.
make
is the "easy button" to build the containers and compile each of
the RAPIDS projects from source.
If you want to rebuild your containers (e.g. after a docker pull nvidia/cuda
,
or modifing the container Dockerfiles
), or if you want to take a coffee break
and come back to a fully re-built set of RAPIDS projects, it's always safe to
docker rm -f
any existing rapids containers and re-run make
from top.
To run the rapids container and launch a tty to explore it interactively, run:
make rapids.run
To launch the notebook container (needed each time the system is started):
make notebooks.run
See the header for etc/bash-utils.sh for the suite of useful commands to run inside the dev container.
"I changed something in my .env, restarted the container and now conda is spending hours resolving conflicts. What do I do?"
rapids-compose
builds a combined conda environment from the environments of all the RAPIDS
repos (cuDF, cuML, cuGraph, RMM, etc.). So if one of those repos on your system is out of sync,
there can be conflicts. To solve this:
-
ensure you have all repos pulled to the same branch. If you are working on a feature branch, you may need to merge the latest from the base branch into your feature branch.
-
compose/scripts/git-checkout-same-branch.sh
is a script that automates checking out all of your repos to the same branch. It examines the repos and prompts with a choice of common branches. -
If you are using a CUDA toolkit release candidate or prerelease (e.g. CUDA 11.0 RC), there may not be conda packages for this available yet. In this case, you can still build RAPIDS libraries (e.g. libcudf.so) from source, but you may not be able to build and use the Python bindings/libraries. In this situation, in the
compose/.env
file, you need to setCUDA_VERSION
to the version of the CUDA toolkit you want to use, but setCONDA_CUDA_TOOLKIT_VERSION
to the latest earlier version for which conda packages exist. So in the case of working with CUDA 12.0 RC, setCUDA_VERSION=12.0.0 CONDA_CUDA_TOOLKIT_VERSION=11.8
When creating the Docker-in-Docker container, downloading of the Alpine package release key fails with the following error:
+ wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub
wget: bad address 'alpine-pkgs.sgerrand.com'
This is a DNS resolution failure in the container - there can be numerous causes
for DNS resolution failures, but one that can occur within the NVIDIA
environment is the Cisco AnyConnect client breaking networking for containers.
If this occurs, a possible resolution is to restart the machine and attempt
make
again prior to making any VPN connections.
Alternatively, you can use the Cisco-compatible OpenConnect VPN client, which
seems to have better integration with systemd
:
sudo apt install network-manager-openconnect-gnome
When building the RAPIDS container, an error running groupadd
is encountered
just after the build of ccache
:
...
INSTALL ccache
INSTALL ccache.1
/
groupadd: GID '0' already exists
This occurs if you are running make
as root. You should run it under your
normal user account.
Inspired by container development patterns from @MillerHooks