h2oGPT Installation Help

Follow these instructions to get a working Python environment on a Linux system.

Install Python environment

Download Miniconda for Linux, macOS, or Windows. Then install conda and set up the environment:

bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh  # for Linux x86-64
# accept the license agreement and, if asked, let the installer add conda to your ~/.bashrc
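The Miniconda download page lists a SHA-256 hash for each installer. Below is a minimal sketch for checking the downloaded file against that hash before running it. It assumes GNU coreutils sha256sum is available (on macOS, shasum -a 256 is the usual equivalent). verify_sha256 is a hypothetical helper name, not part of conda.

```shell
# Hypothetical helper: compare a file's SHA-256 digest with the value
# published next to the installer on the Miniconda download page.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    # Also reached when the file is missing, because the digest is then empty.
    echo "MISMATCH: $file (got '$actual')" >&2
    return 1
  fi
}
```

To use it, pass the installer name followed by the hash copied from the download page, for example verify_sha256 Miniconda3-py310_23.1.0-1-Linux-x86_64.sh HASH. Run the installer only if the helper prints OK.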

Open a new shell. You should see (base) in the prompt. Then create a new environment:

conda create -n h2ogpt -y
conda activate h2ogpt
conda install -y mamba -c conda-forge  # for speed
mamba install python=3.10 -c conda-forge -y
conda update -n base -c defaults conda -y
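You can also check the active environment from a script instead of reading the prompt. conda activate sets the CONDA_DEFAULT_ENV variable to the active environment's name. The sketch below uses require_conda_env, a hypothetical helper that later steps could call to confirm they are running inside h2ogpt.

```shell
# Hypothetical helper: fail unless the named conda environment is active.
# `conda activate NAME` sets CONDA_DEFAULT_ENV to NAME.
require_conda_env() {
  local want="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" = "$want" ]; then
    echo "conda env '$want' is active"
  else
    echo "expected conda env '$want', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}

# Usage:
#   require_conda_env h2ogpt || conda activate h2ogpt
```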

You should see (h2ogpt) in the shell prompt. Test your Python:

python --version

This should report Python 3.10.xx. Then run:

python -c "import os, sys ; print('hello world')"

This should print hello world. Then clone the repository:

git clone https://github.com/h2oai/h2ogpt.git
cd h2ogpt

Then return to the README for package installation and use of generate.py.
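The Python version check earlier in this section can also be scripted. The sketch below assumes that python --version prints text such as Python 3.10.12; Python 3.4 and later send this to stdout. python_is_series is a hypothetical helper. Its optional second argument lets you pass the version text directly instead of running python.

```shell
# Hypothetical helper: succeed if the interpreter reports the wanted
# major.minor series (e.g. 3.10), whatever the patch level.
python_is_series() {
  local want="$1" out ver
  # Use the given text, or ask `python`; 2>&1 also covers old interpreters
  # that printed the version to stderr.
  out=${2:-$(python --version 2>&1)}
  ver=${out#Python }                  # "Python 3.10.12" -> "3.10.12"
  case "$ver" in
    "$want"|"$want".*) return 0 ;;    # 3.10 or 3.10.x, but not 3.1.x or 3.100
    *) echo "found '$out', want Python $want.x" >&2; return 1 ;;
  esac
}

# Usage (inside the activated h2ogpt env):
#   python_is_series 3.10 && echo "Python OK"
```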

Installing CUDA Toolkit

For example, to install CUDA 12.1, open the CUDA Toolkit 12.1 download page.

For Ubuntu 20.04, for instance, select Ubuntu, version 20.04, and installer type "deb (local)". The page then shows the following commands:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2004-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
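The ubuntu2004 part of these URLs and file names depends on your Ubuntu release. The sketch below derives it the same way the GPU manager commands later in this guide do: it reads ID and VERSION_ID from /etc/os-release and removes the dots. nvidia_repo_distro is a hypothetical helper, and its optional argument exists only for testing.

```shell
# Sketch: print NVIDIA's repo directory name for this OS, e.g. ubuntu2004.
# Mirrors: . /etc/os-release; echo $ID$VERSION_ID | sed -e 's/\.//g'
nvidia_repo_distro() {
  local release_file="${1:-/etc/os-release}"
  # Subshell, so sourcing the file does not leak ID/VERSION_ID into the caller.
  ( . "$release_file" && echo "${ID}${VERSION_ID}" | sed -e 's/\.//g' )
}

# Usage:
#   distro=$(nvidia_repo_distro)
#   echo "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/x86_64/"
```

The helper prints ubuntu2004 on Ubuntu 20.04 and ubuntu2204 on 22.04. The local .deb file name also encodes the CUDA and driver versions, so copy that name from the download page rather than constructing it.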

Then set the system up to use the freshly installed CUDA location:

echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/local/cuda/lib64/" >> ~/.bashrc
echo "export CUDA_HOME=/usr/local/cuda" >> ~/.bashrc
echo "export PATH=\$PATH:/usr/local/cuda/bin/" >> ~/.bashrc
source ~/.bashrc
conda activate h2ogpt
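If you run the echo lines above a second time, they append duplicate exports to ~/.bashrc. The sketch below is an idempotent alternative. It assumes grep supports the -q, -x, and -F options, which GNU and BSD grep both do. append_once is a hypothetical helper.

```shell
# Sketch: append a line to a shell startup file only if that exact line is
# not already there, so re-running this step does not duplicate exports.
append_once() {
  local line="$1" file="${2:-$HOME/.bashrc}"
  touch "$file"
  # -q quiet, -x match the whole line, -F treat the pattern as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $LD_LIBRARY_PATH and $PATH unexpanded, exactly
# like the escaped \$ in the echo commands above):
#   append_once 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/'
#   append_once 'export CUDA_HOME=/usr/local/cuda'
#   append_once 'export PATH=$PATH:/usr/local/cuda/bin/'
```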

Then reboot the machine so that everything, including the newly installed driver, is picked up on restart:

sudo reboot
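After the reboot, confirm that the nvcc on your PATH belongs to the 12.1 toolkit. nvcc --version ends with a line such as "Cuda compilation tools, release 12.1, V12.1.66". The sketch below extracts the release number from that text on stdin. cuda_release is a hypothetical helper.

```shell
# Sketch: read `nvcc --version` output on stdin and print the release
# number, e.g. "12.1" from "Cuda compilation tools, release 12.1, V12.1.66".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Usage:
#   [ "$(nvcc --version | cuda_release)" = "12.1" ] && echo "CUDA 12.1 on PATH"
```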

Compile bitsandbytes

Fast 4-bit and 8-bit training requires bitsandbytes. You only need to compile bitsandbytes if your CUDA version differs from the versions built into the bitsandbytes PyPI package: CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 12.0, and 12.1. Here we compile for 12.1 as an example.
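You can check whether you need to compile at all against the version list above. The sketch below copies that list from this guide; newer bitsandbytes releases may ship a different set. bnb_has_prebuilt is a hypothetical helper name.

```shell
# CUDA versions this guide says are built into the bitsandbytes PyPI package.
BNB_PREBUILT_CUDA="11.0 11.1 11.2 11.3 11.4 11.5 11.6 11.7 11.8 12.0 12.1"

# Hypothetical helper: succeed if the given CUDA release is in the list.
bnb_has_prebuilt() {
  local v
  # Unquoted on purpose: split the list on spaces.
  for v in $BNB_PREBUILT_CUDA; do
    [ "$v" = "$1" ] && return 0
  done
  return 1
}

# Usage:
#   bnb_has_prebuilt 12.2 || echo "no prebuilt binary for 12.2; compile from source"
```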

git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
git checkout 7c651012fce87881bb4e194a26af25790cadea4f
CUDA_VERSION=121 make cuda12x
CUDA_VERSION=121 python setup.py install
cd ..
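The CUDA_VERSION value and make target above follow the toolkit release: 12.1 becomes CUDA_VERSION=121 with the cuda12x target. The sketch below extends this to other releases using hypothetical helper names. Mapping 11.1 through 11.8 to a cuda11x target is an assumption about the Makefile at the pinned commit, so check that Makefile before relying on it.

```shell
# Sketch: "12.1" -> "121", the form CUDA_VERSION expects above.
cuda_version_tag() {
  printf '%s\n' "$1" | tr -d '.'
}

# Sketch: pick a make target for a CUDA release.
# Assumption: 12.x -> cuda12x (as used above), 11.1-11.8 -> cuda11x.
bnb_make_target() {
  case "$1" in
    12.*)     echo cuda12x ;;
    11.[1-8]) echo cuda11x ;;
    *) echo "no known make target for CUDA $1" >&2; return 1 ;;
  esac
}

# Usage (inside the bitsandbytes checkout):
#   v=12.1
#   CUDA_VERSION=$(cuda_version_tag "$v") make "$(bnb_make_target "$v")"
```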

Install the NVIDIA GPU manager if you have multiple A100/H100s.

sudo apt-key del 7fa2af80
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get install -y datacenter-gpu-manager
sudo apt-get install -y libnvidia-nscq-530
sudo systemctl --now enable nvidia-dcgm
dcgmi discovery -l
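The libnvidia-nscq-530 package matches the 530 driver branch installed alongside CUDA 12.1 (driver 530.30.02 in the .deb name above). If the package name follows the driver's major branch number, as assumed here, you can derive it from the installed driver version. nscq_package is a hypothetical helper.

```shell
# Sketch: map a driver version to the matching NSCQ package name,
# e.g. 530.30.02 -> libnvidia-nscq-530 (assumes naming by driver branch).
nscq_package() {
  local driver="$1"
  echo "libnvidia-nscq-${driver%%.*}"   # keep the text before the first dot
}

# Usage:
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   sudo apt-get install -y "$(nscq_package "$drv")"
```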

See the NVIDIA GPU Manager documentation for details.

Install and run Fabric Manager if you have multiple A100/H100s.

sudo apt-get install cuda-drivers-fabricmanager
sudo systemctl start nvidia-fabricmanager
sudo systemctl status nvidia-fabricmanager

See the NVIDIA Fabric Manager documentation for details.
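Both the GPU manager and Fabric Manager steps target machines with several A100/H100s. Fabric Manager in particular is meant for NVSwitch-based systems such as HGX boards, so a GPU count is only a rough first check. The sketch below counts A100/H100 entries in nvidia-smi -L output, which prints one "GPU n: name (UUID: ...)" line per device. count_a100_h100 is a hypothetical helper.

```shell
# Sketch: count A100/H100 devices in `nvidia-smi -L` output read from stdin.
count_a100_h100() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the exit status 0.
  grep -cE '^GPU [0-9]+: .*(A100|H100)' || true
}

# Usage:
#   n=$(nvidia-smi -L | count_a100_h100)
#   [ "$n" -gt 1 ] && echo "$n A100/H100 GPUs: GPU manager / Fabric Manager steps apply"
```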

Once these are installed, after rebooting the system just run:

sudo systemctl --now enable nvidia-dcgm
dcgmi discovery -l
sudo systemctl start nvidia-fabricmanager
sudo systemctl status nvidia-fabricmanager

TensorBoard (optional) to inspect training

tensorboard --logdir=runs/
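TensorBoard has nothing to show until training writes event files, which are named events.out.tfevents.*. The sketch below counts those files under runs/ so you can check for logs before launching the viewer. count_event_files is a hypothetical helper.

```shell
# Sketch: count TensorBoard event files under a log directory (default runs).
count_event_files() {
  local logdir="${1:-runs}"
  # A missing directory counts as zero files.
  find "$logdir" -type f -name 'events.out.tfevents.*' 2>/dev/null | wc -l | tr -d ' '
}

# Usage:
#   [ "$(count_event_files runs)" -gt 0 ] && tensorboard --logdir=runs/
```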

Flash Attention

Update: this is no longer needed; see h2oai#128.

To use flash attention with LLaMA, you need CUDA 11.7 so that the flash attention module compiles against torch.

For example, on Ubuntu, go to the CUDA Toolkit 11.7 download page, then run:

wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo bash ./cuda_11.7.0_515.43.04_linux.run

In the installer, answer No to the symlink change, choose continue (not abort), accept the license, keep only the toolkit selected, and select Install.

If CUDA 11.7 is not your base installation, point CUDA_HOME at the 11.7 toolkit when running the pip install -r step for flash attention:

CUDA_HOME=/usr/local/cuda-11.7 pip install -r reqs_optional/requirements_optional_flashattention.txt
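By default, the runfile installs the toolkit under /usr/local/cuda-11.7, which is where the CUDA_HOME override above points. The sketch below resolves that directory only if it contains an executable nvcc. This way the pip step is skipped instead of silently building against another CUDA. cuda_home_for is a hypothetical helper. Its second argument, the install root, exists only for testing.

```shell
# Sketch: print the toolkit directory for a CUDA version if nvcc is present.
cuda_home_for() {
  local version="$1" root="${2:-/usr/local}"
  local dir="$root/cuda-$version"
  if [ -x "$dir/bin/nvcc" ]; then
    echo "$dir"
  else
    echo "no CUDA $version toolkit at $dir" >&2
    return 1
  fi
}

# Usage:
#   home=$(cuda_home_for 11.7) && \
#     CUDA_HOME="$home" pip install -r reqs_optional/requirements_optional_flashattention.txt
```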
