
Run GPT With Colossal-AI

How to Prepare Webtext Dataset

You can download the preprocessed sample dataset for this demo via our Google Drive sharing link.
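If you want to sanity-check the download before training, the snippet below is a rough sketch. It assumes the sample dataset is a JSON-lines file in which each line is an object carrying the raw text under a "text" field; adjust the path and field name if your copy is structured differently.

import json

# Hypothetical path; point this at the file downloaded from Google Drive.
DATA_PATH = "/path/to/small-gpt-dataset.json"

# Assumption: one JSON object per line, each with a "text" field.
with open(DATA_PATH) as f:
    samples = [json.loads(line) for line in f]

print(f"{len(samples)} samples loaded")
print(samples[0].get("text", "")[:200])  # preview the first document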

Run this Demo

Use the following commands to install prerequisites.

# assuming CUDA 11.3
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install colossalai==0.1.9+torch1.11cu11.3 -f https://release.colossalai.org
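To confirm the installation succeeded, a quick import check along these lines should print the pinned versions (a minimal sketch; the exact version strings depend on your environment).

# Minimal import check for the versions installed above.
import torch
import colossalai

print(torch.__version__)       # expected: 1.11.0
print(colossalai.__version__)  # expected: 0.1.9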

Use the following commands to execute training.

#!/usr/bin/env sh
export DATA=/path/to/small-gpt-dataset.json

# run on a single node
colossalai run --nproc_per_node=<num_gpus> train_gpt.py --config configs/<config_file> --from_torch

# run on multiple nodes without slurm
colossalai run --nproc_per_node=<num_gpus> \
   --master_addr <hostname> \
   --master_port <port-number> \
   --hosts <list-of-hostname-separated-by-comma> \
   train_gpt.py \
   --config configs/<config_file> \
   --from_torch

# run on multiple nodes with slurm
srun python \
   train_gpt.py \
   --config configs/<config_file> \
   --host <master_node>
   

You can set <config_file> to any file in the configs folder. To get it running quickly, start with gpt_small_zero3_pp1d.py on a single node first. Each config file contains comments explaining how to change the parallel settings.
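For reference, a Colossal-AI config file is a plain Python module of constants. The sketch below shows the general shape under the 0.1.x conventions; the names and values are illustrative assumptions, not the contents of gpt_small_zero3_pp1d.py, so consult the actual files in configs for the exact settings.

# Illustrative sketch of a Colossal-AI style config module
# (not the actual gpt_small_zero3_pp1d.py).

BATCH_SIZE = 8      # per data-parallel rank
NUM_EPOCHS = 60
SEQ_LEN = 1024

# parallel layout: pipeline stages x tensor-parallel groups;
# data parallelism fills the remaining GPUs automatically
parallel = dict(
    pipeline=1,
    tensor=dict(size=2, mode='1d'),
)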
