This is an unofficial implementation of Chain of Hindsight using PyTorch and the Hugging Face `Trainer`. The data loading script is taken directly from the original repo; only the training part is rewritten in PyTorch.
- For pip,

  ```
  pip install -r environment/requirements.txt
  ```

- For conda,

  ```
  conda env create -f environment/env.yml
  ```
A shell script for training can be found in `train.sh`. It takes GPU device ids as input and passes them to the `CUDA_VISIBLE_DEVICES` environment variable.

```
sh train.sh 0,1,2,3
```
To customize command line arguments, take a look at the argument dataclasses used in the following files:

- `coh.coh_train.ExperimentArgs`
- `coh.data.coh_data.CoHDataArgs`
- `coh.trainer.CoHTrainArgs` (this inherits from `transformers.TrainingArguments`)
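For reference, dataclasses like these can be parsed together with `transformers.HfArgumentParser`. A minimal sketch, assuming the import paths mirror the module names above (check those files for the actual fields):

```python
# Sketch only: parse the three argument dataclasses from the command line.
from transformers import HfArgumentParser

from coh.coh_train import ExperimentArgs
from coh.data.coh_data import CoHDataArgs
from coh.trainer import CoHTrainArgs

parser = HfArgumentParser((ExperimentArgs, CoHDataArgs, CoHTrainArgs))
exp_args, data_args, train_args = parser.parse_args_into_dataclasses()

# CoHTrainArgs inherits transformers.TrainingArguments, so standard flags
# such as --output_dir are available as well.
print(train_args.output_dir)
```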
A training script for LLaMA is also provided. The baseline invocation is:

```
sh llama_train.sh 0,1,2,3 ${LLAMA_PATH}
```

To use this script, you need to have already downloaded the LLaMA weights and converted them to PyTorch format using the conversion script from the Hugging Face Transformers repo.
- Relevant PR
```
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 7B \
    --output_dir /output/path
```
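After conversion, the checkpoint in `--output_dir` can be loaded with the standard Transformers LLaMA classes, which is a quick way to verify the conversion worked:

```python
# Quick check that the converted weights load (path is the --output_dir above).
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("/output/path")
model = LlamaForCausalLM.from_pretrained("/output/path")
```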
To use DeepSpeed, you need `nvcc` with the correct version installed. Conda provides a `cuda-nvcc` package, which is also included in `env.yml`. However, to use it, you need to set the `CUDA_HOME` environment variable to point to the conda environment (this is required so that DeepSpeed's JIT C++ compiler picks up the conda-installed `nvcc`, not the system-wide one). After creating and activating the environment, set

```
export CUDA_HOME=/path/to/conda/envs/coh
```
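As an optional sanity check, PyTorch's extension utilities (which DeepSpeed's JIT build relies on) expose the resolved `CUDA_HOME`, so you can confirm it points at the conda environment:

```python
# Should print the conda env path set above, not e.g. /usr/local/cuda.
from torch.utils.cpp_extension import CUDA_HOME

print(CUDA_HOME)
```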
Example DeepSpeed config files can be found in `ds_config`. They are taken directly from Hugging Face's DeepSpeed integration tutorial.

By default, `train.sh` uses DeepSpeed, while `llama_train.sh` uses FSDP instead.
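For reference, when going through the Hugging Face `Trainer`, a DeepSpeed config file is typically wired in via the `deepspeed` field of `TrainingArguments` (or the equivalent `--deepspeed` command line flag). The file name below is only illustrative; use one of the files shipped in `ds_config`:

```python
# Illustrative only: point the Trainer at a DeepSpeed config from ds_config/.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    deepspeed="ds_config/ds_config_zero3.json",  # hypothetical file name
)
```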
To further improve training efficiency, LoRA (via PEFT) can be applied by passing `--use_lora` in the training arguments.
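Under the hood, LoRA with PEFT wraps the base model with low-rank adapters, roughly along these lines (the base model and hyperparameters below are placeholders, not the repo's settings):

```python
# Rough sketch of LoRA via PEFT; values here are placeholders.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```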
You can also use 8-bit training!
- This is compatible with PEFT.
- This is NOT compatible with DeepSpeed.
- Need to use the `torchrun` launcher instead of the `deepspeed` launcher.
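A minimal sketch of 8-bit loading through Transformers and `bitsandbytes` (the model name is a placeholder; combine this with the LoRA setup above rather than with DeepSpeed):

```python
# Placeholder model; requires bitsandbytes (and accelerate) to be installed.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_8bit=True,
    device_map="auto",
)
```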
This repo diverges from the original repo's implementation in a few ways:

- The original repo does not have an evaluation step.
- No `bos_token` is prepended to the `input_ids`, because the batching logic is chunk-wise and each sequence in a batch is not really a standalone sentence (see the sketch below).
- No `weight_decay_mask` is used.
- Forgetful Causal Masking is not applied.
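To illustrate the `bos_token` point, a toy version of chunk-wise batching looks roughly like this (illustrative only, not the repo's actual data code):

```python
# Tokenized examples are concatenated and split into fixed-length chunks, so a
# chunk generally does not start at a sentence boundary; prepending a bos_token
# to every chunk would therefore not mark a real sentence start.
def chunk_token_ids(all_token_ids, seq_len):
    n_chunks = len(all_token_ids) // seq_len  # drop the incomplete tail
    return [all_token_ids[i * seq_len:(i + 1) * seq_len] for i in range(n_chunks)]

# Two concatenated "sentences" (2 = eos id here), chunked into length-4 blocks:
ids = [5, 6, 7, 2, 8, 9, 10, 11, 2]
print(chunk_token_ids(ids, 4))  # [[5, 6, 7, 2], [8, 9, 10, 11]]
```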