Deepspeed integration for 7B models #19
Conversation
```python
deepspeed_states = AcceleratorState().deepspeed_plugin
deepspeed_states.deepspeed_config['train_micro_batch_size_per_gpu'] = args.ppo.local_micro_batch_size
deepspeed_states.deepspeed_config['checkpoint'] = {'use_node_local_storage': True}
off_load_device = "cpu"
```
Note that this will slow down your code significantly. I would make offloading an option that's inferred from the accelerate config, as I did here: huggingface/trl#758
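For example, a minimal sketch of inferring it (assuming CPU offload, when requested, shows up under `zero_optimization.offload_param.device` in the plugin's DeepSpeed config; the helper name is hypothetical):

```python
from accelerate.state import AcceleratorState

def get_offload_device(default: str = "none") -> str:
    # Hypothetical helper: read the param-offload device from the accelerate
    # DeepSpeed plugin instead of hard-coding "cpu".
    ds_config = AcceleratorState().deepspeed_plugin.deepspeed_config
    zero_config = ds_config.get("zero_optimization") or {}
    return (zero_config.get("offload_param") or {}).get("device", default)

off_load_device = get_offload_device()  # "cpu" only if the accelerate config asks for it
```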
```python
deepspeed_states = AcceleratorState().deepspeed_plugin
deepspeed_states.deepspeed_config['train_micro_batch_size_per_gpu'] = args.ppo.local_micro_batch_size
deepspeed_states.deepspeed_config['checkpoint'] = {'use_node_local_storage': True}
```
I think this flag is only needed if each node has a separate local filesystem. For the HFC case, you probably don't need it
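One way to keep that configurable rather than hard-coded (the `args.node_local_storage` flag is hypothetical, not part of this PR):

```python
# Hypothetical flag: only enable node-local checkpoint storage when each node
# writes checkpoints to its own filesystem (e.g. local scratch disks).
if args.node_local_storage:
    deepspeed_states.deepspeed_config["checkpoint"] = {"use_node_local_storage": True}
```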
```python
import deepspeed

deepspeed_states = AcceleratorState().deepspeed_plugin
deepspeed_states.deepspeed_config['train_micro_batch_size_per_gpu'] = args.ppo.local_micro_batch_size
```
If I'm not mistaken, these config values are set automatically by the accelerator and don't need to be overridden
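A sketch of what that automatic behaviour looks like in practice (assuming an accelerate config whose DeepSpeed section leaves `train_micro_batch_size_per_gpu` as `"auto"`; `dataset`, `model`, and `optimizer` are placeholders from the surrounding script):

```python
from torch.utils.data import DataLoader

dataloader = DataLoader(dataset, batch_size=args.ppo.local_micro_batch_size)

# With the DeepSpeed plugin active, accelerate resolves the "auto" entries
# (micro batch size, gradient accumulation steps) from the objects passed to
# prepare(), so the manual deepspeed_config override should be redundant.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```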
"bf16": { | ||
"enabled": True | ||
}, | ||
"prescale_gradients": False, |
I think this flag and the one below are false by default, so probably don't need to be set either
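For illustration, a trimmed version relying on those defaults (a sketch, not the exact config dict from this PR):

```python
# prescale_gradients defaults to False in DeepSpeed, so only the settings that
# actually deviate from the defaults need to appear in the config.
ds_config = {
    "train_micro_batch_size_per_gpu": args.ppo.local_micro_batch_size,
    "bf16": {"enabled": True},
}
```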
```diff
@@ -755,7 +790,8 @@ def train(args: Args):
             )

             with torch.no_grad():
-                writer.add_histogram("ppo/val/ratio_hist", ratio, update)
+                if not args.deepspeed:  # for some reason there is a OOM with the `writer.add_histogram`
+                    writer.add_histogram("ppo/val/ratio_hist", ratio, update)
```
FYI I was able to train 7B models in TRL with ZeRO-2 and didn't need to remove the histogram. On the other hand that was for sentiment tuning, which is less memory intensive than your application here
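If the histogram is worth keeping, one alternative to gating it on `args.deepspeed` (untested here, just a suggestion) would be to log a detached CPU copy so the histogram computation doesn't hold extra GPU memory:

```python
import torch

with torch.no_grad():
    # Hand TensorBoard a CPU copy of the values so add_histogram does not keep
    # another tensor on the GPU while binning.
    writer.add_histogram("ppo/val/ratio_hist", ratio.detach().float().cpu(), update)
```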
Confirmed that it can reasonably run 7B models (no benchmark results yet):
https://wandb.ai/costa-huang/cleanRL/runs/hn9wtka9?workspace=user-costa-huang
This PR attempts to bring deepspeed integration to empower tuning with 7B models. In the summarize from human feedback paper, they experimented with 1.3B, 2.7B, and 6.7B models, so this PR would in principle allow us to replicate that work.
Some of the notable changes needed to make things work:

- `mixed_precision: 'bf16'` turns out to be important, otherwise OOM.
- `accelerator.prepare` and `deepspeed.initialize`, otherwise OOM (see the sketch below).
- `bf16` for `reward_model` and `ref_policy`, otherwise OOM.
- `critic_model`, for which they eventually offload `reward_model`, `critic_model`, and `ref_policy` to CPU, but it is not necessary in our case.

Here is a training run: https://wandb.ai/costa-huang/cleanRL/runs/kve7tu43/overview
Training results were pretty bad, but I think this is probably some issue related to model compatibility. To replicate the summarize from human feedback paper, we should probably use the OPT models, which come in 1.3B, 2.7B, and 6.7B sizes.
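To make the list of changes above concrete, here is a minimal sketch of the setup it describes (assumed shapes of the calls, not the exact code in this PR): the trained policy goes through `accelerator.prepare`, while `reward_model` and `ref_policy` are cast to bf16 and wrapped with `deepspeed.initialize` directly.

```python
import deepspeed
import torch

# Trained model: handled by accelerate (with mixed_precision: 'bf16' in the config).
policy, optimizer, dataloader = accelerator.prepare(policy, optimizer, dataloader)

# Frozen models: cast to bf16 and initialize with DeepSpeed directly, using a
# small config that only sets the micro batch size and bf16 (no optimizer).
eval_ds_config = {
    "train_micro_batch_size_per_gpu": args.ppo.local_micro_batch_size,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 0},
}
ref_policy = ref_policy.to(torch.bfloat16)
reward_model = reward_model.to(torch.bfloat16)
ref_policy, *_ = deepspeed.initialize(model=ref_policy, config=eval_ds_config)
reward_model, *_ = deepspeed.initialize(model=reward_model, config=eval_ds_config)
```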
CC @lewtun