Add support for gradient checkpointing for LLM fine-tuning #3613
Conversation
Nice description! Do you have a sense for which LLMs support gradient checkpointing and which ones don't?
I think almost all models coming from the
This PR adds support in the finetune trainer to optionally enable `gradient_checkpointing`.

What is gradient checkpointing?
Gradient checkpointing works by recomputing the activations of the model during the backward pass, rather than storing them in memory during the forward pass. This is a tradeoff between compute and memory, as the activations need to be recomputed during the backward pass, but the memory footprint is reduced. This is set to false by default because it is not always beneficial to use gradient checkpointing, and it can sometimes slow down training.
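The recompute-on-backward idea can be sketched with PyTorch's `torch.utils.checkpoint` utilities (a minimal illustration of the technique, not the code this PR adds; the toy model and shapes are made up):

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Hypothetical toy model: a deep stack of linear layers whose activations
# would normally all be kept in memory during the forward pass.
layers = torch.nn.Sequential(*[torch.nn.Linear(64, 64) for _ in range(8)])
x = torch.randn(4, 64, requires_grad=True)

# With checkpointing, only the activations at segment boundaries are stored;
# the intermediate activations are recomputed during the backward pass,
# trading extra compute for a smaller memory footprint.
out = checkpoint_sequential(layers, 2, x, use_reentrant=False)
out.sum().backward()  # backward triggers the recomputation
```

Note that `use_reentrant=False` is the recommended mode in recent PyTorch versions; older releases may not accept the keyword.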
How can you use gradient checkpointing in the config?
Gradient checkpointing is disabled by default. To enable it, simply set `enable_gradient_checkpointing` to `True`.

When should I enable gradient checkpointing?
This is useful when training very large models that run into out of memory errors very quickly during training. It is particularly helpful when doing non-quantization based training (adapter based or full fine-tuning).
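Assuming a YAML trainer config of the kind this project uses, enabling the option might look like the following sketch (the surrounding section names are illustrative; only `enable_gradient_checkpointing` comes from this PR):

```yaml
trainer:
  type: finetune
  # Trade extra backward-pass compute for lower activation memory.
  enable_gradient_checkpointing: true
```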