[`fix`] Fix gradient checkpointing to allow for much lower memory usage #2717

tomaarsen · 2024-06-05T10:38:15Z

Hello!

Pull Request overview

Propagate the gradient checkpointing enabling to the underlying Transformer model
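As a rough illustration of what "propagating" means here, the sketch below uses two hypothetical stand-in classes (`InnerTransformer` and `Wrapper`, neither of which is from this repository). The method name `gradient_checkpointing_enable` and its `gradient_checkpointing_kwargs` argument mirror the convention used by Hugging Face `transformers` models. The bodies are simplified assumptions, not the code from this PR.

```python
class InnerTransformer:
    """Hypothetical stand-in for the underlying Transformer model."""

    def __init__(self):
        self.is_gradient_checkpointing = False
        self.gradient_checkpointing_kwargs = {}

    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
        # A real model would switch its layers to recompute activations
        # during the backward pass instead of storing them. Here we only
        # record that the request arrived.
        self.is_gradient_checkpointing = True
        self.gradient_checkpointing_kwargs = gradient_checkpointing_kwargs or {}


class Wrapper:
    """Hypothetical stand-in for the outer model that the trainer talks to."""

    def __init__(self, inner):
        self.inner = inner

    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
        # Forward the request to the wrapped model rather than dropping it.
        self.inner.gradient_checkpointing_enable(gradient_checkpointing_kwargs)


wrapper = Wrapper(InnerTransformer())
wrapper.gradient_checkpointing_enable({"use_reentrant": False})
print(wrapper.inner.is_gradient_checkpointing)
```

Without the forwarding method on the outer class, a trainer that calls `gradient_checkpointing_enable` on the wrapper would have no effect on the inner model, which is the kind of gap this PR describes closing.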

Details

Setting `gradient_checkpointing=True` saves a lot of memory during training, at the cost of a ~10-20% speed decrease. The freed memory allows for bigger batch sizes, which is very beneficial for in-batch negatives training.
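To make the batch-size point concrete, here is a small arithmetic sketch. It assumes a loss in which every other pair's positive in the batch acts as a negative for a given anchor. The function names are hypothetical and for illustration only.

```python
def in_batch_negatives_per_anchor(batch_size: int) -> int:
    """Number of in-batch negatives each anchor sees, assuming every
    other example's positive in the batch is used as a negative."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return batch_size - 1


def total_negative_comparisons(batch_size: int) -> int:
    """Total anchor/negative comparisons in one batch under the same assumption."""
    return batch_size * in_batch_negatives_per_anchor(batch_size)


for size in (32, 128):
    print(size, in_batch_negatives_per_anchor(size), total_negative_comparisons(size))
```

Under that assumption, each anchor's negative count grows linearly with batch size, and the total number of comparisons per batch grows roughly quadratically, which is why memory freed by gradient checkpointing can be worth the slowdown.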

Thanks @philschmid for reporting this.

Tom Aarsen

Propagate the gradient checkpointing enabling to the underlying Transformer model

f1787ae

tomaarsen merged commit 73fe054 into UKPLab:master Jun 5, 2024
9 checks passed

tomaarsen deleted the feat/gradient_checkpointing branch June 5, 2024 11:10
