Take gradient accumulation into account when defining samplers #15095

sgugger · 2022-01-10T17:06:28Z

What does this PR do?

This PR takes the gradient accumulation steps into account when defining samplers that use the batch size (such as the LengthGroupedSampler), so that training with a large batch size and training with a smaller batch size plus gradient accumulation yield the same results (e.g. a batch size of 64 versus a batch size of 8 with 8 gradient accumulation steps).

Fixes #14638
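
To illustrate the idea (a minimal sketch, not the actual diff in this PR): a sampler that groups samples by length can be given the number of samples per optimizer update, i.e. the per-step batch size multiplied by the gradient accumulation steps, instead of the per-step batch size alone. The helpers `effective_batch_size` and `length_grouped_indices` below are hypothetical names written for this example, and the toy grouping logic is an assumption that only approximates what a real length-grouped sampler does.

```python
import random


def effective_batch_size(per_step_batch_size, gradient_accumulation_steps=1):
    """Number of samples consumed per optimizer update (hypothetical helper)."""
    return per_step_batch_size * gradient_accumulation_steps


def length_grouped_indices(lengths, group_size, seed=0, mega_batch_mult=4):
    """Toy length-grouped ordering (illustrative, not the library sampler).

    Shuffle all indices, cut them into chunks of group_size * mega_batch_mult,
    and sort each chunk by decreasing length.
    """
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)
    chunk_size = group_size * mega_batch_mult
    ordered = []
    for start in range(0, len(indices), chunk_size):
        chunk = indices[start:start + chunk_size]
        ordered.extend(sorted(chunk, key=lambda i: lengths[i], reverse=True))
    return ordered


# Synthetic sequence lengths, made up for the example.
lengths = [(i * 37) % 101 + 1 for i in range(1024)]

# Batch size 64 with no accumulation.
large = length_grouped_indices(lengths, effective_batch_size(64))
# Batch size 8 with 8 accumulation steps, accumulation taken into account.
accumulated = length_grouped_indices(lengths, effective_batch_size(8, 8))
# Batch size 8 with accumulation ignored: the grouping uses smaller chunks.
per_step_only = length_grouped_indices(lengths, effective_batch_size(8))

print(large == accumulated)  # True: both group over 64 samples per update
```

With the same seed, the two configurations produce the same sample order only when the sampler sees the product of the two settings; passing the per-step batch size alone groups over smaller chunks, so the order can differ.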

LysandreJik

Good for me!

sgugger requested a review from LysandreJik January 10, 2022 17:06

sgugger added 2 commits January 10, 2022 12:06

Take gradient accumulation into account when defining samplers

b4b543c

style

ed4b6d5

LysandreJik approved these changes Jan 11, 2022

LysandreJik merged commit ca76618 into master Jan 11, 2022

LysandreJik deleted the batch_size_grad_acc branch January 11, 2022 08:16
