
Clarify batch size displayed when using DataParallel #24430

Merged
merged 1 commit into main on Jun 22, 2023

Conversation

@sgugger (Collaborator) commented on Jun 22, 2023

What does this PR do?

As pointed out in #24345, the batch size displayed when using DataParallel is unclear; this PR fixes that.

Fixes #24345

@sgugger requested a review from muellerzr on Jun 22, 2023 at 17:30
@HuggingFaceDocBuilderDev commented on Jun 22, 2023

The documentation is not available anymore as the PR was closed or merged.

@muellerzr (Contributor) left a comment

Good call, much clearer now!

@sgugger merged commit 2834c17 into main on Jun 22, 2023
@sgugger deleted the trainer_dp_bs branch on Jun 22, 2023 at 18:46
logger.info(f" Instantaneous batch size per device = {self._train_batch_size:,}")
logger.info(f" Instantaneous batch size per device = {self.args.per_device_train_batch_size:,}")
if self.args.per_device_train_batch_size != self._train_batch_size:
logger.info(f" Training with DataParallel so batch size has been adjusted to: {self._train_batch_size:,}")
logger.info(f" Total train batch size (w. parallel, distributed & accumulation) = {total_train_batch_size:,}")
@cgbahk commented on Jun 23, 2023

Thanks for taking care of this! #24345 seems resolved.

Sorry, I wrote the comment below, but it seems that is not the case 😅 I didn't account for gradient accumulation.

So please ignore the text below.


As I said in #24345 (comment), I will not be using DP anymore, but there seems to be a bug around this logging part in the DP case.

e.g. total_train_batch_size seems to be larger than the expected real value?

Maybe use

        total_train_batch_size = args.per_device_train_batch_size * args.gradient_accumulation_steps * args.world_size

instead of

        total_train_batch_size = args.train_batch_size * args.gradient_accumulation_steps * args.world_size

at least in the DP case? (See the worked example below.)

I'm not sure about this issue or my suggestion (as I'm not yet familiar with the Trainer core/internals), so for now I will not report an issue or open a PR, but you may want to keep an eye on this.

I will try to open a new issue or PR if I find enough time to check 😄
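
For reference, a small worked example of the two formulas with assumed numbers (per-device batch size 8, 4 GPUs, gradient accumulation 2), under the assumption that a DataParallel run is a single process (so world_size stays 1) and that train_batch_size already includes the GPU factor; this is a sketch of the arithmetic, not a statement about the Trainer's actual internals:

    # Worked example with hypothetical numbers; the assumptions are stated above.
    per_device_train_batch_size = 8
    n_gpu = 4
    gradient_accumulation_steps = 2
    world_size = 1  # assumption: DataParallel runs in a single process

    # Assumption: under DP the DataLoader batch already covers all GPUs.
    train_batch_size = per_device_train_batch_size * n_gpu  # 32 samples per forward pass

    # Existing log formula vs. the alternative suggested in the comment above.
    total_existing = train_batch_size * gradient_accumulation_steps * world_size              # 64
    total_suggested = per_device_train_batch_size * gradient_accumulation_steps * world_size  # 16

    # Each optimizer step consumes 32 samples x 2 accumulation steps = 64 samples,
    # which matches the existing formula under these assumptions.
    print(total_existing, total_suggested)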

Successfully merging this pull request may close these issues:
Trainer reports batch size different from argument on multiple GPUs with DP (#24345)