Place metric functions for BLEU and Rogue on correct devices when using multiple GPUs #3671

arnavgarg1 · 2023-09-27T14:52:33Z

The issue was that the metric function wasn't being placed on the right device. This led to unexpected behavior for these metrics, which use the response prediction key, because the inputs passed in are lists of strings rather than tensors. It seems that these metric functions need to be moved to CUDA (instead of staying on the CPU) so that the metric_fn.compute() call, which gathers evaluation metric summaries, does not run into this error:

RuntimeError: Tensors must be CUDA and dense

Tested successfully with:

CPU-only machine
Multi-GPU Quantized training (4 GPUs)
Multi-GPU DeepSpeed Stage 3 training (4 GPUs)

github-actions · 2023-09-27T15:50:58Z

Unit Test Results

    6 files ±0      6 suites ±0    53m 16s ⏱️ +2m 5s
2 807 tests ±0  2 793 ✔️ ±0  12 💤 ±0  2 ❌ ±0
2 847 runs  ±0  2 824 ✔️ ±0  21 💤 ±0  2 ❌ ±0

For more details on these failures, see this check.

Results for commit e7d0f6f. ± Comparison against base commit 4af5331.

arnavgarg1 and others added 3 commits September 27, 2023 14:43

Place metric functions for BLEU and Rogue on correct devices for mult…

875f16e

…i-gpu training

[pre-commit.ci] auto fixes from pre-commit.com hooks

09b7e4c

for more information, see https://pre-commit.ci

Fix comment

e7d0f6f

arnavgarg1 requested a review from justinxzhao September 27, 2023 18:05

justinxzhao approved these changes Sep 27, 2023

arnavgarg1 merged commit 1286123 into master Sep 27, 2023

arnavgarg1 deleted the dist_text_metrics branch September 27, 2023 18:25
