Tasks
One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)
Reproduction
Load a T5 encoder-decoder model.
Use accelerator.prepare() on the model and the evaluation dataloader for parallel evaluation.
Run generation in the evaluation loop and gather the outputs:

for data in eval_dataloader:
    out = model.generate(data["input_ids"])
    accelerator.gather(out)
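For reference, a minimal self-contained sketch of the setup above. The checkpoint name and the toy dataloader are illustrative assumptions, not details from the original report:

```python
import torch
from torch.utils.data import DataLoader
from accelerate import Accelerator
from transformers import AutoTokenizer, T5ForConditionalGeneration

accelerator = Accelerator()

# "t5-small" is an assumed checkpoint; the report only says "T5 encoder-decoder model".
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Toy evaluation data, purely for illustration.
texts = ["translate English to German: Hello, world!"] * 8
enc = tokenizer(texts, return_tensors="pt", padding=True)
eval_dataloader = DataLoader(
    list(zip(enc["input_ids"], enc["attention_mask"])), batch_size=2
)

model, eval_dataloader = accelerator.prepare(model, eval_dataloader)

for input_ids, attention_mask in eval_dataloader:
    with torch.no_grad():
        # unwrap_model is used because prepare() may wrap the model in DDP,
        # which does not expose .generate() directly.
        out = accelerator.unwrap_model(model).generate(
            input_ids=input_ids, attention_mask=attention_mask
        )
    # The report says execution freezes here after a few steps.
    gathered = accelerator.gather(out)
```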
Expected behavior
The generated token ids should be gathered without the script freezing. I also tried synced_gpus=True in the generate function, but it doesn't help. I also tried inserting accelerator.wait_for_everyone() in several places, and that doesn't work either. Note that model.generate() works fine on its own; when it's combined with accelerator.gather(out), it freezes after a few steps.
It's very likely that each process is generating outputs of different lengths, so you are stuck because accelerator.gather requires tensors of the same shape on every process. You can use accelerator.pad_across_processes to pad your outputs to the same size prior to gathering.
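A sketch of that suggestion, assuming out is the tensor of generated ids from the loop above and tokenizer is in scope for the pad token id:

```python
# Pad the generated ids along the sequence dimension so every process
# holds a tensor of the same shape, then gather safely.
out = accelerator.pad_across_processes(out, dim=1, pad_index=tokenizer.pad_token_id)
gathered = accelerator.gather(out)
```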