
Temporary bug fix: chunk teacher x distributed training x dynamic batching #3382

Merged: 4 commits into master from chunkxdynbatchxdistributed_tempfix on Jan 15, 2021

Conversation

@emilydinan (Contributor) commented on Jan 14, 2021

Patch description
See #3367. This is a temporary fix to unblock folks using this setting. Since the bug currently cannot be reproduced with fewer than 16 GPUs in distributed training, it will take some time to debug properly.
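
For a sense of what such a stopgap can look like, here is a minimal, hypothetical sketch of a compatibility guard. It is not the actual patch (that lives in parlai/scripts/train_model.py); only the --eval-dynamic-batching option is taken from the testing command below, and the remaining option keys and the helper name are assumptions.

```python
# Hypothetical sketch of a compatibility guard; not the actual patch.
# Assumes a ParlAI-style opt dict. Only --eval-dynamic-batching comes from
# this PR's testing command; the other keys and this helper are guesses.
def check_eval_dynamic_batching(opt: dict, is_chunk_teacher: bool) -> None:
    """Refuse the known-broken combination until the root cause is found."""
    is_distributed = opt.get('distributed_world_size', 1) > 1
    # the eval-time setting falls back to the training-time one when unset
    eval_dynb = opt.get('eval_dynamic_batching') or opt.get('dynamic_batching')
    if is_chunk_teacher and is_distributed and eval_dynb not in (None, 'off'):
        raise RuntimeError(
            'Dynamic batching during evaluation is temporarily broken with '
            'chunk teachers + distributed training (see #3367). '
            'Set --eval-dynamic-batching off as a workaround.'
        )
```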

Testing steps

parlai tm -t babi -m transformer/generator -bs 20 --dynamic-batching full --eval-dynamic-batching off -mf /tmp/testdynbatch --truncate 30 -vtim 5 -vme 100
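
For reference, parlai tm is the train_model alias; the command trains a small generator on bAbI with dynamic batching enabled for training (--dynamic-batching full) but disabled for validation (--eval-dynamic-batching off), which is the workaround this patch enables. Assuming the standard ParlAI abbreviations (-vtim for --validation-every-n-secs, -vme for --validation-max-exs), validation runs every 5 seconds on at most 100 examples, so the previously failing eval path is exercised quickly.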

@stephenroller (Contributor) left a comment:

Help string suggestion

Review comment on parlai/scripts/train_model.py (outdated, resolved)
Emily Dinan and others added 2 commits on January 14, 2021 at 17:17
Co-authored-by: Stephen Roller <roller@fb.com>
@emilydinan merged commit 837741d into master on Jan 15, 2021
@emilydinan deleted the chunkxdynbatchxdistributed_tempfix branch on January 15, 2021 at 17:52