
Temporary bug fix: chunk teacher x distributed training x dynamic batching #3382

Merged: 4 commits into master from chunkxdynbatchxdistributed_tempfix on Jan 15, 2021

Conversation

@emilydinan (Contributor) commented on Jan 14, 2021

Patch description
See #3367. This is a temporary fix to unblock folks using this setting. Since the bug currently cannot be reproduced with fewer than 16 GPUs in distributed training, it will take some time to debug properly.
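
For a sense of what such a stopgap can look like, here is a minimal, hypothetical sketch of a compatibility guard. It is not the actual patch (that lives in parlai/scripts/train_model.py); only the --eval-dynamic-batching option is taken from the testing command below, and the remaining option keys and the helper name are assumptions.

```python
# Hypothetical sketch of a compatibility guard; not the actual patch.
# Assumes a ParlAI-style opt dict. Only --eval-dynamic-batching comes from
# this PR's testing command; the other keys and this helper are guesses.
def check_eval_dynamic_batching(opt: dict, is_chunk_teacher: bool) -> None:
    """Refuse the known-broken combination until the root cause is found."""
    is_distributed = opt.get('distributed_world_size', 1) > 1
    # the eval-time setting falls back to the training-time one when unset
    eval_dynb = opt.get('eval_dynamic_batching') or opt.get('dynamic_batching')
    if is_chunk_teacher and is_distributed and eval_dynb not in (None, 'off'):
        raise RuntimeError(
            'Dynamic batching during evaluation is temporarily broken with '
            'chunk teachers + distributed training (see #3367). '
            'Set --eval-dynamic-batching off as a workaround.'
        )
```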

Testing steps

parlai tm -t babi -m transformer/generator -bs 20 --dynamic-batching full --eval-dynamic-batching off -mf /tmp/testdynbatch --truncate 30 -vtim 5 -vme 100
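
For reference, parlai tm is the train_model alias; the command trains a small generator on bAbI with dynamic batching enabled for training (--dynamic-batching full) but disabled for validation (--eval-dynamic-batching off), which is the workaround this patch enables. Assuming the standard ParlAI abbreviations (-vtim for --validation-every-n-secs, -vme for --validation-max-exs), validation runs every 5 seconds on at most 100 examples, so the previously failing eval path is exercised quickly.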

@stephenroller (Contributor) left a comment:

Help string suggestion

Review comment on parlai/scripts/train_model.py (outdated, resolved)
Emily Dinan and others added 2 commits on January 14, 2021 at 17:17
Co-authored-by: Stephen Roller <roller@fb.com>
@emilydinan merged commit 837741d into master on Jan 15, 2021
@emilydinan deleted the chunkxdynbatchxdistributed_tempfix branch on January 15, 2021 at 17:52