Slower multi-GPU training with DynamicBucketingSampler vs BucketingSampler #857
Unanswered
david20181 asked this question in Q&A
-
Thank you David, very cool observation!! I did not realize that. I'll look into adapting the implementation to follow your suggestion.
-
@david20181 Have you solved this problem?
-
This issue has been addressed in #1341
-
When I switch from BucketingSampler to DynamicBucketingSampler, training time increases for multi-GPU training. It seems that the two samplers draw from the duration buckets differently. Specifically, I've observed that when I use BucketingSampler, the GPUs get batches from the same duration bucket at each step. I see different behavior for DynamicBucketingSampler: the GPUs seem to be getting batches from different duration buckets. As a result, some batches finish more quickly than others, and GPUs sit idle while the slower batches finish.

Would it be possible for DynamicBucketingSampler to have the same behavior as BucketingSampler, i.e. that for multi-GPU training we draw from the same duration bucket at each step?
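The suggestion in the question can be made concrete with a toy sketch. Assuming every rank builds an identical bucket layout and shares the same seed, a rank-independent RNG is enough to make all GPUs pick the same duration bucket at each step, so per-step batch durations stay comparable across ranks. The `SyncedBucketSampler` class and its interface below are hypothetical, made up for illustration; they are not Lhotse's API.

```python
# Hypothetical sketch: all ranks choose the same duration bucket per step,
# and only differ in which items they take from that bucket.
import random
from typing import Iterator, List, Sequence, Tuple

Cut = Tuple[str, float]  # (utterance_id, duration_in_seconds)


class SyncedBucketSampler:
    def __init__(
        self,
        buckets: Sequence[List[Cut]],  # cuts pre-sorted into duration buckets
        batch_size: int,
        rank: int,
        world_size: int,
        seed: int = 0,
    ):
        self.buckets = [list(b) for b in buckets]
        self.batch_size = batch_size
        self.rank = rank
        self.world_size = world_size
        self.seed = seed

    def __iter__(self) -> Iterator[List[Cut]]:
        # The RNG depends only on the seed, never on the rank, so every rank
        # draws the identical sequence of bucket indices.
        rng = random.Random(self.seed)
        cursors = [0] * len(self.buckets)
        while True:
            # Buckets that still hold a full step's worth of data for all ranks.
            step = self.batch_size * self.world_size
            active = [i for i, b in enumerate(self.buckets)
                      if cursors[i] + step <= len(b)]
            if not active:
                return
            bucket_idx = rng.choice(active)  # identical choice on every rank
            start = cursors[bucket_idx] + self.rank * self.batch_size
            yield self.buckets[bucket_idx][start:start + self.batch_size]
            cursors[bucket_idx] += step


# Example: two toy buckets, two "GPUs" stepping in lockstep over the same buckets.
short = [(f"short-{i}", 3.0) for i in range(8)]
long = [(f"long-{i}", 12.0) for i in range(8)]
for rank in range(2):
    sampler = SyncedBucketSampler([short, long], batch_size=2, rank=rank, world_size=2)
    print(rank, [[c[0] for c in batch] for batch in sampler])
```

Lhotse's real samplers have different internals and constructor arguments, so this is only meant to make the "same bucket per step" idea concrete; per the reply above, the actual change landed in #1341.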