
Need help on Micro Batch Size, Global Batch Size, Pipeline Parallel size calculation #6795

Answered by blahBlahhhJ
ngbala6 asked this question in Q&A
In short, `TP_size * PP_size * DP_size == num_GPUs`, so if you only have 4 GPUs and want PP = 4, you cannot also do DP.
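A minimal sketch of that identity, using hypothetical sizes (the variable names here are illustrative, not NeMo/Megatron API):

```python
# DP_size is not set directly; it is whatever remains after tensor
# and pipeline parallelism have claimed their share of the GPUs.
num_gpus = 8
tp_size = 2   # tensor parallel size
pp_size = 2   # pipeline parallel size

# The product of all three parallel sizes must equal the GPU count,
# so the model-parallel sizes must divide num_gpus evenly.
assert num_gpus % (tp_size * pp_size) == 0, "TP * PP must divide num_GPUs"
dp_size = num_gpus // (tp_size * pp_size)
print(dp_size)  # 2
```

With 4 GPUs and `pp_size = 4`, the same arithmetic forces `dp_size = 1`, i.e. no data parallelism.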

For the equation there:

  • The left-hand side is the number of micro batches in a global batch for one DP group, i.e. the number of gradient accumulation steps (the number of forward+backward passes before you step the optimizer).
  • The right-hand side is global_batch_size // micro_batch_size (the number of micro batches across all DP groups), divided by DP_size, which gives the number of micro batches for one DP group.
    Therefore the two sides should be equal.
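The two sides of the equation can be checked with a quick sketch (the batch sizes below are hypothetical example values):

```python
# Hypothetical configuration values for illustration.
global_batch_size = 256
micro_batch_size = 4
dp_size = 8

# Right-hand side: total micro batches in a global batch, split across DP groups.
micro_batches_all_dp = global_batch_size // micro_batch_size   # across all DP groups
micro_batches_per_dp = micro_batches_all_dp // dp_size         # for one DP group

# Left-hand side: gradient accumulation steps, i.e. the number of
# forward+backward passes one DP group runs before the optimizer steps.
grad_accum_steps = global_batch_size // (micro_batch_size * dp_size)

assert grad_accum_steps == micro_batches_per_dp
print(grad_accum_steps)  # 8
```

If the two sides disagree, the configured global batch size is not evenly divisible by `micro_batch_size * DP_size`.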

Now that you understand the equation, the issue is probably that the DP_size is wrong.

A possible reason is that you didn't partition the m…

Answer selected by ericharper