Memory, tokens per sec, and MFU behavior in train_gpt2cu #503
chinthysl started this conversation in Show and tell
Replies: 1 comment
-
Very cool, thank you for posting! I am really eager to get a multi-node setup of my own sometime soon to run similar things.
-
The following results were gathered to get a better understanding of training on a single node before expanding to multiple nodes.
I generated them on a single DGX H100 node using all 8 GPUs.
The total batch size is set to 524288 tokens. I varied the model from D12 to D48 and the micro-batch size from 2 upward until GPU memory maxed out.
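One way to read that setup (my framing, not stated in the thread): since llm.c counts the total batch size in tokens, the micro-batch size determines how many gradient-accumulation steps are needed to reach 524288 tokens per optimizer step. A quick sketch, assuming a sequence length of 1024 across 8 GPUs:

```shell
# Gradient-accumulation steps needed to reach the total token batch.
# Assumptions: total batch counted in tokens, sequence length 1024, 8 GPUs.
grad_accum_steps() {
  local total=$1 micro_batch=$2 seq_len=$3 num_gpus=$4
  echo $(( total / (micro_batch * seq_len * num_gpus) ))
}

grad_accum_steps 524288 64 1024 8   # prints 1  (no accumulation needed)
grad_accum_steps 524288 2  1024 8   # prints 32
```

So at the low end of the sweep (micro-batch 2), each step accumulates gradients over 32 micro-steps, which is part of why tokens/sec varies so much across the sweep.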
Major insights:
Sample script I used to sweep the results:
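The original script is not preserved in this extract. A minimal sketch of such a sweep, using the train_gpt2cu flags documented in the llm.c README (-e model, -b micro-batch size, -t sequence length, -d total batch size in tokens, -x number of steps); the specific flag values and the OOM-detection logic here are my assumptions, not the author's exact script:

```shell
#!/usr/bin/env bash
# Hypothetical sweep sketch, not the author's exact script.
# Flags follow the llm.c README: -e model depth, -b micro-batch size,
# -t sequence length, -d total batch size in tokens, -x number of steps.
set -u

DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to actually launch training runs

for model in d12 d24 d36 d48; do
  for b in 2 4 8 16 32 64 128; do
    cmd="mpirun -np 8 ./train_gpt2cu -e $model -b $b -t 1024 -d 524288 -x 20"
    echo "$cmd"
    if [ "$DRY_RUN" -eq 0 ]; then
      # a failed run (assumed to be out-of-memory) ends the sweep for this model
      $cmd || break
    fi
  done
done
```

With DRY_RUN=1 the script only prints the commands it would run, which is handy for checking the grid before committing a node to it.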