Benchmark results for DeepSeek-v3 in 2x8xH200 Cluster #2738

Open
wants to merge 12 commits into base: main

Conversation

roG0d
Contributor

@roG0d commented Jan 5, 2025

Motivation

  • Establish baseline metrics on a 2x8xH200 GPU cluster for comparison with future work.
  • Explore the trade-offs between using chips with more GPU memory (H200) and increasing the parallel inference world size with H100s.
  • Measure the overhead of multi-node inference compared to single-node inference.
  • Explore the benefits of using FP8 quantization.

For output files and logs, please refer to: https://github.com/datacrunch-research/h200-benchmarks

Modifications

  • Added a new folder, benchmark_dsv3, following the structure of the other benchmark folders.
  • Added a deepseek_v3.sh script containing each benchmark performed (a sketch of the kind of commands it contains appears after this list).
  • Added a README.md with the metrics obtained from these benchmarks.
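
For reference, here is a minimal sketch of the kind of commands such a benchmark script might contain, using SGLang's multi-node server launcher and serving benchmark. The addresses, ports, and prompt count are illustrative placeholders, not the exact settings used for the results in this PR.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a 2-node (2x8 GPU) DeepSeek-V3 run with SGLang.
# Addresses, ports, and prompt counts are placeholders.

# Node 0 (run the same command on node 1 with --node-rank 1):
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 16 \
  --nnodes 2 \
  --node-rank 0 \
  --dist-init-addr 10.0.0.1:5000 \
  --trust-remote-code \
  --port 30000

# Serving benchmark against the launched server (uses the default ShareGPT dataset):
python3 -m sglang.bench_serving \
  --backend sglang \
  --host 127.0.0.1 \
  --port 30000 \
  --num-prompts 512
```

An FP8 comparison would follow the same pattern, with the quantization selected at server launch (e.g. via `--quantization fp8`).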

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs
Member

zhyncs commented Jan 5, 2025

Hi @roG0d, sorry for the late response. Could you try the latest version, v0.4.1.post4?

@roG0d
Contributor Author

roG0d commented Jan 7, 2025

Sure, @zhyncs! We were also thinking it might be useful to track the progress made in #2591 and benchmark it for single-node FP8 and BF16.

We could create another folder called benchmark_v0_4_1_post to store these benchmark results, similar to what was done for this PR.

@roG0d
Contributor Author

roG0d commented Jan 7, 2025

We already have the results for v0.4.1.post4 up to FusedMoE tuning for H200 here. Would you prefer us to create a new PR for v0.4.1.post4 with these results, or should we include them in the current one against main so we can keep updating it with future optimizations?
