
Different TFLOPS from NVIDIA's document #486

KK666-AI asked this question in Q&A · Answered by rasbt

I did some more research on this, and I believe these are the correct numbers:

| Precision | H100 SXM (80 GB) | H100 PCIe (80 GB) | H100 NVL (188 GB) |
|---|---|---|---|
| FP64 (Double-Precision) | 34 TFLOPS | 26 TFLOPS | 68 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS | 51 TFLOPS | 134 TFLOPS |
| FP32 (Single-Precision) | 67 TFLOPS | 51 TFLOPS | 134 TFLOPS |
| TF32 Tensor Core | 989 TFLOPS* | 756 TFLOPS* | 1,979 TFLOPS* |
| BFLOAT16 Tensor Core | 1,979 TFLOPS* | 1,513 TFLOPS* | 3,958 TFLOPS* |
| FP16 Tensor Core | 1,979 TFLOPS* | 1,513 TFLOPS* | 3,958 TFLOPS* |
| FP8 Tensor Core | 3,958 TFLOPS* | 3,026 TFLOPS* | 7,916 TFLOPS* |
| INT8 Tensor Core | 3,958 TOPS* | 3,026 TOPS* | 7,916 TOPS* |

*Values marked with an asterisk indicate performance achieved with sparsity. So to get…
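Since the asterisked numbers assume 2:4 structured sparsity, the dense peak for those rows is half the listed value (NVIDIA's datasheets publish the sparse figures with this footnote). Here is a minimal Python sketch of that conversion for the SXM column; the dict name `H100_SXM_PEAK` and the helper `dense_peak` are just illustrative, not part of any library:

```python
# Peak H100 SXM (80 GB) throughput from the table above.
# Values are TFLOPS, except INT8 which is TOPS; sparsity=True marks
# the rows that assume 2:4 structured sparsity.
H100_SXM_PEAK = {
    # precision: (peak, uses_sparsity)
    "fp64":        (34,   False),
    "fp64_tensor": (67,   False),
    "fp32":        (67,   False),
    "tf32_tensor": (989,  True),
    "bf16_tensor": (1979, True),
    "fp16_tensor": (1979, True),
    "fp8_tensor":  (3958, True),
    "int8_tensor": (3958, True),
}

def dense_peak(precision: str) -> float:
    """Dense peak throughput: halve the sparsity-marked values."""
    value, with_sparsity = H100_SXM_PEAK[precision]
    return value / 2 if with_sparsity else float(value)

print(dense_peak("bf16_tensor"))  # -> 989.5 (dense BF16 TFLOPS)
```

For MFU-style utilization estimates you would typically compare measured throughput against these dense figures, since ordinary dense training workloads do not benefit from the structured-sparsity speedup.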

Answer selected by rasbt
This discussion was converted from issue #482 on January 17, 2025 14:56.