
Different TFLOPS from NVIDIA's document #486

KK666-AI asked this question in Q&A · Answered by rasbt

I did some more research on this, and I believe these are the correct numbers:

| Precision | H100 SXM (80 GB) | H100 PCIe (80 GB) | H100 NVL (188 GB) |
|---|---|---|---|
| FP64 (Double-Precision) | 34 TFLOPS | 26 TFLOPS | 68 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS | 51 TFLOPS | 134 TFLOPS |
| FP32 (Single-Precision) | 67 TFLOPS | 51 TFLOPS | 134 TFLOPS |
| TF32 Tensor Core | 989 TFLOPS* | 756 TFLOPS* | 1,979 TFLOPS* |
| BFLOAT16 Tensor Core | 1,979 TFLOPS* | 1,513 TFLOPS* | 3,958 TFLOPS* |
| FP16 Tensor Core | 1,979 TFLOPS* | 1,513 TFLOPS* | 3,958 TFLOPS* |
| FP8 Tensor Core | 3,958 TFLOPS* | 3,026 TFLOPS* | 7,916 TFLOPS* |
| INT8 Tensor Core | 3,958 TOPS* | 3,026 TOPS* | 7,916 TOPS* |

*Values marked with an asterisk indicate performance achieved with sparsity. So to get…
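Since the asterisked numbers assume 2:4 structured sparsity, the dense peak for those rows is half the listed value (NVIDIA's datasheets publish the sparse figures with this footnote). Here is a minimal Python sketch of that conversion for the SXM column; the dict name `H100_SXM_PEAK` and the helper `dense_peak` are just illustrative, not part of any library:

```python
# Peak H100 SXM (80 GB) throughput from the table above.
# Values are TFLOPS, except INT8 which is TOPS; sparsity=True marks
# the rows that assume 2:4 structured sparsity.
H100_SXM_PEAK = {
    # precision: (peak, uses_sparsity)
    "fp64":        (34,   False),
    "fp64_tensor": (67,   False),
    "fp32":        (67,   False),
    "tf32_tensor": (989,  True),
    "bf16_tensor": (1979, True),
    "fp16_tensor": (1979, True),
    "fp8_tensor":  (3958, True),
    "int8_tensor": (3958, True),
}

def dense_peak(precision: str) -> float:
    """Dense peak throughput: halve the sparsity-marked values."""
    value, with_sparsity = H100_SXM_PEAK[precision]
    return value / 2 if with_sparsity else float(value)

print(dense_peak("bf16_tensor"))  # -> 989.5 (dense BF16 TFLOPS)
```

For MFU-style utilization estimates you would typically compare measured throughput against these dense figures, since ordinary dense training workloads do not benefit from the structured-sparsity speedup.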

Answer selected by rasbt
This discussion was converted from issue #482 on January 17, 2025 14:56.