H100 & TGI getting 25 tokens per second? #1952
daz-williams started this conversation in General
According to this HF blog post, you could get up to 1,200 tokens per second:
https://huggingface.co/blog/optimum-nvidia
However, in my custom app with TGI I'm seeing 25 tokens per second on an H100 rented from vast.ai, while the same app on a 4090 gets 135 tokens per second.
That doesn't seem right. If you are getting more than that on an H100, please share your TGI config.
Thanks!

Replies: 1 comment

@OlivierDehaene any ideas, please?
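Since the thread asks for working TGI configs, here is a minimal launch sketch adapted from the text-generation-inference README, not the poster's actual setup: the model id, cache path, and image tag are assumptions to keep the example self-contained.

```python
import subprocess

# Minimal single-GPU TGI launch via the official Docker image (runs in the foreground).
# model_id, volume, and the image tag are assumptions -- swap in your own values.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
volume = "/home/user/tgi-data"  # model weights are cached here between runs

subprocess.run(
    [
        "docker", "run", "--gpus", "all",
        "--shm-size", "1g",        # TGI uses shared memory for NCCL
        "-p", "8080:80",
        "-v", f"{volume}:/data",
        "ghcr.io/huggingface/text-generation-inference:1.4",
        "--model-id", model_id,
        "--num-shard", "1",        # single H100 or 4090
    ],
    check=True,
)
```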
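And a rough way to put a number on tokens per second against a running TGI server; the endpoint, prompt, and token budget below are assumptions. Keep in mind that headline figures like the 1,200 tokens/s in the optimum-nvidia post are usually aggregate throughput across a large batch, while a single request measures per-stream decode speed, so the two are not directly comparable.

```python
import time
import requests

# Single-request decode throughput against a running TGI server.
# URL and prompt are placeholders; point them at your own deployment.
TGI_URL = "http://localhost:8080/generate"
prompt = "Explain the difference between an H100 and a 4090 in one paragraph."

start = time.perf_counter()
resp = requests.post(
    TGI_URL,
    json={
        "inputs": prompt,
        # "details": True makes TGI report how many tokens it actually generated.
        "parameters": {"max_new_tokens": 200, "details": True},
    },
    timeout=300,
)
resp.raise_for_status()
elapsed = time.perf_counter() - start

generated = resp.json()["details"]["generated_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tokens/s")
```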