H100 & TGI getting 25 tokens per second? #1952
daz-williams started this conversation in General
According to this HF blog post, you could get up to 1,200 tokens per second:
https://huggingface.co/blog/optimum-nvidia
However, in my custom app with TGI I'm seeing 25 tokens per second on an H100 rented from vast.ai, while the same app on a 4090 gets 135 tokens per second.
That doesn't seem right. If you are getting more than that on an H100, please share your TGI config.
Thanks!

Replies: 1 comment

@OlivierDehaene any ideas, please?
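Since the thread asks for working TGI configs, here is a minimal launch sketch adapted from the text-generation-inference README, not the poster's actual setup: the model id, cache path, and image tag are assumptions to keep the example self-contained.

```python
import subprocess

# Minimal single-GPU TGI launch via the official Docker image (runs in the foreground).
# model_id, volume, and the image tag are assumptions -- swap in your own values.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
volume = "/home/user/tgi-data"  # model weights are cached here between runs

subprocess.run(
    [
        "docker", "run", "--gpus", "all",
        "--shm-size", "1g",        # TGI uses shared memory for NCCL
        "-p", "8080:80",
        "-v", f"{volume}:/data",
        "ghcr.io/huggingface/text-generation-inference:1.4",
        "--model-id", model_id,
        "--num-shard", "1",        # single H100 or 4090
    ],
    check=True,
)
```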
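And a rough way to put a number on tokens per second against a running TGI server; the endpoint, prompt, and token budget below are assumptions. Keep in mind that headline figures like the 1,200 tokens/s in the optimum-nvidia post are usually aggregate throughput across a large batch, while a single request measures per-stream decode speed, so the two are not directly comparable.

```python
import time
import requests

# Single-request decode throughput against a running TGI server.
# URL and prompt are placeholders; point them at your own deployment.
TGI_URL = "http://localhost:8080/generate"
prompt = "Explain the difference between an H100 and a 4090 in one paragraph."

start = time.perf_counter()
resp = requests.post(
    TGI_URL,
    json={
        "inputs": prompt,
        # "details": True makes TGI report how many tokens it actually generated.
        "parameters": {"max_new_tokens": 200, "details": True},
    },
    timeout=300,
)
resp.raise_for_status()
elapsed = time.perf_counter() - start

generated = resp.json()["details"]["generated_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tokens/s")
```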