Inference Performance of LLAMA-2 posted by Nvidia: https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/llama.html
According to the link above, the reported inference latency of LLAMA-2-13B on an A100 80GB SXM4, at batch size 1 and TP=1, is lower than that of LLAMA-2-7B under the same conditions.
How was this performance data obtained? It seems implausible: a 13B model performs roughly twice the computation per token of a 7B model, so under identical settings it should not be faster.
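Since the linked page does not publish the benchmark script, one way to sanity-check the claim is to time both models yourself. Below is a minimal batch-size-1 latency sketch using Hugging Face Transformers; the model names, prompt, and generation lengths are assumptions for illustration, not NVIDIA's actual methodology (their numbers likely come from the NeMo/TensorRT-LLM stack, so absolute latencies will differ, but the 7B-vs-13B ordering should still hold).

```python
# Minimal latency sketch, NOT NVIDIA's benchmark harness.
# Assumptions: HF checkpoints meta-llama/Llama-2-7b-hf / -13b-hf,
# fp16 weights on a single GPU, greedy decoding.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # swap in Llama-2-13b-hf to compare

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="cuda"
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")

# Warm-up runs so kernel compilation/caching does not skew the measurement.
for _ in range(3):
    model.generate(**inputs, max_new_tokens=32)

torch.cuda.synchronize()
start = time.perf_counter()
model.generate(**inputs, max_new_tokens=128)  # batch size 1, single request
torch.cuda.synchronize()
print(f"end-to-end latency: {time.perf_counter() - start:.3f} s")
```

Running this for both checkpoints on the same GPU gives a direct apples-to-apples comparison; if 13B comes out slower (as the parameter counts predict), that would suggest the published table has an error or the two rows were measured under different configurations.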