Inference Performance of LLAMA-2 posted by Nvidia: https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/llama.html
According to the link above, the reported inference latency of LLAMA-2-13B on an A100 80GB SXM4, at batch size 1 and TP=1, is lower than that of LLAMA-2-7B under the same conditions.
How was this performance data obtained? It seems implausible: a 13B model performs roughly twice the computation per token of a 7B model, so under identical settings it should not be faster.
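Since the linked page does not publish the benchmark script, one way to sanity-check the claim is to time both models yourself. Below is a minimal batch-size-1 latency sketch using Hugging Face Transformers; the model names, prompt, and generation lengths are assumptions for illustration, not NVIDIA's actual methodology (their numbers likely come from the NeMo/TensorRT-LLM stack, so absolute latencies will differ, but the 7B-vs-13B ordering should still hold).

```python
# Minimal latency sketch, NOT NVIDIA's benchmark harness.
# Assumptions: HF checkpoints meta-llama/Llama-2-7b-hf / -13b-hf,
# fp16 weights on a single GPU, greedy decoding.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # swap in Llama-2-13b-hf to compare

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="cuda"
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")

# Warm-up runs so kernel compilation/caching does not skew the measurement.
for _ in range(3):
    model.generate(**inputs, max_new_tokens=32)

torch.cuda.synchronize()
start = time.perf_counter()
model.generate(**inputs, max_new_tokens=128)  # batch size 1, single request
torch.cuda.synchronize()
print(f"end-to-end latency: {time.perf_counter() - start:.3f} s")
```

Running this for both checkpoints on the same GPU gives a direct apples-to-apples comparison; if 13B comes out slower (as the parameter counts predict), that would suggest the published table has an error or the two rows were measured under different configurations.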