Does anyone have rough numbers, on any hardware setup, for the inference runtime of standard models across tasks and various context lengths?
Using vLLM, for just 5 requests at the 131072 context length for NIAH_single_1, I'm currently seeing ~15 minutes on a single A100 for Llama 3.1 8B.
While I continue to experiment with parallelism configs, I'm wondering whether other folks have seen rough order-of-magnitude runtime differences across tasks, context lengths, and hardware setups. A minimal timing sketch of what I'm measuring is below.
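For reference, something along these lines is roughly how I'm timing it. This is only a sketch: the model id, filler prompt, token headroom, and sampling settings are placeholders standing in for the actual NIAH_single_1 inputs, not the benchmark's exact configuration.

```python
import time

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumed checkpoint
CTX_LEN = 131072
NUM_REQUESTS = 5

# Single A100; max_model_len sized to the target context window.
llm = LLM(
    model=MODEL,
    max_model_len=CTX_LEN,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.90,
)

# Build a synthetic prompt that roughly fills the context window,
# leaving headroom for generated tokens. Real NIAH_single_1 prompts
# would come from the benchmark's generated data files instead.
tok = AutoTokenizer.from_pretrained(MODEL)
filler_ids = tok.encode("needle in a haystack " * (CTX_LEN // 4),
                        add_special_tokens=False)[: CTX_LEN - 256]
prompt = tok.decode(filler_ids)
prompts = [prompt] * NUM_REQUESTS

params = SamplingParams(temperature=0.0, max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

print(f"{NUM_REQUESTS} requests at ~{CTX_LEN} ctx tokens: {elapsed:.1f}s total, "
      f"{elapsed / NUM_REQUESTS:.1f}s per request")
```

With this kind of script it would be easy to compare numbers across context lengths (e.g. 4k, 32k, 131k) and hardware, which is the sort of table I was hoping someone already has.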