Closed
Labels
bug: Something isn't working
Description
🐛 Describe the bug
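After a load test in which all requests are aborted, the metrics/stats do not return to zero. The running and waiting request counts and the KV cache usage stay at their last values even though throughput is 0: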
DEBUG 08-07 15:35:23 [loggers.py:122] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 106 reqs, Waiting: 15 reqs, GPU KV cache usage: 90.4%, Prefix cache hit rate: 30.5%
DEBUG 08-07 15:35:33 [loggers.py:122] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 106 reqs, Waiting: 15 reqs, GPU KV cache usage: 90.4%, Prefix cache hit rate: 30.5%
DEBUG 08-07 15:35:43 [loggers.py:122] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 106 reqs, Waiting: 15 reqs, GPU KV cache usage: 90.4%, Prefix cache hit rate: 30.5%
Still need to reproduce this and confirm the affected vLLM version.
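To help with reproduction, here is a minimal sketch of a checker that scans captured log output for the symptom above: several consecutive stats lines with zero throughput but the same non-zero Running/Waiting counts. The regex is based only on the log format shown in this report, and `parse_stats`, `looks_stuck`, and `min_samples` are hypothetical names for illustration, not part of vLLM.

```python
import re

# Matches the stats line format copied from the logs in this report.
# Assumption: other vLLM versions may format this line differently.
STATS_RE = re.compile(
    r"Avg prompt throughput: (?P<prompt>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, "
    r"Waiting: (?P<waiting>\d+) reqs"
)


def parse_stats(line):
    """Return the parsed stats from one log line, or None if it is not a stats line."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    return {
        "prompt": float(m["prompt"]),
        "gen": float(m["gen"]),
        "running": int(m["running"]),
        "waiting": int(m["waiting"]),
    }


def looks_stuck(lines, min_samples=3):
    """True if the last `min_samples` stats lines all show zero throughput
    with identical, non-zero running/waiting counts."""
    samples = [s for s in map(parse_stats, lines) if s is not None]
    if len(samples) < min_samples:
        return False
    recent = samples[-min_samples:]
    idle = all(s["prompt"] == 0.0 and s["gen"] == 0.0 for s in recent)
    frozen = len({(s["running"], s["waiting"]) for s in recent}) == 1
    nonzero = recent[-1]["running"] > 0 or recent[-1]["waiting"] > 0
    return idle and frozen and nonzero
```

Feeding it the server log after the load test finishes (for example `looks_stuck(open("server.log"))`, where `server.log` is a placeholder path) should return `True` if the counters are stuck as described.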
Reported by @dagrayvid.