okuvshynov
Contributor

Add tracking for the high watermark of cache usage and expose it in the /metrics endpoint.

Use case: tracking the largest cache usage reached under a realistic workload makes it easier to understand memory requirements and to adjust the cache size or the model/cache quantization accordingly.
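
For readers unfamiliar with the idea, here is a minimal sketch of high-watermark tracking in Python. It is not the code from this PR: the class name `CacheUsageTracker`, the metric names `example_cache_tokens` and `example_cache_tokens_max`, and the sample usage values are all hypothetical and chosen only for illustration. It assumes cache usage is measured as a token count and that metrics are rendered in the Prometheus text format.

```python
class CacheUsageTracker:
    """Tracks current and peak (high watermark) cache usage, in tokens."""

    def __init__(self) -> None:
        self.current = 0  # most recently observed usage
        self.peak = 0     # largest usage observed so far

    def update(self, tokens_in_cache: int) -> None:
        # The high watermark only ever moves up; current usage may drop.
        self.current = tokens_in_cache
        self.peak = max(self.peak, tokens_in_cache)

    def render_metrics(self) -> str:
        # Prometheus text exposition format. Metric names are hypothetical,
        # not the names used by the actual server.
        return (
            "# TYPE example_cache_tokens gauge\n"
            f"example_cache_tokens {self.current}\n"
            "# TYPE example_cache_tokens_max gauge\n"
            f"example_cache_tokens_max {self.peak}\n"
        )


tracker = CacheUsageTracker()
for usage in (128, 512, 256):  # made-up usage samples
    tracker.update(usage)
print(tracker.render_metrics())
```

The peak stays at the largest sample even after usage falls, which is the value one would compare against the configured cache size when deciding whether it can be reduced.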

@okuvshynov okuvshynov marked this pull request as ready for review August 17, 2025 12:44
@okuvshynov okuvshynov requested a review from ngxson as a code owner August 17, 2025 12:44
@ngxson ngxson merged commit e5155e6 into ggml-org:master Aug 17, 2025
47 checks passed
@okuvshynov okuvshynov deleted the n_past_max branch August 17, 2025 23:18
okuvshynov added a commit to okuvshynov/llama.cpp that referenced this pull request Oct 5, 2025
ggml-org#15361 added a new exported metric,
but I missed updating this doc.
ggerganov pushed a commit that referenced this pull request Oct 6, 2025
#15361 added a new exported metric,
but I missed updating this doc.