server : export max observed n_past value #15361

okuvshynov · 2025-08-16T14:45:58Z

Add tracking for the high-watermark cache usage (the maximum observed n_past value) and expose it via the /metrics endpoint.

Use case: tracking the largest cache usage needed under a realistic workload, to better understand memory requirements and to adjust the cache size/quantization for the model/cache accordingly.
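A minimal sketch of the high-watermark idea described above, not the code from this PR. The struct `metrics_sketch`, its methods, and the exact metric name `llamacpp:n_past_max` (including the `gauge` type line) are assumptions made for illustration; only the `n_past_max` name itself comes from this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for the server's metrics state (not the actual llama.cpp struct).
struct metrics_sketch {
    int32_t n_past_max = 0; // largest n_past value observed so far

    // Call whenever a slot's n_past changes; the watermark only ever grows.
    void on_n_past(int32_t n_past) {
        assert(n_past >= 0);
        n_past_max = std::max(n_past_max, n_past);
    }

    // Render the value in Prometheus text format.
    // The metric name and type are assumed here, not taken from the PR diff.
    std::string to_prometheus() const {
        std::ostringstream out;
        out << "# TYPE llamacpp:n_past_max gauge\n"
            << "llamacpp:n_past_max " << n_past_max << "\n";
        return out.str();
    }
};
```

Feeding the observations 128, 512, 256 into `on_n_past` leaves `n_past_max` at 512, so a later scrape still reports the peak even after usage has dropped. That peak is the number to compare against the configured context size when sizing the cache.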

github-actions bot added the examples and server labels Aug 16, 2025

Merge branch 'ggml-org:master' into n_past_max

caf9dc7

okuvshynov marked this pull request as ready for review August 17, 2025 12:44

okuvshynov requested a review from ngxson as a code owner August 17, 2025 12:44

Merge branch 'ggml-org:master' into n_past_max

abd0737

ngxson approved these changes Aug 17, 2025

ngxson merged commit e5155e6 into ggml-org:master Aug 17, 2025
47 checks passed

okuvshynov deleted the n_past_max branch August 17, 2025 23:18

okuvshynov added a commit to okuvshynov/llama.cpp that referenced this pull request Oct 5, 2025

server: update readme to mention n_past_max metric

3d90f99

ggml-org#15361 added new metric exported, but I've missed this doc.

okuvshynov mentioned this pull request Oct 5, 2025

server: update readme to mention n_past_max metric #16436

Merged

ggerganov pushed a commit that referenced this pull request Oct 6, 2025

server: update readme to mention n_past_max metric (#16436)

c5fef0f

#15361 added new metric exported, but I've missed this doc.
