[Log] Optimize kv cache memory log from Bytes to GiB #25204

yewentao256 · 2025-09-18T20:29:14Z

Purpose

Actual usage is 95.97 GiB for weight, 5.42 GiB for peak activation, 2.75 GiB for non-torch memory, and 1.31 GiB for CUDAGraph memory. Replace gpu_memory_utilization config with --kv-cache-memory=58969782886 to fit into requested memory, or --kv-cache-memory=77365384192 to fully utilize gpu memory. Current kv cache memory in use is 60536355430 bytes.

Changing to:

Actual usage is 95.97 GiB for weight, 5.68 GiB for peak activation, 2.75 GiB for non-torch memory, and 0.0 GiB for CUDAGraph memory. Replace gpu_memory_utilization config with --kv-cache-memory=58969782886 (54.9 GiB) to fit into requested memory, or --kv-cache-memory=77365384192 (73.1 GiB) to fully utilize gpu memory. Current kv cache memory in use is 38.28 GiB.

Signed-off-by: yewentao256 <zhyanwentao@126.com>

gemini-code-assist

Code Review

This pull request aims to improve the readability of a log message by converting memory values from bytes to GiB. While the intention is good, the change introduces an issue where the suggested --kv-cache-memory command-line argument values are no longer valid for copy-pasting, as they are formatted in GiB instead of bytes. My review includes a suggestion to correct this by providing the byte value for the command and the GiB value in parentheses for readability, thus preserving the utility of the log message for users.

vllm/v1/worker/gpu_worker.py

mgoin · 2025-09-18T22:28:44Z

vllm/v1/worker/gpu_worker.py

+                f"into requested memory, or `--kv-cache-memory="
+                f"{kv_cache_memory_bytes_to_gpu_limit}` "
+                f"({GiB(kv_cache_memory_bytes_to_gpu_limit)} GiB) to fully "


I don't think using GiB is valid for the cli argument

@mgoin Thanks! I have updated the code. The message now looks something like --kv-cache-memory=8511484 (xx GiB).
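For readers following along, here is a minimal sketch of the format agreed on above: the raw byte count stays in the flag so it can be copy-pasted, and the GiB value follows in parentheses for readability. The helper names `format_gib` and `kv_cache_flag_hint` are hypothetical and are not the `GiB(...)` helper from the diff, whose definition is not shown in this thread. The two-decimal rounding is also an assumption.

```python
GIB_BYTES = 1 << 30  # 1 GiB = 2**30 bytes


def format_gib(num_bytes: int, ndigits: int = 2) -> str:
    """Render a byte count as a GiB string (hypothetical helper)."""
    return f"{num_bytes / GIB_BYTES:.{ndigits}f}"


def kv_cache_flag_hint(num_bytes: int) -> str:
    """Keep raw bytes in the CLI flag; show GiB only as an annotation."""
    return f"`--kv-cache-memory={num_bytes}` ({format_gib(num_bytes)} GiB)"


if __name__ == "__main__":
    # Byte value taken from the example log message in the PR description.
    print(kv_cache_flag_hint(58969782886))
```

Keeping the flag value in bytes means the suggestion can still be pasted straight into a launch command.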

mgoin · 2025-09-19T21:07:13Z

Can we make this a debug log or something to actually fix how often we see it now?

yewentao256 · 2025-09-19T21:17:04Z

Can we make this a debug log or something to actually fix how often we see it now?

I don't fully understand "actually fix how often we see it now". This is a default info log and is printed each time the server launches. Do you mean make this a debug log?

mgoin · 2025-09-22T22:39:50Z

I see this log pretty much every time I launch the server so I think something is wrong. I don't think it should be common to see this.

yewentao256 · 2025-09-23T14:57:17Z

I see this log pretty much every time I launch the server so I think something is wrong. I don't think it should be common to see this.

@mgoin Makes sense to me, I just converted it to a debug log.
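As an illustration of what moving the message to debug level means in practice, the sketch below uses only Python's standard `logging` module. The logger name `kv_cache_hint_demo`, the `ListHandler` class, and the `emitted_at` function are hypothetical, and the sketch does not reproduce vLLM's own logger setup.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the effect of the level is visible."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def emitted_at(level: int, message: str) -> list[str]:
    """Log `message` at DEBUG and return what a logger set to `level` emits."""
    logger = logging.getLogger("kv_cache_hint_demo")  # hypothetical name
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        logger.debug(message)
    finally:
        logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    hint = "Replace gpu_memory_utilization config with --kv-cache-memory=..."
    print(emitted_at(logging.INFO, hint))   # suppressed at the default-style level
    print(emitted_at(logging.DEBUG, hint))  # visible once debug logging is on
```

With the logger at INFO the hint is dropped, so it no longer appears on every server launch. It is still available when debug logging is enabled.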

optimize from bytes to GiB

1d07a7f

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners September 18, 2025 20:29

yewentao256 added the ready label Sep 18, 2025

mergify bot added the v1 label Sep 18, 2025

gemini-code-assist bot reviewed Sep 18, 2025

update

d8f2687

Signed-off-by: yewentao256 <zhyanwentao@126.com>

mgoin reviewed Sep 18, 2025

debug info

fa3c783

Signed-off-by: yewentao256 <zhyanwentao@126.com>

mgoin approved these changes Sep 23, 2025

mgoin merged commit 846197f into vllm-project:main Sep 23, 2025
40 checks passed

mgoin deleted the wye-optimize-kv-cache-memory-log branch September 23, 2025 16:44

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Log] Optimize kv cache memory log from Bytes to GiB (vllm-project#25204)

c0f478c

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 added a commit that referenced this pull request Oct 3, 2025

[Log] Optimize kv cache memory log from Bytes to GiB (#25204)

6462fee

Signed-off-by: yewentao256 <zhyanwentao@126.com>

gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025

[Log] Optimize kv cache memory log from Bytes to GiB (vllm-project#25204)

7a8c82a

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: gaojc <1055866782@qq.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[Log] Optimize kv cache memory log from Bytes to GiB (vllm-project#25204)

4676667

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[Log] Optimize kv cache memory log from Bytes to GiB (vllm-project#25204)

845925b

Signed-off-by: yewentao256 <zhyanwentao@126.com>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[Log] Optimize kv cache memory log from Bytes to GiB (vllm-project#25204)

ae14e31

Signed-off-by: yewentao256 <zhyanwentao@126.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[Log] Optimize kv cache memory log from Bytes to GiB (vllm-project#25204)

7206bdc

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
