@yewentao256 (Member) commented Sep 18, 2025

Purpose

Actual usage is 95.97 GiB for weight, 5.42 GiB for peak activation, 2.75 GiB for non-torch memory, and 1.31 GiB for CUDAGraph memory. Replace gpu_memory_utilization config with --kv-cache-memory=58969782886 to fit into requested memory, or --kv-cache-memory=77365384192 to fully utilize gpu memory. Current kv cache memory in use is 60536355430 bytes.

Changing to:

Actual usage is 95.97 GiB for weight, 5.68 GiB for peak activation, 2.75 GiB for non-torch memory, and 0.0 GiB for CUDAGraph memory. Replace gpu_memory_utilization config with --kv-cache-memory=58969782886 (54.9 GiB) to fit into requested memory, or --kv-cache-memory=77365384192 (73.1 GiB) to fully utilize gpu memory. Current kv cache memory in use is 38.28 GiB.
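The GiB figures in parentheses come from dividing a byte count by 2^30. Here is a minimal sketch of that conversion. `bytes_to_gib` is a hypothetical name used for illustration only and is not necessarily how the helper in the codebase is written.

```python
GIB = 1 << 30  # number of bytes in one GiB (2**30)


def bytes_to_gib(num_bytes: int, ndigits: int = 2) -> float:
    """Convert a byte count to GiB, rounded for display (hypothetical helper)."""
    return round(num_bytes / GIB, ndigits)


# Byte value taken from the log message above.
print(bytes_to_gib(58969782886))  # 54.92
```

Rounding to one digit instead of two gives the 54.9 shown in the message.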

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 18, 2025
@mergify bot added the v1 label Sep 18, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to improve the readability of a log message by converting memory values from bytes to GiB. While the intention is good, the change introduces an issue where the suggested --kv-cache-memory command-line argument values are no longer valid for copy-pasting, as they are formatted in GiB instead of bytes. My review includes a suggestion to correct this by providing the byte value for the command and the GiB value in parentheses for readability, thus preserving the utility of the log message for users.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Comment on lines +388 to +390
f"into requested memory, or `--kv-cache-memory="
f"{kv_cache_memory_bytes_to_gpu_limit}` "
f"({GiB(kv_cache_memory_bytes_to_gpu_limit)} GiB) to fully "
Member

I don't think using GiB is valid for the cli argument

Member Author

@yewentao256 yewentao256 Sep 19, 2025

@mgoin Thanks! I have updated the code. The message now prints the byte value followed by the GiB value in parentheses, something like --kv-cache-memory=8511484 (xxGiB)
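A rough sketch of the format described here: the flag keeps the exact byte count so it can be copied and pasted, and the GiB value is only an annotation. `kv_cache_flag_hint` is a hypothetical function name, and the two-decimal formatting is an assumption rather than the merged code.

```python
GIB = 1 << 30  # number of bytes in one GiB (2**30)


def kv_cache_flag_hint(num_bytes: int) -> str:
    """Render the flag with an exact byte value plus a GiB annotation.

    Hypothetical helper; the exact wording and precision are assumptions.
    """
    return f"--kv-cache-memory={num_bytes} ({num_bytes / GIB:.2f} GiB)"


hint = kv_cache_flag_hint(58969782886)
flag = hint.split()[0]              # the copy-pasteable part of the hint
value = int(flag.split("=", 1)[1])  # still parses as an integer byte count
print(hint)
```

Keeping the annotation outside the flag value means the part a user pastes still parses as a plain integer.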

@mgoin (Member) commented Sep 19, 2025

Can we make this a debug log or something to actually fix how often we see it now?

@yewentao256 (Member Author)

Can we make this a debug log or something to actually fix how often we see it now?

I don't fully understand "actually fix how often we see it now". This is a default info log and is printed each time the server launches. Do you mean making it a debug log?

@mgoin (Member) commented Sep 22, 2025

I see this log pretty much every time I launch the server so I think something is wrong. I don't think it should be common to see this.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256 (Member Author)

I see this log pretty much every time I launch the server so I think something is wrong. I don't think it should be common to see this.

@mgoin Makes sense to me. I just converted it to a debug log.
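To illustrate the effect of that change, here is a small sketch using Python's standard `logging` module. It assumes the server's logger runs at INFO by default. The logger names, the message wording, and the `emit_hint` and `capture` helpers are all made up for the example.

```python
import io
import logging

GIB = 1 << 30  # number of bytes in one GiB (2**30)


def emit_hint(logger: logging.Logger, kv_cache_bytes: int) -> None:
    """Log the hint at DEBUG level (hypothetical message wording)."""
    logger.debug(
        "Use --kv-cache-memory=%d (%.2f GiB) to fit into requested memory.",
        kv_cache_bytes,
        kv_cache_bytes / GIB,
    )


def capture(level: int) -> str:
    """Return what a logger configured at `level` would print for the hint."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger(f"kv_cache_hint_demo_{level}")
    logger.setLevel(level)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(handler)
    try:
        emit_hint(logger, 58969782886)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


print(repr(capture(logging.INFO)))   # hint is suppressed at INFO
print(repr(capture(logging.DEBUG)))  # hint appears at DEBUG
```

With this arrangement the hint stays out of routine launches and is still available to anyone who raises the log verbosity.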

@mgoin mgoin merged commit 846197f into vllm-project:main Sep 23, 2025
40 checks passed
@mgoin mgoin deleted the wye-optimize-kv-cache-memory-log branch September 23, 2025 16:44
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
yewentao256 added a commit that referenced this pull request Oct 3, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: gaojc <1055866782@qq.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>