[Log] Optimize kv cache memory log from Bytes to GiB #25204
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Code Review
This pull request aims to improve the readability of a log message by converting memory values from bytes to GiB. While the intention is good, the change introduces an issue where the suggested --kv-cache-memory command-line argument values are no longer valid for copy-pasting, as they are formatted in GiB instead of bytes. My review includes a suggestion to correct this by providing the byte value for the command and the GiB value in parentheses for readability, thus preserving the utility of the log message for users.
f"into requested memory, or `--kv-cache-memory="
f"{kv_cache_memory_bytes_to_gpu_limit}` "
f"({GiB(kv_cache_memory_bytes_to_gpu_limit)} GiB) to fully "
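The suggestion above (raw bytes inside the flag, GiB in parentheses) can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `GIB_BYTES`, `to_gib`, and `kv_cache_hint` are hypothetical names, and the PR's own `GiB` helper is not shown in this thread, so its rounding behavior is assumed here.

```python
GIB_BYTES = 1 << 30  # 1 GiB = 2**30 bytes


def to_gib(num_bytes: int, ndigits: int = 2) -> float:
    """Convert a byte count to GiB (hypothetical helper; rounding is an assumption)."""
    return round(num_bytes / GIB_BYTES, ndigits)


def kv_cache_hint(requested_limit_bytes: int, gpu_limit_bytes: int) -> str:
    """Build a hint that keeps the flag value in bytes so it stays copy-pasteable."""
    return (
        "Replace gpu_memory_utilization config with "
        f"`--kv-cache-memory={requested_limit_bytes}` "
        f"({to_gib(requested_limit_bytes)} GiB) to fit into requested memory, or "
        f"`--kv-cache-memory={gpu_limit_bytes}` "
        f"({to_gib(gpu_limit_bytes)} GiB) to fully utilize gpu memory."
    )


if __name__ == "__main__":
    # Byte values taken from the example log in the PR description.
    print(kv_cache_hint(58969782886, 77365384192))
```

Keeping the byte count inside the backticks means the flag can be pasted directly into a launch command, while the parenthesized GiB figure is only for human readers.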
I don't think using GiB is valid for the CLI argument.
@mgoin Thanks! I have updated the code; the output is now something like `--kv-cache-memory=8511484` (xx GiB).
Can we make this a debug log or something to actually fix how often we see it now?
I don't fully understand "actually fix how often we see it now". This is a default info log and is printed each time the server launches. Do you mean converting it to a debug log?
I see this log pretty much every time I launch the server so I think something is wrong. I don't think it should be common to see this.
@mgoin Makes sense to me, I just converted it to a debug log.
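For readers unfamiliar with the effect of that change, here is a minimal sketch using Python's standard `logging` module. The logger names and the `emit_hint` and `captured_output` helpers are hypothetical and do not reflect the project's own logger setup; the point is only that a `debug` record is dropped when the logger's level is `INFO`.

```python
import io
import logging


def emit_hint(logger: logging.Logger, message: str) -> None:
    """Emit the memory hint at DEBUG level instead of INFO."""
    logger.debug(message)


def captured_output(level: int) -> str:
    """Return what a logger configured at `level` writes for the hint."""
    stream = io.StringIO()
    logger = logging.getLogger(f"kv_cache_demo_{level}")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False  # keep the demo isolated from the root logger
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    emit_hint(logger, "kv cache memory hint")
    logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(repr(captured_output(logging.INFO)))   # hint suppressed at INFO
    print(repr(captured_output(logging.DEBUG)))  # hint visible at DEBUG
```

With this arrangement the hint no longer appears on every default server launch, but it is still available to anyone who lowers the log level.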
Purpose
Actual usage is 95.97 GiB for weight, 5.42 GiB for peak activation, 2.75 GiB for non-torch memory, and 1.31 GiB for CUDAGraph memory. Replace gpu_memory_utilization config with `--kv-cache-memory=58969782886` to fit into requested memory, or `--kv-cache-memory=77365384192` to fully utilize gpu memory. Current kv cache memory in use is 60536355430 bytes.

Changing to:

Actual usage is 95.97 GiB for weight, 5.68 GiB for peak activation, 2.75 GiB for non-torch memory, and 0.0 GiB for CUDAGraph memory. Replace gpu_memory_utilization config with `--kv-cache-memory=58969782886` (54.9 GiB) to fit into requested memory, or `--kv-cache-memory=77365384192` (73.1 GiB) to fully utilize gpu memory. Current kv cache memory in use is 38.28 GiB.