-
Couldn't load subscription status.
- Fork 524
[Bugfix][Worker] Clear NPU memory between test profiling #989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for this fixing! I think we can derectly move gc collection and peak mem status reset into |
I have assembled the 3 methods into |
|
@wangxiyuan The CI is passed, and this PR is may needed for #969. |
Why not just do it in https://github.com/vllm-project/vllm-ascend/pull/989/files#diff-efe99563375ba645b1c1befb237632c2c91f85b21d2ab2d1095eef82aecd3999L106 |
Because maybe there are some other scenarios just need |
|
please merge this pr and #969 into one |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
hope can merge, sleep mode v1 need this |
I will rebase soon. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
f56d491 to
c509396
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
| # Profile the memory usage of the model and get the maximum number of | ||
| # cache blocks that can be allocated with the remaining free memory. | ||
| NPUPlatform.empty_cache() | ||
| clear_npu_memory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between the two memory clear methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@momo609 clear_npu_memory() = gc.collect() + empty_cache() + reset_peak_memory_stats() (which will make the profile in dummy run more accurate).
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
CC: @wangxiyuan |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Signed-off-by: shen-shanshan <467638484@qq.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #989 +/- ##
==========================================
- Coverage 27.39% 27.34% -0.05%
==========================================
Files 56 56
Lines 6191 6183 -8
==========================================
- Hits 1696 1691 -5
+ Misses 4495 4492 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
What this PR does / why we need it?
Clear NPU memory between test profiling.
If we don't clear the memory this way, the CI may get
OOMerror between UTs.This modification is adapted from
vllm/worker/worker.py, find more details at https://github.com/vllm-project/vllm/blob/main/vllm/worker/worker.py#L184-L186.Does this PR introduce any user-facing change?
How was this patch tested?