[ModelRunner]Add profile execute duration observation #1013
Overall LGTM, just some suggestions:
@wangxiyuan @Yikun @ganyi1996ppo please take a look, thanks.
Signed-off-by: depeng1994 <depengzhang@foxmail.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: depeng1994 <depengzhang@foxmail.com>
Force-pushed: 25ee163 to 9967032
vllm_ascend/envs.py
Outdated
    lambda: bool(int(os.getenv("COMPILE_CUSTOM_KERNELS", "1"))),
    "VLLM_ENABLE_MC2":
    lambda: bool(int(os.getenv("VLLM_ENABLE_MC2", '0'))),
    "VLLM_MODEL_EXECUTE_TIME_OBSERVE":
Suggested change:
-    "VLLM_MODEL_EXECUTE_TIME_OBSERVE":
+    "VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE":
fixed
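For readers following the rename, the pattern in the diff above (a name mapped to a zero-argument lambda that reads the environment) can be sketched as follows. This is only an illustration: the `env_variables` dict, the `get_env` helper, and the default value `"0"` are assumptions, since the diff does not show the rest of `envs.py`.

```python
import os
from typing import Any, Callable, Dict

# Hypothetical registry mirroring the shape of the diff: each variable maps to
# a zero-argument callable, so the environment is read at lookup time rather
# than once at import time.
env_variables: Dict[str, Callable[[], Any]] = {
    "VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE":
    # The default "0" is an assumption; the diff does not show the default.
    lambda: bool(int(os.getenv("VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE", "0"))),
}


def get_env(name: str) -> Any:
    """Evaluate the registered callable for `name` on every call."""
    return env_variables[name]()
```

With this shape the value must be an integer string such as `0` or `1`; a value like `true` would raise `ValueError` inside `int()`.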
Signed-off-by: depeng1994 <depengzhang@foxmail.com>
The commit message should also be updated.
The PR is good enough; there are just some nits, see the inline comments.
You can choose to address them in a separate PR.
* Use the non-blocking API `ProfileExecuteDuration().capture_async` to set observation points asynchronously when you need to observe the execution duration.
* Use the blocking API `ProfileExecuteDuration().pop_captured_sync` at an appropriate time to get and print the execution durations of all observed stages.

## Example Output
The doc is good, but we could provide an e2e guide to help devs understand. For example:
We have already added observation points for the key stages of inference (including pre-processing, model forward, etc.), so you can run the inference script:
VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE=1 python3 vllm-ascend/examples/offline_inference_npu.py
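To make the two documented APIs more concrete, here is a minimal CPU-only sketch of the capture-then-pop pattern. It is not the PR's implementation: `HostEvent` and `DurationObserver` are hypothetical stand-ins, and the host clock replaces the NPU event timestamps that the PR relies on. Only the method names `capture_async` and `pop_captured_sync` come from the doc lines quoted above.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


class HostEvent:
    """CPU stand-in for a device timing event (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._timestamp = 0.0

    def record(self) -> None:
        # A device event would be enqueued without blocking the host;
        # this stand-in simply reads a host clock.
        self._timestamp = time.perf_counter()

    def elapsed_time(self, end: "HostEvent") -> float:
        # Milliseconds between this event and `end`.
        return (end._timestamp - self._timestamp) * 1000.0


class DurationObserver:
    """Collects (tag, start, end) event pairs and resolves them on demand."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self._observations: List[Tuple[str, HostEvent, HostEvent]] = []

    @contextmanager
    def capture_async(self, tag: str) -> Iterator[None]:
        if not self.enabled:
            # Observation switched off: run the wrapped code untouched.
            yield
            return
        start = HostEvent()
        start.record()
        try:
            yield
        finally:
            end = HostEvent()
            end.record()
            self._observations.append((tag, start, end))

    def pop_captured_sync(self) -> Dict[str, float]:
        # With real device events, this is where the host would have to wait
        # for the recorded events to complete before reading the timings.
        durations: Dict[str, float] = {}
        while self._observations:
            tag, start, end = self._observations.pop()
            durations[tag] = start.elapsed_time(end)
        return durations


observer = DurationObserver()
with observer.capture_async("prepare input and forward"):
    with observer.capture_async("forward"):
        time.sleep(0.001)  # stands in for the model forward pass
durations = observer.pop_captured_sync()
```

The point of the split is that marking a stage stays cheap, while the only blocking step is the final pop, which the caller can place where a synchronization is acceptable.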
    for tag, duration in durations.items()
    ]
    captured_name = "Decode" if self.attn_state == AscendAttentionState.DecodeOnly else "Prefill"
    print(f"Profile execute duration [{captured_name}]:",
print or log?
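One way the logging alternative could look is sketched below. The logger name and the helpers `format_durations` and `report` are hypothetical, and the per-stage format string is inferred from the example output in the PR description rather than taken from the diff.

```python
import logging
from typing import Dict

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("profile_execute_duration")


def format_durations(captured_name: str, durations: Dict[str, float]) -> str:
    """Build one line in the same shape as the PR's example output."""
    parts = [f"[{tag}]:{duration:.2f}ms" for tag, duration in durations.items()]
    return f"Profile execute duration [{captured_name}]: " + " ".join(parts)


def report(captured_name: str, durations: Dict[str, float]) -> None:
    # Skip the string building entirely when INFO records would be dropped.
    if logger.isEnabledFor(logging.INFO):
        logger.info("%s", format_durations(captured_name, durations))
```

Compared with `print`, a logger lets the deployment decide the level, destination, and prefix of these lines without touching the model runner.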
I would like to nominate Mengqing Cao (@MengqingCao https://github.com/MengqingCao) as a maintainer, starting with my +1.

## Reason

Review Quality: She has completed [120+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+commenter%3Amengqingcao+-author%3Amengqingcao) since Feb. 2025, including high-quality reviews such as [#review-3077842852](#2088 (review)), [comment-2990074116](#1032 (comment)), and [comment-2921063723](#1013 (comment)).

Sustained and Quality Contributions: She has a deep understanding of the vLLM and vLLM Ascend codebases and has made solid contributions. Her vLLM contributions and her help with vLLM Ascend releases are the main reasons I nominated her:

- vLLM: She has completed [28+ PR contributions](https://github.com/vllm-project/vllm/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged+) in vllm-project/vllm, especially to the vLLM platform module, improving vLLM's multi-hardware support. She is one of the important co-authors of [vllm#8054](vllm-project/vllm#8054) and the hardware plugin RFC, which made the vllm-ascend plugin possible.

Community Involvement: She is also very active and has been involved in [60+ issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20-author%3AMengqingCao%20commenter%3AMengqingCao).

So I think she's a great addition to the vLLM Ascend Maintainer team.

- ✅**Review Quality:** She has completed 120+ reviews since Feb. 2025 (https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+commenter%3Amengqingcao+-author%3Amengqingcao), including quality reviews such as #2088 (review), #1446 (comment), #1032 (comment), and #1013 (comment).
- ✅**Sustained Contributions:** 99+ PRs merged in vllm-project/vllm-ascend: https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged
- ✅**Quality Contribution:** She is one of the important co-authors of vllm-project/vllm#8054, which made the vllm-ascend plugin possible. She has also completed 28+ PR contributions in vllm-project/vllm, especially to the vLLM platform module, improving vLLM's multi-hardware support: https://github.com/vllm-project/vllm/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged+. In 2025 Q2, she also led the [[RFC]: E2E CI test for key features](#413) and [[RFC]: Unit test coverage improvement](#1298) to help vLLM Ascend improve its coverage. Her main contributions focus on the adaptation of parallel strategies and the communicator, such as #1800 and #1856. These contributions are sufficient to prove she has a “Deep understanding of vLLM and vLLM Ascend codebases”.
- ✅**Community Involvement:** Involved as a reviewer in 63+ issues: https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20-author%3AMengqingCao%20commenter%3AMengqingCao. She led the v0.10.1 release as release manager.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@78dba40

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
### What this PR does / why we need it?
We need to **observe the time consumed in each stage of inference (including pre-processing, model forward, etc.), without any performance loss**.
Therefore, we use the event timestamp mechanism of the NPU to mark any stage during the execution of the NPU device (this marking operation is executed asynchronously, with no performance loss).
Additionally, we provide a blocking synchronization API `pop_captured_sync` to be called at an appropriate time, to print the time consumed in all observed stages.

**Only 5 lines changed in model_runner_v1.py, all of which are `ProfileExecuteDuration()` calls; nothing else was changed, and the diff shows more changes only because of alignment changes.**

### Does this PR introduce _any_ user-facing change?
Use the env variable `VLLM_MODEL_EXECUTE_TIME_OBSERVE` to enable this feature.

### How was this patch tested?
Tested with a DeepSeek model; the output looks like this:
```
5691:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: [post process]:14.17ms [prepare input and forward]:9.57ms [forward]:4.14ms
5695:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: [post process]:14.29ms [prepare input and forward]:10.19ms [forward]:4.14ms
5697:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.81ms [prepare input and forward]:10.29ms [forward]:3.99ms
5701:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.10ms [prepare input and forward]:10.62ms [forward]:4.33ms
5705:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.65ms [prepare input and forward]:9.58ms [forward]:4.20ms
5709:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.43ms [prepare input and forward]:9.88ms [forward]:4.20ms
5711:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.89ms [prepare input and forward]:10.49ms [forward]:4.19ms
5715:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.14ms [prepare input and forward]:11.21ms [forward]:4.18ms
5719:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.71ms [prepare input and forward]:10.15ms [forward]:4.42ms
5723:(IntegratedWorker pid=1502401) Profile execute duration [Decode]: [post process]:14.62ms [prepare input and forward]:10.31ms [forward]:4.25ms
5725:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:14.12ms [prepare input and forward]:10.33ms [forward]:4.24ms
5729:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:14.58ms [prepare input and forward]:10.85ms [forward]:4.32ms
5733:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:14.32ms [prepare input and forward]:9.79ms [forward]:4.28ms
5737:(IntegratedWorker pid=1502462) Profile execute duration [Decode]: [post process]:15.06ms [prepare input and forward]:9.89ms [forward]:4.32ms
5739:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.62ms [prepare input and forward]:10.48ms [forward]:4.27ms
5743:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.60ms [prepare input and forward]:10.71ms [forward]:4.61ms
5747:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.21ms [prepare input and forward]:10.10ms [forward]:4.52ms
5751:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:15.03ms [prepare input and forward]:10.00ms [forward]:4.42ms
```

---------

Signed-off-by: depeng1994 <depengzhang@foxmail.com>
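Output in the shape shown in this commit message can be summarized per stage with a short script. The following is a rough sketch: the regular expressions and the `mean_stage_durations` helper are assumptions based only on the sample lines, not part of the PR.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List

# Patterns inferred from the sample lines; other output shapes are not handled.
LINE_RE = re.compile(r"Profile execute duration \[(?P<phase>\w+)\]:(?P<rest>.*)")
STAGE_RE = re.compile(r"\[(?P<tag>[^\]]+)\]:(?P<ms>\d+(?:\.\d+)?)ms")


def mean_stage_durations(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Return {phase: {stage tag: mean duration in ms}} for matching lines."""
    samples: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a profiling line
        for stage in STAGE_RE.finditer(match.group("rest")):
            samples[match.group("phase")][stage.group("tag")].append(float(stage.group("ms")))
    return {
        phase: {tag: mean(values) for tag, values in stages.items()}
        for phase, stages in samples.items()
    }


sample = [
    "5691:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: "
    "[post process]:14.17ms [prepare input and forward]:9.57ms [forward]:4.14ms",
    "5695:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: "
    "[post process]:14.29ms [prepare input and forward]:10.19ms [forward]:4.14ms",
]
averages = mean_stage_durations(sample)
```

Averaging over many steps smooths out the step-to-step jitter visible in the raw lines and makes it easier to compare stages.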
What this PR does / why we need it?
We need to observe the time consumed in each stage of inference (including pre-processing, model forward, etc.), without any performance loss.
Therefore, we use the event timestamp mechanism of the NPU to mark any stage during the execution of the NPU device (this marking operation is executed asynchronously, with no performance loss).
Additionally, we provide a blocking synchronization API `pop_captured_sync` to be called at an appropriate time, to print the time consumed in all observed stages.
Only 5 lines changed in model_runner_v1.py, all of which are `ProfileExecuteDuration()` calls; nothing else was changed, and the diff shows more changes only because of alignment changes.
Does this PR introduce any user-facing change?
Use the env variable `VLLM_MODEL_EXECUTE_TIME_OBSERVE` to enable this feature.
How was this patch tested?
Tested with a DeepSeek model; it prints output like this: