[Bugfix] Fix several issues with p2p xPyD in GET type #23993
…ensor Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: Csrayz <jover@cmbchina.com>
buffer_size_threshold This can avoid clearing the buffer due to the tensor being too large. Signed-off-by: Csrayz <jover@cmbchina.com>
Code Review
This pull request effectively addresses a crash caused by an empty send_store during tensor eviction in GET mode. The core of the fix is the introduction of a pre-check that rejects tensors larger than the buffer threshold, which correctly prevents the problematic eviction loop. The addition of an assert to ensure the store is not empty during eviction is a good defensive measure. Furthermore, the code is cleaner due to the correction of several typos in variable names. The changes are logical and correctly resolve the bug.
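The behavior described in this review can be sketched in a few lines. This is a minimal illustration, not the code in `p2p_nccl_engine.py`: the `SendStore` class, its `put` method, and the use of `bytes` in place of tensors are assumptions made for the example. Only the names `send_store` and `buffer_size_threshold` come from the PR. Without the pre-check, a payload larger than the threshold would drain the store and then call `next(iter(...))` on an empty dict, which raises `StopIteration`.

```python
class SendStore:
    """Hypothetical stand-in for the GET-mode send buffer (not the vLLM class)."""

    def __init__(self, buffer_size_threshold: int) -> None:
        self.buffer_size_threshold = buffer_size_threshold
        self.buffer_size = 0
        # Plain dicts keep insertion order, so the first key is the oldest entry.
        self.send_store: dict[str, bytes] = {}

    def put(self, tensor_id: str, payload: bytes) -> bool:
        size = len(payload)

        # Pre-check: a payload that can never fit is rejected up front,
        # so the eviction loop below cannot empty the store trying to fit it.
        if size > self.buffer_size_threshold:
            return False

        # Replacing an existing entry must not double-count its size.
        if tensor_id in self.send_store:
            self.buffer_size -= len(self.send_store.pop(tensor_id))

        # Evict oldest entries until the new payload fits.
        while self.buffer_size + size > self.buffer_size_threshold:
            # Defensive check: size accounting says there is data to evict.
            assert self.send_store, "send_store is empty during eviction"
            oldest_id = next(iter(self.send_store))
            self.buffer_size -= len(self.send_store.pop(oldest_id))

        self.send_store[tensor_id] = payload
        self.buffer_size += size
        return True
```

With a threshold of 10, storing two 6-byte payloads evicts the first one, while an 11-byte payload is refused and leaves the store untouched.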
@Abatom Please review. Additionally, I have a question. While we understand the GET type is not recommended due to performance concerns, I've observed that when using the GET type, TensorMemoryPool is not utilized as a secondary cache for GPU buffers. Furthermore, in this configuration, TensorMemoryPool is still instantiated, consuming some memory. Could you please let me know if there are any plans to implement TensorMemoryPool as a secondary cache for GPU buffers in future releases?
Thanks, I'll review this PR shortly.
Currently, the memory pool size can be reduced via the mem_pool_size_gb configuration. Because the performance of GET mode is inferior to that of PUT_ASYNC, we haven't prioritized letting P instances use the memory pool yet; we'll add this capability when time allows.
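As a rough illustration of the suggestion above, the pool size could be lowered through the connector's extra config. This is a sketch under assumptions: it assumes `mem_pool_size_gb` and `send_type` are read from `kv_connector_extra_config`, and the connector name, role, and the value `4` are placeholders chosen for the example rather than settings taken from this thread.

```python
import json

# Assumed layout of the KV transfer config; values are illustrative only.
kv_transfer_config = {
    "kv_connector": "P2pNcclConnector",   # assumed connector name
    "kv_role": "kv_producer",             # assumed role for a P instance
    "kv_connector_extra_config": {
        "send_type": "GET",
        "mem_pool_size_gb": 4,            # smaller pool, since GET mode does not use it
    },
}

# JSON string that could be passed as the KV transfer config argument.
kv_transfer_config_json = json.dumps(kv_transfer_config)
print(kv_transfer_config_json)
```

Check the key names against the vLLM version in use before relying on this.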
2b982d4 to 2fc9cde
Signed-off-by: ivyilike <pww123@cmbchina.com>
2fc9cde to e1efc27
Signed-off-by: ivyilike <pww123@cmbchina.com>
Any update?
@Csrayz Have you ever run the GET mode locally yourself? Have you stress-tested it? Did you encounter any garbled text? During chunked prefill, was there any corruption? When preemption happened, did you see garbled output or crashes? If everything above looks good, just let me know and I'll run the tests locally.
@Csrayz I'll run this PR.
vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_engine.py
Signed-off-by: Csrayz <jover@cmbchina.com>
All work is complete, waiting for your review. @NickLucche
NickLucche
left a comment
Looks good. I would just point out that having tests here would go a long way toward ensuring these changes keep working.
Yes, unit tests are needed to ensure that the program handles boundary cases correctly. However, I have been quite busy recently, so I may need to add the relevant unit tests separately at a later time.
…3993) Signed-off-by: Csrayz <jover@cmbchina.com> Signed-off-by: ivyilike <pww123@cmbchina.com> Co-authored-by: ivyilike <pww123@cmbchina.com>
…3993) Signed-off-by: Csrayz <jover@cmbchina.com> Signed-off-by: ivyilike <pww123@cmbchina.com> Co-authored-by: ivyilike <pww123@cmbchina.com> Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Csrayz <jover@cmbchina.com> Signed-off-by: ivyilike <pww123@cmbchina.com> Co-authored-by: ivyilike <pww123@cmbchina.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
…3993) Signed-off-by: Csrayz <jover@cmbchina.com> Signed-off-by: ivyilike <pww123@cmbchina.com> Co-authored-by: ivyilike <pww123@cmbchina.com> Signed-off-by: gaojc <1055866782@qq.com>
…3993) Signed-off-by: Csrayz <jover@cmbchina.com> Signed-off-by: ivyilike <pww123@cmbchina.com> Co-authored-by: ivyilike <pww123@cmbchina.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Fix several issues with p2p xPyD in GET type:
next(iter(self.send_store)) to raise StopIteration.
Test Plan
Test Result