@hasB4K hasB4K commented Mar 6, 2025

Hello,

Adding a kv_connector_extra_config field gives custom connectors a place to pass their own settings if they need to.
I'm using it within SimpleBuffer/PyNcclPipe to set the torch.distributed Store's timeout (in the example below, I set the timeout to 1 hour):

--kv-transfer-config '{
   "kv_connector": "PyNcclConnector",
   "kv_role": "kv_consumer",
   "kv_rank": 1,
   "kv_parallel_size": 2,
   "kv_buffer_size": 5e9,
   "kv_ip": "localhost",
   "kv_port": 12345,
   "kv_connector_extra_config": {"store_timeout": 3600}
}'

Issues like this one: #10502 (comment) torch.distributed.DistStoreError: wait timeout after 300000ms, keys: /send_to/0/4
would happen less frequently, since the timeout only occurs when there are no requests.

I don't think it is safe to increase the timeout too much, but I have been able to raise it to more than 1 hour without problems. This can be useful for development, or to reduce the need for a frequent heartbeat.
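
For readers following along, here is a minimal sketch of how a connector might read store_timeout from the extra config and fall back to a default when the key is absent. The names TransferConfigSketch, get_extra, parse_config and store_timeout are hypothetical and written only for this illustration; they are not vLLM's actual classes or functions. The 300-second default is an assumption taken from the 300000ms in the error message above.

```python
import json
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Any

# Assumed default, taken from the error message above (300000 ms = 300 s).
DEFAULT_STORE_TIMEOUT_S = 300


@dataclass
class TransferConfigSketch:
    """Hypothetical stand-in for the transfer config; not vLLM's actual class."""

    kv_connector: str
    kv_connector_extra_config: dict[str, Any] = field(default_factory=dict)

    def get_extra(self, key: str, default: Any) -> Any:
        # Return the connector-specific value, or the default if the key is unset.
        return self.kv_connector_extra_config.get(key, default)


def parse_config(raw: str) -> TransferConfigSketch:
    # Parse the JSON string given on the command line into the sketch config.
    data = json.loads(raw)
    return TransferConfigSketch(
        kv_connector=data["kv_connector"],
        kv_connector_extra_config=data.get("kv_connector_extra_config", {}),
    )


def store_timeout(config: TransferConfigSketch) -> timedelta:
    # Convert the configured number of seconds into a timedelta.
    seconds = config.get_extra("store_timeout", DEFAULT_STORE_TIMEOUT_S)
    return timedelta(seconds=seconds)


raw = (
    '{"kv_connector": "PyNcclConnector", '
    '"kv_connector_extra_config": {"store_timeout": 3600}}'
)
config = parse_config(raw)
timeout = store_timeout(config)
```

A timedelta is used here because torch.distributed.TCPStore accepts its timeout argument in that form; how the value actually reaches the store inside PyNcclPipe is not shown in this conversation.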

github-actions bot commented Mar 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@hasB4K hasB4K force-pushed the mathis/vllm-public/disagg_timeout branch from 1d0c332 to c7d61de Compare March 7, 2025 09:38

@KuntaiDu KuntaiDu left a comment

Some small naming changes requested; the other parts LGTM. Feel free to suggest other names.

… KVTransferConfig.kv_connector_extra_config

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
@hasB4K hasB4K force-pushed the mathis/vllm-public/disagg_timeout branch from c7d61de to cefd6d9 Compare March 7, 2025 17:56
@hasB4K hasB4K requested a review from KuntaiDu March 10, 2025 12:52

@KuntaiDu KuntaiDu left a comment

LGTM!

@KuntaiDu KuntaiDu enabled auto-merge (squash) March 11, 2025 20:25
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 11, 2025
@vllm-bot vllm-bot merged commit 1bd32bc into vllm-project:main Mar 13, 2025
47 of 49 checks passed
@hasB4K hasB4K deleted the mathis/vllm-public/disagg_timeout branch March 13, 2025 09:23
richardsliu pushed a commit to richardsliu/vllm that referenced this pull request Mar 14, 2025
… and add KVTransferConfig.kv_connector_extra_config (vllm-project#14367)

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
Signed-off-by: Richard Liu <ricliu@google.com>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
… and add KVTransferConfig.kv_connector_extra_config (vllm-project#14367)

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
… and add KVTransferConfig.kv_connector_extra_config (vllm-project#14367)

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
… and add KVTransferConfig.kv_connector_extra_config (vllm-project#14367)

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>