[P/D] NIXL Updates #25844
Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com> Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: simon-mo <simon.mo@hey.com>
…25810) Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: simon-mo <simon.mo@hey.com>
…le video profiling (#25557) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: simon-mo <simon.mo@hey.com>
    remote_xfer_side_handle,
    remote_block_descs_ids,
    notif_msg=notif_id,
    skip_desc_merge=True,
is this debug cruft?
yes
Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: simon-mo <simon.mo@hey.com>
    if req_id not in self.requests:
        logger.warning(
            "Got finished sending KV transfer for request %s,"
            "but the request is already freed.", req_id)
How would you unit test this scenario?
In #25067 the case I tested was an abort after the prefill request had finished, but @NickLucche rightly asked (AIUI):
> If P is done with the request (finished, capped length), then how is the request being aborted in P?
Whatever the scenario is, if it is supposed to happen, we shouldn't log a warning that the user cannot act on. But a comment explaining why it is supposed to happen would be important for maintainability. Something similar to:
vllm/vllm/v1/core/sched/scheduler.py
Lines 888 to 893 in 8616300
    request = self.requests.get(req_id)
    if request is None:
        # The request is already finished. This can happen if the
        # request is aborted while the model is executing it (e.g.,
        # in pipeline parallelism).
        continue
I agree this should be a debug log.
I will post up a diff later today that:
- adds commentary about what scenarios this can be caused by
- adds a unit test to the scheduler tests we have for this
I'm working on this in #25067 again
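For illustration only, here is a minimal sketch of the debug-level handling discussed in this thread, written so it can be checked the way a scheduler unit test would check it. The helper name `free_finished_sending` and its arguments are hypothetical and are not part of the vLLM scheduler API:

```python
import logging

logger = logging.getLogger(__name__)


def free_finished_sending(requests, finished_sending, free_blocks):
    """Free blocks for requests whose KV transfer has finished sending.

    Hypothetical helper: `requests` maps request IDs to request objects,
    `finished_sending` is an iterable of request IDs, and `free_blocks`
    is a callable invoked once per request that is still tracked.
    Returns the request IDs that were skipped because they were already gone.
    """
    skipped = []
    for req_id in sorted(finished_sending):
        request = requests.get(req_id)
        if request is None:
            # The request was already freed, e.g. because an abort was
            # processed before the transfer-finished notification arrived.
            # The user cannot act on this, so log at debug level only.
            logger.debug(
                "Finished sending KV transfer for request %s, "
                "but the request is already freed.", req_id)
            skipped.append(req_id)
            continue
        free_blocks(request)
    return skipped
```

A test along these lines would register one live request, report two finished transfers, and assert that only the live request is freed while the unknown ID is skipped without raising.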
Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
We have observed a rare scenario with AsyncLLM where a client disconnect triggers an abort request after the request has finished, but before AsyncLLM has processed the request output.
See vllm-project#26012, vllm-project#25067, vllm-project#25844, and llm-d/llm-d#187.
Without the fix, the unit test fails with:
```
        logger.warning(
            "Releasing expired KV blocks for request %s which were "
            "retrieved by %d decode worker(s) within %d seconds.",
            req_id,
            count,
            envs.VLLM_NIXL_ABORT_REQUEST_TIMEOUT,
        )
>       self._reqs_to_process.remove(req_id)
E       KeyError: '0'

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py:1238: KeyError
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
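The `KeyError` in that failure is standard Python behavior: `set.remove()` raises when the element is absent, while `set.discard()` does not. The sketch below only illustrates that difference. The function `release_expired` is hypothetical, and this is not necessarily how the connector was actually fixed:

```python
def release_expired(reqs_to_process: set, expired: list) -> list:
    """Drop expired request IDs, tolerating IDs that were already removed.

    Hypothetical helper. Returns the IDs that were not present, so the
    caller can decide whether that is worth logging.
    """
    missing = [req_id for req_id in expired if req_id not in reqs_to_process]
    for req_id in expired:
        # set.discard() is a no-op for an absent element, whereas
        # set.remove() raises KeyError in the same situation.
        reqs_to_process.discard(req_id)
    return missing
```

Whether silently tolerating a missing ID is the right choice depends on whether the situation is expected. If it should never happen, preventing it upstream is preferable to masking it here.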
This situation can occur when the API server receives a client
disconnect (and thus sends an abort) around the same time a prefill
completes and we keep the blocks (delay_free_blocks) around for a
remote decode. We should assume the blocks may be used, and so
we ignore the abort. If they are not used, they should be freed
by the connector after a timeout.
The original error was:
```
[scheduler.py:1183] Finished sending KV transfer for request cmpl-37c560d3-5680-4bd1-97f9-7ed31a56de60-0
File "/opt/vllm-source/vllm/v1/engine/core.py", line 292, in step
engine_core_outputs = self.scheduler.update_from_output(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/vllm-source/vllm/v1/core/sched/scheduler.py", line 893, in update_from_output
self._update_from_kv_xfer_finished(
File "/opt/vllm-source/vllm/v1/core/sched/scheduler.py", line 1184, in _update_from_kv_xfer_finish>
self._free_blocks(self.requests[req_id])
~~~~~~~~~~~~~^^^^^^^^
KeyError: 'cmpl-37c560d3-5680-4bd1-97f9-7ed31a56de60-0'
```
Since vllm-project#25844, we log a warning instead of raising. With
this fix, that situation in `_update_from_kv_xfer_finished()` should
never occur.
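As a rough sketch of the behavior described in this message (not the actual scheduler or connector code), the toy class below ignores an abort while blocks are held for a remote decode and frees them either when the transfer finishes or after a timeout. `ToyScheduler`, its methods, and the explicit `now` arguments are all assumptions made for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the scheduler/connector pair."""

    timeout_s: float = 120.0
    requests: dict = field(default_factory=dict)    # req_id -> block IDs
    held_until: dict = field(default_factory=dict)  # req_id -> expiry time

    def finish_prefill(self, req_id: str, now: float) -> None:
        # Prefill is done, but the blocks are kept for a remote decode.
        self.held_until[req_id] = now + self.timeout_s

    def abort(self, req_id: str) -> bool:
        if req_id in self.held_until:
            # Blocks may still be read by a decode worker: ignore the abort.
            return False
        self.requests.pop(req_id, None)
        return True

    def kv_xfer_finished(self, req_id: str) -> None:
        # The abort was ignored, so the request is still tracked here
        # and the lookup cannot raise KeyError.
        self.held_until.pop(req_id, None)
        del self.requests[req_id]

    def expire(self, now: float) -> list:
        # Free blocks that no decode worker fetched before the timeout.
        expired = [r for r, t in self.held_until.items() if t <= now]
        for req_id in expired:
            del self.held_until[req_id]
            self.requests.pop(req_id, None)
        return expired
```

Passing `now` explicitly keeps the sketch deterministic. A real implementation would more likely read a monotonic clock.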
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>