Skip to content

Conversation

@njhill
Copy link
Member

@njhill njhill commented Mar 17, 2025

Follow-on streamlining of V1 parallel sampling implementation. No need to repeat input processing for each sub-request.

Includes removing the unnecessary/unused request_id arg from tokenizer encode methods.

cc @markmc @afeldman-nm

Follow-on streamlining of V1 parallel sampling implementation.

Includes removing the unnecessary/unused request_id arg from tokenizer encode methods.

Signed-off-by: Nick Hill <nhill@redhat.com>
@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

parent_req = ParentRequest(request_id, params)
for idx in range(n):
request_id, params = parent_req.get_child_info(idx)
child_req = request if idx == n - 1 else copy(request)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we okay with shallow copy here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes since the EngineCoreRequest is just immediately serialized. Even in the in-proc case, none of the contents are mutated.

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 18, 2025
@mergify
Copy link

mergify bot commented Mar 19, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 19, 2025
Copy link
Contributor

@afeldman-nm afeldman-nm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it will be great to eliminate input processing redundancy. Mostly just nits in terms of feedback.

# 3) Make a new RequestState and queue.
self.output_processor.add_request(child_req, parent_req, idx)
# 3) Add the request to EngineCore.
self.engine_core.add_request(child_req)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it would seem that we are not enumerating processing steps in the prior comments, so "3)" can be eliminated.

parent_req = ParentRequest(request_id, params)
for idx in range(n):
request_id, params = parent_req.get_child_info(idx)
child_req = request if idx == n - 1 else copy(request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why not

Suggested change
child_req = request if idx == n - 1 else copy(request)
child_req = request if idx == 0 else copy(request)

? Unless there is a specifically reason to pass the original request data structure to child n-1, why not simplify by passing it to child 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just defensive again, we are dispatching each copy of the request so potentially better to not copy the ones that have already been sent.

if n == 1:
await self._add_request(request, None, 0, queue)

else:
Copy link
Contributor

@afeldman-nm afeldman-nm Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if worthwhile, but perhaps you could rewrite this section to make the else redundant:

        if n == 1:
            await self._add_request(request, None, 0, queue)
            return queue
        
        # Fan out child requests (for n>1).
        parent_req = ParentRequest(request_id, params)
        for idx in range(n):
            request_id, params = parent_req.get_child_info(idx)
            child_req = request if idx == n - 1 else copy(request)
            child_req.request_id = request_id
            child_req.sampling_params = params
            await self._add_request(child_req, parent_req, idx, queue)
                
        return queue

I like this because it expresses that there is a "quick exit" when n==1 and a longer process otherwise.

Comment on lines +182 to +187
# Process raw inputs into the request.
request = self.processor.process_inputs(request_id, prompt, params,
arrival_time, lora_request,
trace_headers,
prompt_adapter_request,
priority)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: kwargs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved the call which was already there :)

Comment on lines +185 to +191
# Convert Input --> Request.
request = self.processor.process_inputs(request_id, prompt, params,
arrival_time, lora_request,
trace_headers,
prompt_adapter_request,
priority)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: kwargs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

parent_req = ParentRequest(request_id, params)
for idx in range(n):
request_id, params = parent_req.get_child_info(idx)
child_req = request if idx == n - 1 else copy(request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe child_request instead of child_req, if you prefer not to use abbreviations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I would typically abbreviate if it helped to avoid line wrapping.


else:
# Fan out child requests (for n>1).
parent_req = ParentRequest(request_id, params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe parent_request, if you prefer not to use abbreviations?


else:
# Fan out child requests (for n>1).
parent_req = ParentRequest(request_id, params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe parent_request, if you prefer not to use abbreviations?

parent_req = ParentRequest(request_id, params)
for idx in range(n):
request_id, params = parent_req.get_child_info(idx)
child_req = request if idx == n - 1 else copy(request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe child_request, if you prefer not to use abbreviations?

njhill added 2 commits March 20, 2025 12:32
…eproc

# Conflicts:
#	vllm/v1/engine/async_llm.py
Signed-off-by: Nick Hill <nhill@redhat.com>
@mergify mergify bot removed the needs-rebase label Mar 20, 2025
@WoosukKwon
Copy link
Collaborator

@njhill Please let me know if this PR is good to merge!

@njhill
Copy link
Member Author

njhill commented Mar 21, 2025

@WoosukKwon yes it's ready, thanks! I don't know why the CI tests have started randomly OOMing.

@WoosukKwon WoosukKwon merged commit da6ea29 into vllm-project:main Mar 21, 2025
39 of 41 checks passed
@njhill njhill deleted the dedup-preproc branch March 21, 2025 14:29
erictang000 pushed a commit to erictang000/vllm that referenced this pull request Mar 25, 2025
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants