[v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine #13837

sethkimmel3 · 2025-02-25T18:08:06Z

The deepcopy introduced in #11637 adds significant overhead when a large number of requests are added to an llm_engine. This PR replaces it with a more efficient way of copying the XGrammarLogitsProcessor data structure, which removes that overhead.
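The idea behind the change can be sketched in plain Python. This is a minimal illustration, not vLLM's code: CompiledGrammar, GuidedProcessor and clone() are hypothetical stand-ins for the compiled xgrammar context and the XGrammarLogitsProcessor. It assumes the compiled grammar is read-only after compilation. Under that assumption, a per-request copy can share the grammar by reference and only needs fresh mutable state. copy.deepcopy, by contrast, walks and duplicates the whole structure for every request.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CompiledGrammar:
    """Hypothetical stand-in for an expensive compiled grammar context.

    Treated as read-only once compiled, so it can be shared safely.
    """
    rules: list


@dataclass
class GuidedProcessor:
    """Hypothetical per-request processor: shared context + fresh state."""
    ctx: CompiledGrammar
    matcher_state: list = field(default_factory=list)  # mutable, per request
    prefilled: bool = False

    def clone(self) -> "GuidedProcessor":
        # Share the compiled context by reference; give the clone its own
        # empty matcher state and a reset prefilled flag.
        return GuidedProcessor(ctx=self.ctx)


# One "template" processor whose context is expensive to build or copy.
template = GuidedProcessor(ctx=CompiledGrammar(rules=list(range(100_000))))
template.matcher_state.append("token-0")
template.prefilled = True

cheap = template.clone()             # constant work: ctx.rules is not touched
expensive = copy.deepcopy(template)  # walks and duplicates ctx.rules
```

The clone deliberately starts from a clean state instead of reproducing the template's mutable state. That is what a new request needs, but it is not a general drop-in replacement for deepcopy.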

cc: @mgoin @aarnphm

aarnphm · 2025-02-25T18:19:01Z

vllm/model_executor/guided_decoding/xgrammar_decoding.py

Should it be:

Suggested change

-        if hasattr(self, 'token_bitmask') and self.token_bitmask is not None:
-            new_processor.token_bitmask = xgr.allocate_token_bitmask(
-                self.batch_size, self.config.vocab_size)
+        if hasattr(self, 'token_bitmask') and self.token_bitmask is not None:
+            new_processor.token_bitmask = self.token_bitmask
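The choice between allocating a new bitmask in the clone and reusing the existing one comes down to aliasing. The sketch below is a hypothetical illustration. The Processor class, fill() and the one-byte-per-token layout are assumptions standing in for the tensor returned by xgr.allocate_token_bitmask; they are not vLLM's or xgrammar's code. Sharing saves one allocation per request, but both processors then write into the same buffer. That is only safe if each use rewrites the whole mask before it is read and two users never hold it at the same time. This sketch does not establish whether the v0 engine meets that condition.

```python
from __future__ import annotations


class Processor:
    """Hypothetical processor owning a per-token scratch mask (1 byte/token)."""

    def __init__(self, vocab_size: int, bitmask: bytearray | None = None):
        # Reuse a caller-provided buffer, or allocate a fresh zeroed one.
        self.bitmask = bitmask if bitmask is not None else bytearray(vocab_size)

    def clone_allocating(self) -> Processor:
        # Like the PR's original clone: a new buffer per clone.
        return Processor(len(self.bitmask))

    def clone_sharing(self) -> Processor:
        # Like the reviewer's suggestion: the clone reuses this buffer.
        return Processor(len(self.bitmask), bitmask=self.bitmask)

    def fill(self, allowed: set[int]) -> bytes:
        # Rewrite every entry before returning a snapshot, so nothing a
        # previous user of a shared buffer wrote can leak into this result.
        for i in range(len(self.bitmask)):
            self.bitmask[i] = 1 if i in allowed else 0
        return bytes(self.bitmask)


a = Processor(vocab_size=8)
shared = a.clone_sharing()
separate = a.clone_allocating()

mask_a = a.fill({1, 2})
mask_shared = shared.fill({5})  # overwrites the buffer that `a` also holds
```

After the second fill, a.bitmask holds the mask written by shared. The earlier snapshot in mask_a stays correct only because fill() returns a copy.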

aarnphm

One tiny comment; if it passes the tests, then LGTM.

aarnphm · 2025-02-25T18:21:08Z

@sethkimmel3 there are a few pre-commit problems; can you fix them? Thanks.

Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>

aarnphm · 2025-02-25T18:47:37Z

I can't update the title, but can you change it to "[v0][Core] Use shared context to avoid copy overhead for offline engine"?

Otherwise, I think this is ready to come out of draft.

sethkimmel3 · 2025-02-25T18:49:32Z

Done and done @aarnphm!

aarnphm · 2025-02-25T19:25:12Z

Thanks. Once all checks pass, we can merge this.

…line engine (vllm-project#13837) Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>

…line engine (vllm-project#13837) Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>

mergify bot added the structured-output label Feb 25, 2025

aarnphm reviewed Feb 25, 2025

aarnphm approved these changes Feb 25, 2025

sethkimmel3 added 5 commits February 25, 2025 10:43, each signed off by Seth Kimmel <seth.kimmel3@gmail.com>:

4f8265e clone test

fbe5acf replace deepcopy

bf10cbc ruff and small tweak

2c1a699 update

11b4114 lint

sethkimmel3 force-pushed the clone-test branch from a19541b to 11b4114 February 25, 2025 18:43

sethkimmel3 changed the title from "Replace xgrammar deepcopy" to "[v0][Core] Use shared context to avoid copy overhead for offline engine" Feb 25, 2025

sethkimmel3 marked this pull request as ready for review February 25, 2025 18:49

sethkimmel3 requested a review from mgoin as a code owner February 25, 2025 18:49

mgoin changed the title from "[v0][Core] Use shared context to avoid copy overhead for offline engine" to "[v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine" Feb 25, 2025

mgoin approved these changes Feb 25, 2025

mgoin added the ready label ("ONLY add when PR is ready to merge/full CI is needed") Feb 25, 2025

DarkLight1337 merged commit e206b54 into vllm-project:main Feb 26, 2025
56 of 58 checks passed

Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Mar 3, 2025

77ca08e [v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine (vllm-project#13837) Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

f4c2054 [v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine (vllm-project#13837) Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>
