-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
[V0][Fix] structured decoding compatibility with speculative decoding #13823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
southfreebird
wants to merge
10
commits into
vllm-project:main
from
southfreebird:feature/speculative-decoding-and-guided-output-fix
Closed
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
e581f6a
Fix xgrammar decoding for speculative decoding
southfreebird 5bc4267
Fix tests
southfreebird 1390ba6
Fix comments
southfreebird 7ae7d40
Reduce the number of max_rollback_tokens by 1
southfreebird 3c8e0d1
Fix rebase issues
southfreebird 40f291e
Pythonize draft tokens only in case of structured output
southfreebird 1ed9242
Add check to MQA scorer and turn off NGram proposals with structured …
southfreebird 14dcd48
Fix linter
southfreebird 525f055
Add support for guidance decoding backend
southfreebird 4e40c90
Rebase to main
southfreebird File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I received a single test failure when running this test file on your branch. Below is the full log. It seems to indicate a guidance failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems stochastic, but it works for me with seed=0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tends to be the case with guidance failures: since informed prompting is ideal for guidance to be successful, the model often follows the schema even without guidance being enforced. This is why thorough testing is necessary to catch errors where guidance may not take place. Guidance should guarantee adherence, so even one failure is alarming to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
The problem is that it can fail to stop generating output after "}", regardless of whether speculative decoding is used or not. This issue is fixed in main by specifying the seed, as I understand.
My point is that it doesn’t seem to be a problem with speculative decoding itself. However, it is unexpected behaviour for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not seem clear to me that this is what is happening. To my understanding, when the matcher reaches a terminal state it will restrict the next token to be the end-of-sequence token, terminating the completion. Why would it fail to stop generating output?
The output from my above error report does not look like this is what happened:
This looks like a guidance failure, where the matcher state became out-of-sync with the completion resulting in incorrect outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also checked the tests again, and it seems that guided output works correctly. The problem lies in the special tokens (e.g.,
\tin this case) that the model can place inside fields. It seems thatxgrammardoes not check them.I also found that a model without speculative decoding can generate them as well, so I suppose this is not an issue with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you mean. Does this imply that there is a bug with xgrammar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the generation of special symbols inside a JSON field (which can be loaded with strict=False) is a bug then yes