Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sampling : one sequence per sampling context #3601

Closed
wants to merge 1 commit into from
Closed

Conversation

ggerganov
Copy link
Owner

@ggerganov ggerganov commented Oct 12, 2023

ref #3543

Simplify llama_sampling_context to contain information for only one sequence

I'm working on tree-based speculative decoding and so far it seems to me this simplification would make things easier

Verified

This commit was signed with the committer’s verified signature.
ggerganov Georgi Gerganov
ggml-ci
@ggerganov ggerganov requested a review from KerfuffleV2 October 12, 2023 17:37
@KerfuffleV2
Copy link
Collaborator

I won't get a chance to review this in depth until tomorrow.

I think my original thinking was that it would require less changes/manual management to do it the way I did with sampling contexts being automatically created from default settings (including a default grammar if provided). You also could manually manage it if you wanted, by just manually inserting the sequence-specific context and setting grammar to NULL when initializing the main sampling context. (The default settings only got used if it looked for a sequence-specific state and didn't find one.)

I'm not sure that's a good enough justification for the complexity, and I don't have any problem with the general idea your proposing.

@FSSRepo said we should make the init function just take the sampling params instead of gpt_params. That part I definitely agree with. (If we want to move this stuff out of common and into llama.cpp that also would make it easier.)

Copy link
Collaborator

@KerfuffleV2 KerfuffleV2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too much to add to what I said before. I thought this change would require more work fixing the examples, but apparently not. They all seem to compile. I tried a few, still seems to work.

The only thing I'd say is that changing the init function to just take parsing params and not gpt_params should probably be done before this is merged.

@ggerganov
Copy link
Owner Author

Yes, I already have a few more changes to llama_sampling_context ready + very likely moving last_tokens and candidates inside it. Will probably open PRs later today

@ggerganov
Copy link
Owner Author

Superseded by #3624

@ggerganov ggerganov closed this Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants