
sampling : one sequence per sampling context #3601

Closed
wants to merge 1 commit

Conversation

ggerganov
Owner

@ggerganov ggerganov commented Oct 12, 2023

ref #3543

Simplify llama_sampling_context to contain information for only one sequence

I'm working on tree-based speculative decoding, and so far it seems to me that this simplification would make things easier.

@KerfuffleV2
Collaborator

I won't get a chance to review this in depth until tomorrow.

I think my original thinking was that it would require fewer changes and less manual management to do it the way I did, with sampling contexts being automatically created from default settings (including a default grammar if provided). You could also manage it manually if you wanted, by inserting the sequence-specific context yourself and setting the grammar to NULL when initializing the main sampling context. (The default settings only got used when a lookup for a sequence-specific state didn't find one.)

I'm not sure that's a good enough justification for the complexity, and I don't have any problem with the general idea you're proposing.

@FSSRepo said we should make the init function just take the sampling params instead of gpt_params. That part I definitely agree with. (If we want to move this stuff out of common and into llama.cpp that also would make it easier.)

Collaborator

@KerfuffleV2 KerfuffleV2 left a comment


Not too much to add to what I said before. I thought this change would require more work fixing the examples, but apparently not: they all seem to compile, and the ones I tried still seem to work.

The only thing I'd say is that changing the init function to take just the sampling params, rather than gpt_params, should probably be done before this is merged.

@ggerganov
Owner Author

Yes, I already have a few more changes to llama_sampling_context ready, and will very likely move last_tokens and candidates inside it. Will probably open PRs later today.

@ggerganov
Owner Author

Superseded by #3624

@ggerganov ggerganov closed this Oct 16, 2023