Generate: add DeepMind's Speculative sampling in assisted_generation #27270

Conversation

@domgri commented Nov 3, 2023

What does this PR do?

Implements #27186. Still a draft, work in progress.

The implementation is inspired by the original paper and these [1][2] existing implementations.
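For context, a minimal sketch of the core accept/reject rule from the paper; the function name and tensor shapes here are illustrative, not this PR's actual code, and `p`/`q` stand for the target-model and draft-model probabilities over the candidate tokens:

```python
import torch

def speculative_accept(candidate_tokens, p, q):
    """Accept/reject draft tokens per the speculative sampling rule.

    candidate_tokens: (T,) tokens proposed by the draft/assistant model
    p: (T + 1, vocab) target-model probabilities at the candidate positions
       (one extra row for the position after the last draft token)
    q: (T, vocab) draft-model probabilities at the same positions
    Returns the accepted tokens plus one token sampled from the target model.
    """
    out = []
    for t, tok in enumerate(candidate_tokens.tolist()):
        # Accept token t with probability min(1, p(tok) / q(tok)).
        if torch.rand(()) < torch.clamp(p[t, tok] / q[t, tok], max=1.0):
            out.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q), renormalized;
            # this keeps the overall output distribution equal to the target model's.
            residual = torch.clamp(p[t] - q[t], min=0.0)
            out.append(torch.multinomial(residual / residual.sum(), 1).item())
            return out
    # Every draft token was accepted: take one extra token from the target model
    # at the position after the last accepted one.
    out.append(torch.multinomial(p[len(out)], 1).item())
    return out
```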

Next steps:

  • Resolve the TODOs raised in the code for speculative decoding
  • Verify the implementation
  • Then address the possible changes listed below and fix them

Possible changes from this PR:

  • modifies the implementation of assisted_generation with do_sample=True
  • may affect this blog post
  • may affect the assisted_generation documentation


@amyeroberts (Collaborator)
cc @gante

@gante (Member) commented Nov 7, 2023

Hey @domgri 👋

Thank you for opening the PR! Let me know when you'd like a review 💪

@domgri (Author) commented Nov 12, 2023

Sure, absolutely. Sorry for not responding sooner; some unexpected workload came up. I hope to come back and finish the implementation in a week 🤞

@domgri (Author) commented Nov 16, 2023

Hey, so I gave it a couple of tries to finish the implementation, although with little to no success 😕.

A couple of takeaways that might be useful for anyone continuing or trying to work on this implementation:

  • The initial PR might be useful for the overall vision of how this feature could be implemented (the TODOs mark potential places for modifications).
  • The sampling cases (1, 2) could be improved with something more sophisticated (and possibly already existing functionality).
  • From the second iteration on, the main model's model_inputs.input_ids would not match up with candidate_input_ids (it would usually be shorter, containing only the last several tokens of candidate_input_ids). I suspect the cache and/or **candidate_kwargs had an effect on that, though I could not figure out exactly how and what; see the first sketch after this list.
    model_inputs = self.prepare_inputs_for_generation(candidate_input_ids, **candidate_kwargs)
  • tmp_result = tmp_max / tmp_max_sum was returning an array of NaNs instead of 0s, possibly due to a faulty max_fn implementation; see the second sketch after this list.
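For reference, the truncation described in the third item is expected behavior once a cache is in play: in transformers, prepare_inputs_for_generation typically slices input_ids down to the tokens not yet covered by past_key_values. A simplified sketch of that common pattern (illustrative, not copied from any particular model; the cache shape assumed here is the usual `(batch, heads, seq, head_dim)`):

```python
def prepare_inputs_for_generation(input_ids, past_key_values=None, **kwargs):
    # With a cache, only tokens the model has not seen yet need to be fed in;
    # earlier positions are already represented inside past_key_values.
    if past_key_values is not None:
        past_length = past_key_values[0][0].shape[2]  # cached sequence length
        input_ids = input_ids[:, past_length:]
    return {"input_ids": input_ids, "past_key_values": past_key_values, **kwargs}
```

So candidate_input_ids holding the full sequence while model_inputs.input_ids holds only the last few tokens is consistent with this slicing rather than a bug on its own.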
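On the NaN issue in the last item: the linked reference implementations define max_fn(x) as max(0, x) normalized by its sum, and that division yields NaN whenever the sum is zero (0/0, e.g. when the draft and target distributions coincide). A guarded version, as a sketch assuming the same tensor shapes as those implementations:

```python
import torch

def max_fn(x, eps=1e-9):
    """max(0, x), renormalized to a distribution along the last dim.

    Guards against the 0/0 case that yields NaNs when every entry of
    max(0, x) is zero; such rows stay all-zero instead of becoming NaN.
    """
    x_max = torch.clamp(x, min=0.0)
    x_max_sum = x_max.sum(dim=-1, keepdim=True)
    return torch.where(
        x_max_sum > 0,
        x_max / torch.clamp(x_max_sum, min=eps),
        torch.zeros_like(x_max),
    )
```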

I will close this PR since I am out of capacity to continue working on it right now. Feel free to use this PR as inspiration for the actual implementation. Thanks for the enthusiastic welcome @amyeroberts @gante; my apologies for not delivering much value, and I hope to see someone else step up and contribute more meaningfully 😊.

@domgri closed this Nov 16, 2023
@gante (Member) commented Nov 17, 2023

@domgri no worries! Thank you for giving it a shot 🤗
