
added the max_matching_ngram_size to GenerationConfig #29131

Merged
merged 7 commits into from
Mar 6, 2024

Conversation

mosheber (Contributor)

What does this PR do?

  • Added the max_matching_ngram_size parameter to GenerationConfig, for use by PromptLookupCandidateGenerator.
  • Passed max_matching_ngram_size to the __init__ of PromptLookupCandidateGenerator in _get_candidate_generator when it is specified.
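For context, prompt lookup decoding searches the prompt for an earlier occurrence of the sequence's trailing n-gram (up to max_matching_ngram_size tokens) and proposes the tokens that followed it as draft candidates. A minimal pure-Python sketch of that matching step (names and structure are illustrative, not the actual PromptLookupCandidateGenerator code):

```python
def find_candidate_tokens(input_ids, max_matching_ngram_size=2, num_output_tokens=10):
    """Sketch of prompt lookup decoding's matching step (illustrative names).

    Try the largest n-gram size first and fall back to smaller sizes: find an
    earlier occurrence of the sequence's trailing n-gram and propose the
    tokens that followed it as draft candidates.
    """
    seq_len = len(input_ids)
    for ngram_size in range(min(max_matching_ngram_size, seq_len - 1), 0, -1):
        ngram = input_ids[-ngram_size:]  # trailing n-gram to look up
        for start in range(seq_len - ngram_size):  # scan earlier positions only
            if input_ids[start:start + ngram_size] == ngram:
                follow = start + ngram_size  # tokens that followed the match
                candidates = input_ids[follow:follow + num_output_tokens]
                if candidates:
                    return candidates
    return []  # no match found: caller falls back to regular decoding
```

Raising max_matching_ngram_size makes the lookup try longer, more specific matches first, which is what this PR exposes as a configurable flag.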

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@gante , would appreciate it if you could give this PR a glance, and thank you in advance.

@gante (Member) commented Feb 26, 2024

Hi @mosheber 👋

I'd be happy to merge the PR, conditional on the answer to the following question being yes (preferably backed with data): have you found significant benefits of changing the flag you added?

On the original issue, the author's experiments showed little benefit from changing this option. As such, we don't want to add new flags unless they result in clear benefits :)

@danielkorat (Contributor) commented Feb 26, 2024

Hi @gante @mosheber 👋

The results below show a 3 ms latency speedup with a 7B target model when comparing the default max_matching_ngram_size=2 with max_matching_ngram_size=4.
The bottom graph shows n_matches vs. max_matching_ngram_size.
As the target model size increases further, the speedup will be greater.
Note that these results use an optimized routine for PLD subsequence matching (~70x faster on average; uses numba):

get_candidates:       0.3467 ms
get_candidates_opt:   0.0047 ms
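The benchmark above used a numba-compiled routine; as an illustration of the same idea (an assumption about the approach, not the code that produced these numbers), the token-by-token scan can be vectorized, here with NumPy's sliding_window_view:

```python
import numpy as np

def get_candidates_vectorized(input_ids, ngram_size=2, num_output_tokens=10):
    """Vectorized n-gram lookup sketch: compare all prompt windows against
    the trailing n-gram at once instead of scanning token by token."""
    ids = np.asarray(input_ids)
    if len(ids) <= ngram_size:
        return []
    windows = np.lib.stride_tricks.sliding_window_view(ids, ngram_size)
    # exclude the final window: it is the trailing n-gram itself
    matches = np.all(windows[:-1] == ids[-ngram_size:], axis=1)
    hits = np.flatnonzero(matches)
    if hits.size == 0:
        return []
    follow = hits[0] + ngram_size  # take the first match for simplicity
    return ids[follow:follow + num_output_tokens].tolist()
```

Replacing the Python-level inner loop with one array comparison is where speedups of this magnitude typically come from, whether via NumPy or numba.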

[Screenshots: latency comparison and n_matches vs. max_matching_ngram_size graphs]

@gante (Member) commented Feb 26, 2024

@danielkorat convinced by your numbers 👍

Let's add this PR!

@@ -698,9 +698,12 @@ def _get_candidate_generator(
         Returns the candidate generator to be used in `assisted_generation`
         """
         if generation_config.prompt_lookup_num_tokens is not None:
-            candidate_generator = PromptLookupCandidateGenerator(
+            candidate_generator_params = dict(
Member commented on the diff:

Instead of creating a dict here, let's:
a) pass keyword arguments to PromptLookupCandidateGenerator (as before the PR)
b) default max_matching_ngram_size in PromptLookupCandidateGenerator to None, and set it to the original default value in __init__ if it is None.

This pushes complexity away from generate and into PromptLookupCandidateGenerator :)
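The pattern being suggested can be sketched as follows (a simplified illustration, not the exact transformers code; the default constant name is made up here, though 2 is the default value mentioned above):

```python
class PromptLookupCandidateGenerator:
    """Sketch of the suggested pattern: the caller passes keyword arguments
    straight through, and the constructor resolves None to the default."""

    DEFAULT_MAX_MATCHING_NGRAM_SIZE = 2  # hypothetical name for the default

    def __init__(self, num_output_tokens=10, max_matching_ngram_size=None):
        self.num_output_tokens = num_output_tokens
        # complexity lives here, not in `generate`: None means "use default"
        self.max_matching_ngram_size = (
            max_matching_ngram_size
            if max_matching_ngram_size is not None
            else self.DEFAULT_MAX_MATCHING_NGRAM_SIZE
        )

# the call site in _get_candidate_generator stays a plain keyword call:
gen = PromptLookupCandidateGenerator(
    num_output_tokens=10,
    max_matching_ngram_size=None,  # i.e. not specified in GenerationConfig
)
```

The call site never needs to know the default value, so adding future flags to the generator does not complicate generate.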

mosheber (Contributor, Author) replied:

Thanks for the comment! I switched back to keyword arguments.

@gante (Member) commented Feb 26, 2024

@mosheber after applying the fix, please run make fixup from the transformers folder and commit the result. It will fix the CI issues you're seeing :)

@mosheber (Contributor, Author)

> @mosheber after applying the fix, please run make fixup from the transformers folder and commit the result. It will fix the CI issues you're seeing :)

I ran the make fixup as well.

@gante (Member) left a comment:
The changes look good to me, thank you for iterating and making transformers better 💛

@gante (Member) left a comment:

@mosheber

Actually, there is a tiny missing thing: max_matching_ngram_size should have an entry in the docstring of GenerationConfig

Our CI is still complaining about code formatting; make sure you have the latest version installed when running make fixup again :)

@mosheber (Contributor, Author)

> @mosheber
>
> Actually, there is a tiny missing thing: max_matching_ngram_size should have an entry in the docstring of GenerationConfig
>
> Our CI is still complaining about code formatting; make sure you have the latest version installed when running make fixup again :)

Great idea! I added the docstring to the GenerationConfig class.
Regarding the tests, I fixed the formatting issue; the remaining CI failures seem to be timing-related and unrelated to this PR.

@gante gante requested a review from ArthurZucker February 28, 2024 12:25
@gante (Member) commented Feb 28, 2024

> Regarding the tests, I fixed the formatting issue, it seems that the CI fails due to timing reasons, which are not related to this PR.

Yeah don't worry about it :)

@ArthurZucker (Collaborator) left a comment:

LGTM! just one nit

(Review comment on src/transformers/generation/configuration_utils.py: outdated, resolved)
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante gante merged commit 19fb1e2 into huggingface:main Mar 6, 2024
21 checks passed