
[DOCS] Added docstring example for EpsilonLogitsWarper #24783 #25378

Conversation

sanjeevk-os (Contributor)

What does this PR do?

See #24783
Added docstring example for EpsilonLogitsWarper

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?

Who can review?

@gante

@gante (Member) left a comment

In addition to the comments in the lines below, see also this comment: let's try to build shorter code for the example :)

>>> set_seed(100)

>>> # We can see that the model generates `J. Trump` as the next token
>>> outputs = model.generate(inputs["input_ids"], max_new_tokens=4)
@gante (Member)

I'd set `do_sample=True` here, for a direct comparison with the example below :)

You may also want to illustrate a case where this line does not generate the most common case (i.e. does NOT generate `J. Trump Jr`), where applying `epsilon_cutoff` would result in generating the common case
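
Concretely, the quoted call would gain `do_sample=True` (a sketch that assumes the snippet's earlier setup of `model`, `inputs`, and `set_seed`):

>>> set_seed(100)
>>> outputs = model.generate(inputs["input_ids"], do_sample=True, max_new_tokens=4)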

sanjeevk-os (Contributor, Author)

Hi @gante: can you share some explanation of the second case you mentioned? I am trying to understand how a case where ordinary multinomial sampling does not generate the most common continuation, while applying `epsilon_cutoff` does, shows that `epsilon_cutoff` is better at sampling from a wider variety of tokens. Thanks.

@gante (Member), Aug 17, 2023

`epsilon_cutoff` is a variation of `top_p`/`top_k`, sharing a common point: it drops some low-probability candidates according to its algorithm. As such, we should be able to show an example where applying it to a sequence that has a low-probability token results in a modified sequence :)

But hold this thought, the example is a bit too long and we need to think about a more concise way to showcase this processor
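
For illustration, a minimal sketch of the filtering being described, applying the warper directly to a toy distribution (the tensor values here are made up for the example):

import torch
from transformers.generation.logits_process import EpsilonLogitsWarper

# Toy next-token distribution with one low-probability candidate
probs = torch.tensor([[0.8999, 0.1000, 0.0001]])
scores = probs.log()  # logits processors operate on (log-)scores

warper = EpsilonLogitsWarper(epsilon=3e-4)
filtered = warper(torch.tensor([[0]]), scores)  # input_ids are not used by this warper

# The 0.0001-probability candidate is masked to -inf and can no longer be
# sampled; the other two candidates pass through unchanged
print(filtered)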

(An outdated review comment on src/transformers/generation/logits_process.py was resolved.)
@sanjeevk-os force-pushed the feature/24783-add-example-epsilonlogitswarper branch from a6fb4e3 to 01efbb0 on August 14, 2023
@sanjeevk-os (Contributor, Author)

Hi @gante, addressed your comments.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@gante (Member) left a comment

The example has 40 lines at the moment, which is too long. Our docs should be concise, otherwise users won't bother reading them :)

Let's get a case where we can showcase the processor with 2 `generate` calls (one with and another without `epsilon_cutoff`). Note that `set_seed` needs to be called before each `generate` call, otherwise we can't show that `epsilon_cutoff` had an impact (i.e. otherwise the difference can be attributed to sampling, and not to `epsilon_cutoff`)
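
A sketch of that shape (the checkpoint, prompt, and seed values are illustrative stand-ins, not necessarily what the PR ended up with):

>>> from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> inputs = tokenizer("A sequence: 1, 2", return_tensors="pt")

>>> # Call 1: plain multinomial sampling
>>> set_seed(1)
>>> outputs = model.generate(inputs["input_ids"], do_sample=True, max_new_tokens=4)

>>> # Call 2: same seed, but candidates with probability < epsilon are dropped
>>> set_seed(1)
>>> outputs = model.generate(
...     inputs["input_ids"], do_sample=True, max_new_tokens=4, epsilon_cutoff=3e-4
... )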

@sanjeevk-os force-pushed the feature/24783-add-example-epsilonlogitswarper branch from 01efbb0 to 28400c9 on August 21, 2023
@sanjeevk-os (Contributor, Author)

> The example has 40 lines at the moment, which is too long. Our docs should be concise, otherwise users won't bother reading them :)
>
> Let's get a case where we can showcase the processor with 2 `generate` calls (one with and another without `epsilon_cutoff`). Note that `set_seed` needs to be called before each `generate` call, otherwise we can't show that `epsilon_cutoff` had an impact (i.e. otherwise the difference can be attributed to sampling, and not to `epsilon_cutoff`)

Hi @gante, made the suggested changes.

@gante (Member) left a comment

Thank you for iterating 👍

@gante requested a review from ArthurZucker on August 22, 2023 at 15:44
@ArthurZucker (Collaborator) left a comment

Left a very small nit! Thanks for working on this! 🤗

Comment on lines 543 to 545
>>> # The use of the `epsilon_cutoff` parameter (best performing values between 3e-4 and 9e-4 from the paper mentioned above) generates tokens
>>> # by sampling from a variety of tokens with probabilities greater than or equal to epsilon value. The disadvantage of this sampling is that
>>> # if there are many possible tokens to sample from, the epsilon value has to be very small for sampling to occur from all the possible tokens.
@ArthurZucker (Collaborator)

These lines are a bit too long; they should be less than 119 characters.

@sanjeevk-os (Contributor, Author)

@ArthurZucker: made the suggested change.
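
As a quick illustration of the caveat quoted above (lines 543 to 545): with a near-uniform distribution over many candidates, every per-token probability falls below a typical epsilon, so the cutoff alone would filter everything (the warper's `min_tokens_to_keep` floor, which defaults to 1, is what prevents an empty candidate set):

import torch

# Near-uniform distribution over a GPT-2-sized vocabulary: each token has
# probability ~2e-5, far below the paper's suggested 3e-4 to 9e-4 range
n_tokens = 50_257
probs = torch.full((n_tokens,), 1.0 / n_tokens)
epsilon = 3e-4
print((probs >= epsilon).sum().item())  # 0 -> epsilon must drop below ~2e-5 here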

@sanjeevk-os force-pushed the feature/24783-add-example-epsilonlogitswarper branch from 28400c9 to 386801b on August 23, 2023
@gante merged commit 6add3b3 into huggingface:main on August 23, 2023
@gante (Member) commented on August 23, 2023

@sanjeevk-os Thank you for your contribution 💛

@sanjeevk-os sanjeevk-os deleted the feature/24783-add-example-epsilonlogitswarper branch August 24, 2023 04:08
parambharat pushed a commit to parambharat/transformers that referenced this pull request on September 26, 2023:

[DOCS] Added docstring example for EpsilonLogitsWarper (huggingface#25378)

* [DOCS] Added docstring example for EpsilonLogitsWarper huggingface#24783

* minor code changes based on review comments

* set seed for both generate calls, reduced the example length

* fixed line length under 120 chars
blbadger pushed a commit to blbadger/transformers that referenced this pull request on November 8, 2023, with the same commit message.

EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request on November 18, 2023, with the same commit message.