`RepetitionPenaltyLogitsProcessor` and `EncoderRepetitionPenaltyLogitsProcessor` contain incorrect and unclear docstrings #25970
Comments
@larekrow yes, I agree, there are some inconsistencies and some lack of documentation. Issues 1 and 2 get partially resolved with your PR (#25971). On 2, there is clearly an example missing, but I'm reviewing examples ATM. Regarding 3: it's a shame that
Thanks for the feedback @gante. I have attempted to clarify the reward cases in PR #26129 but as alluded to, I feel that the docstrings for Both the class name The docstring I wrote for this class became somewhat convoluted because of this complication. A class name that would be more accurate would be Any suggestions as to how we should move forward?
@larekrow I agree with your sentiment, the original implementation should be better (and I, as a reviewer, should have paid more attention to the implications) 🤗 Our north star here at As such, documenting what's going on (like you did) is the best compromise solution 🤗 Thank you for iterating with me 💛
No worries, we all appreciate the important work Hugging Face is doing (and there is a lot of work to be done). It's really cool how this huge project is driven by both the staff and the community. Happy to be a part of it 🤗 I've updated the docstrings according to your remarks in #26129. Please take a look whenever you can!
our amazing @gante is off for a few weeks, feel free to ping me once this is ready! 😉
@ArthurZucker yep this is ready! Please take a look when you can.
PR merged!
1. `RepetitionPenaltyLogitsProcessor`'s docstring says "tokens with higher scores are less likely to be selected". However, according to the paper, which states that "this penalized sampling works by discounting the scores of previously generated tokens", and the code, which lowers the score when penalizing tokens (e.g. by multiplying a negative score with a 1.2 penalty, 1.2 being a value the paper highlighted), the docstring should be corrected to say that "tokens with higher scores are more likely to be selected".

   transformers/src/transformers/generation/logits_process.py, Lines 314 to 317 in d8e13b3
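To make the direction of the update concrete, here is a minimal sketch of the penalty arithmetic described above (plain Python, not the actual transformers implementation; the function name is illustrative):

```python
# Minimal sketch of the repetition-penalty arithmetic (plain Python, not the
# actual transformers implementation; the function name is illustrative).
def apply_repetition_penalty(score: float, penalty: float) -> float:
    """Penalize a previously generated token's raw logit."""
    # With penalty > 1, a negative logit is multiplied (pushed further down)
    # and a positive logit is divided (shrunk toward zero) -- either way the
    # token becomes LESS likely.
    return score * penalty if score < 0 else score / penalty

print(apply_repetition_penalty(-2.0, 1.2))  # -2.4 (less likely)
print(apply_repetition_penalty(3.0, 1.2))   # 2.5 (less likely)
```

Either branch moves the penalized token's score down, which is why a token with a higher remaining score stays more likely to be selected.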
2. `EncoderRepetitionPenaltyLogitsProcessor` requires an additional `encoder_input_ids` arg whose docstring says "the encoder_input_ids that should not be repeated within the decoder ids". However, according to the class docstring, the "Add hallucination filter in generate()" PR #18354 (comment), and the code, which increases the score of tokens found within the original input ids (e.g. by multiplying a negative score with a 1 / 2 = 0.5 penalty, where `hallucination_penalty = 2` is a value the PR author used), these are the ids that *should* be repeated within the decoder ids.

   transformers/src/transformers/generation/logits_process.py, Lines 338 to 346 in d8e13b3
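A sketch of why encoder tokens end up rewarded rather than penalized: the class effectively applies the update with the reciprocal of the configured penalty (plain Python with illustrative names, not the actual transformers implementation):

```python
# Sketch of the "hallucination penalty" arithmetic: the update is applied
# with 1 / hallucination_penalty (plain Python, illustrative names only).
def apply_encoder_repetition_penalty(score: float, hallucination_penalty: float) -> float:
    inverted = 1 / hallucination_penalty  # e.g. 1 / 2 = 0.5
    # Multiplying a negative logit by 0.5 raises it; dividing a positive logit
    # by 0.5 doubles it -- encoder tokens become MORE likely either way.
    return score * inverted if score < 0 else score / inverted

print(apply_encoder_repetition_penalty(-2.0, 2.0))  # -1.0 (more likely)
print(apply_encoder_repetition_penalty(3.0, 2.0))   # 6.0 (more likely)
```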
3. Both `RepetitionPenaltyLogitsProcessor` and `EncoderRepetitionPenaltyLogitsProcessor` require a `penalty` input, which is enforced as a positive float. However, this input only works as expected when `penalty > 1`. If `0 < penalty < 1` is given, the "penalty" becomes a "reward". The docstring does not mention this in any way.

   `RepetitionPenaltyLogitsProcessor`:
   transformers/src/transformers/generation/logits_process.py, Lines 307 to 308 in d8e13b3

   `EncoderRepetitionPenaltyLogitsProcessor`:
   transformers/src/transformers/generation/logits_process.py, Lines 335 to 336 in d8e13b3
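The penalty-vs-reward flip follows directly from the update rule, since the same branches are applied regardless of whether `penalty` is above or below 1. A plain-Python sketch (illustrative name, not the library code):

```python
# Sketch of the penalty-vs-reward flip: the same update rule is applied for
# any positive penalty, so values in (0, 1) raise scores instead of lowering
# them (plain Python, illustrative name only).
def apply_penalty(score: float, penalty: float) -> float:
    return score * penalty if score < 0 else score / penalty

# penalty > 1 lowers scores (a true penalty):
print(apply_penalty(-2.0, 1.2))  # -2.4
# 0 < penalty < 1 raises scores (a "reward"):
print(apply_penalty(-2.0, 0.5))  # -1.0
print(apply_penalty(3.0, 0.5))   # 6.0
```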
@gante Before delving deeper into the source code and other resources, I was truly confused by the contradictory messages. I hope this will be rectified for other users.