
RepetitionPenaltyLogitsProcessor and EncoderRepetitionPenaltyLogitsProcessor contain incorrect and unclear docstrings #25970

Closed
larekrow opened this issue Sep 5, 2023 · 7 comments

@larekrow
Contributor

larekrow commented Sep 5, 2023

1. The class docstring of RepetitionPenaltyLogitsProcessor says that "tokens with higher scores are less likely to be selected". However, the paper states that "this penalized sampling works by discounting the scores of previously generated tokens", and the code lowers the score when penalizing a token (e.g. by multiplying a negative score by a penalty of 1.2, a value the paper highlights). The docstring should therefore be corrected to say that "tokens with higher scores are more likely to be selected".

score = torch.gather(scores, 1, input_ids)
# if score < 0 then repetition penalty has to be multiplied to reduce the previous token probability
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
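To make the direction of the update concrete, here is a toy run (illustrative tensors only, not from the library's docs; assumes a recent transformers version where the processor is called as processor(input_ids, scores)):

import torch
from transformers import RepetitionPenaltyLogitsProcessor

# Toy vocabulary of 4 tokens; token 0 was already generated.
input_ids = torch.tensor([[0]])
scores = torch.tensor([[-1.0, 2.0, 3.0, 4.0]])

processor = RepetitionPenaltyLogitsProcessor(penalty=1.2)
print(processor(input_ids, scores))
# tensor([[-1.2000,  2.0000,  3.0000,  4.0000]])
# Token 0's score drops from -1.0 to -1.2 (-1.0 * 1.2), making it *less*
# likely to be selected — consistent only with "higher scores are more likely".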

2. EncoderRepetitionPenaltyLogitsProcessor requires an additional encoder_input_ids arg, whose docstring says it is "the encoder_input_ids that should not be repeated within the decoder ids". However, according to the class docstring, Add hallucination filter in generate() #18354 (comment), and the code, which increases the score of tokens found within the original input ids (e.g. by multiplying a negative score by a 1 / 2 = 0.5 penalty, where hallucination_penalty = 2 is the value the PR author used), these are the ids that should be repeated within the decoder ids.

self.penalty = 1 / penalty
self.encoder_input_ids = encoder_input_ids
@add_start_docstrings(LOGITS_PROCESSOR_INPUTS_DOCSTRING)
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
score = torch.gather(scores, 1, self.encoder_input_ids)
# if score < 0 then repetition penalty has to be multiplied to reduce the previous token probability
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
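For contrast, a toy run of the encoder variant (again with made-up tensors): with hallucination_penalty = 2, every token from encoder_input_ids has its score pushed up, i.e. repeating it is encouraged.

import torch
from transformers import EncoderRepetitionPenaltyLogitsProcessor

encoder_input_ids = torch.tensor([[0]])  # token 0 appeared in the source text
input_ids = torch.tensor([[1]])          # decoder output so far (not used for the gather)
scores = torch.tensor([[-1.0, 2.0, 3.0, 4.0]])

processor = EncoderRepetitionPenaltyLogitsProcessor(penalty=2.0, encoder_input_ids=encoder_input_ids)
print(processor(input_ids, scores))
# tensor([[-0.5000,  2.0000,  3.0000,  4.0000]])
# Token 0's score rises from -1.0 to -0.5 (multiplied by 1 / 2 = 0.5),
# so repeating the encoder ids becomes *more* likely, not less.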

3. Both RepetitionPenaltyLogitsProcessor and EncoderRepetitionPenaltyLogitsProcessor require a penalty input, which is enforced to be a positive float. However, this input only works as expected when penalty > 1. If 0 < penalty < 1 is given, the "penalty" becomes a "reward". The docstring does not mention this in any way.

RepetitionPenaltyLogitsProcessor

if not isinstance(penalty, float) or not (penalty > 0):
raise ValueError(f"`penalty` has to be a strictly positive float, but is {penalty}")

EncoderRepetitionPenaltyLogitsProcessor

if not isinstance(penalty, float) or not (penalty > 0):
raise ValueError(f"`penalty` has to be a strictly positive float, but is {penalty}")
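A quick demonstration of the undocumented reward case (toy tensors; note that penalty must be a float to pass the check above):

import torch
from transformers import RepetitionPenaltyLogitsProcessor

input_ids = torch.tensor([[0, 1]])              # tokens 0 and 1 already generated
scores = torch.tensor([[-1.0, 2.0, 3.0, 4.0]])

processor = RepetitionPenaltyLogitsProcessor(penalty=0.5)  # 0 < penalty < 1
print(processor(input_ids, scores))
# tensor([[-0.5000,  4.0000,  3.0000,  4.0000]])
# Both repeated tokens are *boosted* (-1.0 -> -0.5 and 2.0 -> 4.0):
# with 0 < penalty < 1 the "penalty" acts as a reward.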

@gante Before delving deeper into the source code and other resources, I was truly confused by the contradictory messages. I hope this will be rectified for other users.

@gante
Member

gante commented Sep 12, 2023

@larekrow yes, I agree, there are some inconsistencies and some missing documentation.

Issues 1 and 2 are partially resolved by your PR (#25971). On 2, there is clearly an example missing, but I'm reviewing the examples ATM.

Regarding 3: it's a shame that penalty has the opposite meaning in the two processors, but changing it would be a breaking change. The best we can do is to clarify the docstring, including documenting the reward case! Would you like to open a PR to clarify this one? :)
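To illustrate the opposite meanings with made-up numbers (same penalty value, opposite directions; .clone() is used because some versions modify scores in place):

import torch
from transformers import (
    EncoderRepetitionPenaltyLogitsProcessor,
    RepetitionPenaltyLogitsProcessor,
)

ids = torch.tensor([[0]])
scores = torch.tensor([[2.0, 1.0]])

rep = RepetitionPenaltyLogitsProcessor(penalty=2.0)
print(rep(ids, scores.clone()))   # token 0: 2.0 / 2.0 = 1.0 -> suppressed

enc = EncoderRepetitionPenaltyLogitsProcessor(penalty=2.0, encoder_input_ids=ids)
print(enc(ids, scores.clone()))   # token 0: 2.0 / 0.5 = 4.0 -> boosted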

@larekrow
Contributor Author

Thanks for the feedback @gante. I have attempted to clarify the reward cases in PR #26129 but, as alluded to above, I feel that the docstrings for EncoderRepetitionPenaltyLogitsProcessor will require some adjustments.

Both the class name EncoderRepetitionPenaltyLogitsProcessor and the original description (which I have left untouched) are misleading, because the class does not actually penalize the encoder ids but only rewards them when an intended hallucination_penalty value of >1 is given. In fact, it does not penalize any ids in that case.

The docstring I wrote for this class became somewhat convoluted because of this complication. A more accurate class name would be EncoderRepetitionRewardLogitsProcessor, but renaming it would be a breaking change, as you pointed out.

Any suggestions as to how we should move forward?

@gante
Member

gante commented Sep 13, 2023

@larekrow I agree with your sentiment, the original implementation should have been better (and I, as a reviewer, should have paid more attention to the implications) 🤗 Our north star here at transformers is to preserve backward compatibility, even if the original design is sub-optimal. We may lose in terms of clarity, but production users are reassured that we don't make sudden changes!

As such, documenting what's going on (like you did) is the best compromise solution 🤗 Thank you for iterating with me 💛

@larekrow
Contributor Author

No worries, we all appreciate the important work Hugging Face is doing (and there is a lot of work to be done). It's really cool how this huge project is driven by both the staff and the community. Happy to be a part of it 🤗

I've updated the docstrings according to your remarks in #26129. Please take a look whenever you can!

@ArthurZucker
Collaborator

our amazing @gante is off for a few weeks, feel free to ping me once this is ready! 😉

@larekrow
Contributor Author

@ArthurZucker yep this is ready! Please take a look when you can.

@larekrow
Contributor Author

PR merged!
