Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StereoSet benchmark for GPT2 #8

Open
Lj1ang opened this issue Aug 24, 2024 · 2 comments
Open

StereoSet benchmark for GPT2 #8

Lj1ang opened this issue Aug 24, 2024 · 2 comments

Comments

@Lj1ang
Copy link

Lj1ang commented Aug 24, 2024

I noticed that GPT2Tokenizer is used when evaluating GPT2, which doesn't have a mask_token. Will this impact the evaluation result?
I think I should add a new one manually but I'm unsure which one I should add.

@jacqueline-he
Copy link
Member

Hi, thanks for your interest in our work! :)

That’s a great question! GPT-2 is trained auto-regressively and therefore cannot be evaluated in the same manner as a masked language model. Instead of evaluating as a fill-in-the-blank problem, it's recommended that you compute the probability of the sentence when the blank is filled with a stereotypical term, and then with an anti-stereotypical term, and score based on whichever is more likely.

I would defer to Section 6.2 in the original StereoSet paper for more details.

@Lj1ang
Copy link
Author

Lj1ang commented Aug 27, 2024

Thanks for your reply!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants