Fix Bug in Masked LM Transformations #396

Merged
merged 2 commits into from
Dec 22, 2020

@jinyongyoo (Collaborator) commented Dec 20, 2020

What does this PR do?

Summary

In WordSwapMaskedLM, WordInsertionMaskedLM, and WordMergeMaskedLM, we use torch.argsort to pick the tokens with the highest probability. However, torch.argsort sorts in ascending order by default, so we need to pass descending=True to get the highest-probability tokens first. This PR also fixes a bug where words were replaced with tokens that had not been sanitized (i.e., you may see "##" appear in the output).
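To illustrate the sort-order issue, here is a minimal sketch using made-up logits over a hypothetical 5-token vocabulary. It is not the TextAttack code itself; it only shows how the default ascending order of torch.argsort differs from descending=True when slicing off the first k indices.

```python
import torch

# Toy logits over a hypothetical 5-token vocabulary (values are made up).
logits = torch.tensor([0.1, 2.5, -1.0, 3.2, 0.7])
probs = torch.softmax(logits, dim=0)

top_k = 2

# Default behaviour: ascending order, so the first k indices are the
# LEAST likely tokens.
ascending_ids = torch.argsort(probs)[:top_k]

# With descending=True, the first k indices are the MOST likely tokens.
descending_ids = torch.argsort(probs, descending=True)[:top_k]

print(ascending_ids.tolist())
print(descending_ids.tolist())
```

In this toy case the ascending slice returns the two lowest-probability indices, while the descending slice returns the two highest, which is what a top-k candidate search needs.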

Changes

  • Pass descending=True to torch.argsort in WordSwapMaskedLM, WordInsertionMaskedLM, and WordMergeMaskedLM.
  • Sanitize top tokens before adding them to candidate set.
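
The sanitization step can be sketched roughly as follows. The helper names strip_wordpiece_prefix and sanitize_candidates are hypothetical and are not taken from this PR; the sketch assumes BERT-style WordPiece tokens, where "##" marks a continuation subword, and the actual change may handle subwords differently (for example, by filtering them out).

```python
def strip_wordpiece_prefix(token: str) -> str:
    """Remove a leading '##' continuation marker (BERT-style WordPiece).

    Hypothetical helper for illustration only.
    """
    return token[2:] if token.startswith("##") else token


def sanitize_candidates(tokens):
    """Strip '##' markers, drop empty strings, and de-duplicate in order."""
    seen = set()
    cleaned = []
    for tok in tokens:
        word = strip_wordpiece_prefix(tok)
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return cleaned


# Made-up top tokens as they might come out of a masked LM tokenizer.
top_tokens = ["##ing", "run", "##s", "run", "jog"]
candidates = sanitize_candidates(top_tokens)
print(candidates)
```

Sanitizing before adding tokens to the candidate set keeps raw subword markers out of the transformed text.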

@jinyongyoo changed the title from "change to descending argsort" to "Fix Bug in Masked LM Transformations" Dec 20, 2020
@jinyongyoo merged commit 3a27cb0 into master Dec 22, 2020
@qiyanjun deleted the fix-mlm-transformations branch July 28, 2021 22:33