
Use right range and threshold for showing "bad" words/sentences #370

Merged · 2 commits · Mar 3, 2022

Conversation

@abhi-agg (Contributor) commented Mar 3, 2022

Higher QE scores mean better quality.
Changed the threshold from -0.5 to ln(0.5) ≈ -0.6931, as per discussions in QE meetings.

@mfomicheva @abarbosa94 @felipesantosk Please let me know if any of the above is wrong 👍🏾

@abhi-agg abhi-agg requested a review from jelmervdl March 3, 2022 12:49
@jelmervdl (Member) commented Mar 3, 2022

If these are the correct thresholds, we can also lose the "these thresholds are just examples" comments.

@abhi-agg (Contributor, Author) commented Mar 3, 2022

I just want to confirm with @mfomicheva @abarbosa94 @felipesantosk once more that the threshold of -0.5 is a good one as a starting point for all the language pairs irrespective of whether the quality scores are returned using translation models or supervised QE models under the hood.

I can remove the comment after their confirmation. Thanks for pointing it out 👍🏾

@mfomicheva commented
> I just want to confirm with @mfomicheva @abarbosa94 @felipesantosk once more that the threshold of -0.5 is a good one as a starting point for all the language pairs irrespective of whether the quality scores are returned using translation models or supervised QE models under the hood.
>
> I can remove the comment after their confirmation. Thanks for pointing it out 👍🏾

I responded on slack

@abhi-agg (Contributor, Author) commented Mar 3, 2022

Just documenting what @mfomicheva shared:

> For the supervised models that were fitted on annotated data (En-Es, En-Cs and En-Et language pairs), you should use the threshold that corresponds to the log of 0.5, which is around -0.6931 (here log means ln).
>
> For the unsupervised case where the returned value is just the average log-prob coming directly from the MT model, I think you should still start with the same threshold and experiment further with it.
>
> The range [-0.6931, 0] means better quality.

I will modify the PR to reflect these changes, and update the PR description as well.
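The threshold rule above can be sketched as follows. This is a minimal illustration, not the PR's actual code; the constant and function names are hypothetical. It assumes QE scores are (average) log-probabilities, so they fall in (-∞, 0], and scores in [ln(0.5), 0] count as acceptable quality:

```javascript
// Quality scores are log-probabilities: higher (closer to 0) is better.
// ln(0.5) ≈ -0.6931 is the agreed starting threshold for all language
// pairs, for both supervised QE models and raw MT-model log-probs.
const BAD_QUALITY_THRESHOLD = Math.log(0.5); // ≈ -0.6931

// True when a word/sentence score should be highlighted as "bad"
// (hypothetical helper name, for illustration only).
function isBadQuality(score) {
  return score < BAD_QUALITY_THRESHOLD;
}

console.log(isBadQuality(-0.2)); // score in [-0.6931, 0]: not highlighted
console.log(isBadQuality(-1.5)); // score below threshold: highlighted
```

The key point is the direction of the comparison: earlier code treating lower scores as better would invert this check.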
