Update wer to standard definition #63
Reviewer's Guide by Sourcery
This PR updates the Word Error Rate (WER) implementation to align with the standard definition by introducing an asymmetric normalization option. The changes modify the core WER calculation logic and add comprehensive test cases for both symmetric and asymmetric calculations.
Codecov Report
All modified and coverable lines are covered by tests ✅
Hey @ChristianGeng - I've reviewed your changes - here's some feedback:
Overall Comments:
- The _wer_jiwer() function appears to be unused. Consider either removing it or adding tests that utilize it.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
It has been removed.
I reviewed the documentation part of the pull request. The actual implementation is better reviewed by @ureichel. I will assign him as well.
The current update sets symmetric=False as default, which will change the current behavior of audmetric.word_error_rate(). But as this is in line with the standard definition given on Wikipedia, it seems to make sense to me.
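To make the behavior change concrete, here is a minimal sketch for a single pair of word sequences. It is not the audmetric implementation: edit_distance and wer_sketch are hypothetical names, and it assumes that symmetric=True normalizes by the longer of the two sequences while symmetric=False normalizes by the length of the reference (truth), as in the Wikipedia definition.

```python
from typing import Sequence


def edit_distance(truth: Sequence[str], prediction: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences (unit costs)."""
    # previous[j] holds the distance between the truth prefix seen so far
    # and the first j words of the prediction.
    previous = list(range(len(prediction) + 1))
    for i, t_word in enumerate(truth, start=1):
        current = [i]
        for j, p_word in enumerate(prediction, start=1):
            substitution = previous[j - 1] + (t_word != p_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def wer_sketch(
    truth: Sequence[str],
    prediction: Sequence[str],
    symmetric: bool = False,
) -> float:
    """Hypothetical WER for one sequence pair; empty input is not handled."""
    distance = edit_distance(truth, prediction)
    if symmetric:
        # Assumed symmetric variant: normalize by the longer sequence.
        norm = max(len(truth), len(prediction))
    else:
        # Standard definition: normalize by the reference length.
        norm = len(truth)
    return distance / norm


truth = ["the", "cat", "sat"]
prediction = ["the", "cat", "sat", "on", "the", "mat"]
print(wer_sketch(truth, prediction))                  # 1.0
print(wer_sketch(truth, prediction, symmetric=True))  # 0.5
```

Under these assumptions the edit distance is 3 in both calls and only the normalizer differs, which is why results for inputs of unequal length would change with the new default.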
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
Thanks a lot for these updates! I'd have 2 comments:
Meaning of "mean edit distance" in docstring
In the docstring it says:
To my understanding I would not speak of "mean edit distance" for single sequence pairs, but rather of "normalized edit distance". This normalized edit distance can be seen as the mean editing cost per sequence item. Thus the mean is calculated for edit costs rather than for the edit distance, which is a single value only.
"symmetric" argument
Further it says:
Here I have to apologize: if I remember correctly, the choice of this name is based on my suggestion, which on second thought was not well chosen. Actually, all non zero-substitution edit costs being equal, the WER is always symmetric, regardless of the normalization. Thus "symmetric" as an argument name is a bit misleading. I don't yet have a good alternative name, maybe something like
or similar.
Looks good to me.
Cool. Possibly even better |
Yes, |
Should this become a follow-up issue that deals with this clarification? There is one final question: the changes are API-breaking (the default |
The normal way would be to start with |
Yes, please open an issue on this.
I added two comments suggesting to remove (default) from the docstring, as it is better to have a single source of truth (the function signature) to indicate default values.
In addition, I made a few suggestions to use semantic line breaks.
Besides those, everything looks good to me now.
The implementation looks good! Tested it against jiwer.wer and got the same results for norm=truth. All fine here!
The only thing I'd still change is to skip or explain (N=S+D+C) in the word_error_rate docstring. But I don't insist.
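For readers unfamiliar with the shorthand, the standard definition as given on Wikipedia can be written out as follows, where S, D and I are the numbers of substitutions, deletions and insertions, C is the number of correctly recognized words, and N is the number of words in the reference:

```latex
\mathrm{WER} = \frac{S + D + I}{N} = \frac{S + D + I}{S + D + C}
```

Every reference word is either substituted, deleted or correct, which is why N = S + D + C; insertions add to the error count but not to the reference length.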
Btw, this fellow ureichel, whose requested review we are awaiting, could be removed from the list of reviewers. I cannot do it, it seems.
I have removed him. This guy is entirely unresponsive anyway.
@ChristianGeng I know you wanted to add error handling, but I would propose to first merge this pull request and address error handling in another pull request.
Agreed. I didn't receive a notification that this is approved. So I will merge first.
closes #62
It is probably debatable whether the test organization is ok: I have split the tests into two separate ones depending on the value of symmetric. One could probably have a single test that incorporates the value of symmetric into the parametrization. However, I find that long parametrizations are also hard to read, so I think this is justified.
Summary by Sourcery
Update the word_error_rate function to support symmetric normalization and add corresponding test cases. Introduce a temporary wrapper for WER calculation using the jiwer library.