Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update wer to standard definition #63

Merged
merged 32 commits into from
Dec 19, 2024

Conversation

ChristianGeng
Copy link
Member

@ChristianGeng ChristianGeng commented Dec 4, 2024

image

closes #62

Probably it is debatable whether the test organization is ok: I have split into two separate tests depending on the value of symmetric. Probably one could have have a single test that incorporates the value of symmetric into the parametrization. However I find that the long parameterizations are also hard to read. So I think this is justified.

Summary by Sourcery

Update the word_error_rate function to support symmetric normalization and add corresponding test cases. Introduce a temporary wrapper for WER calculation using the jiwer library.

New Features:

  • Introduce a new parameter 'symmetric' to the word_error_rate function, allowing for symmetric normalization by the maximum length of truth and prediction.

Enhancements:

  • Add a temporary wrapper function _wer_jiwer to implement Word Error Rate (WER) using the jiwer library.

Tests:

  • Add new test cases for the word_error_rate function to validate both symmetric and asymmetric normalization scenarios.

Copy link

sourcery-ai bot commented Dec 4, 2024

Reviewer's Guide by Sourcery

This PR updates the Word Error Rate (WER) implementation to align with the standard definition by introducing an asymmetric normalization option. The changes modify the core WER calculation logic and add comprehensive test cases for both symmetric and asymmetric calculations.

No diagrams generated as the changes look simple and do not need a visual representation.

File-Level Changes

Change Details Files
Updated Word Error Rate (WER) calculation to support both symmetric and asymmetric normalization
  • Added symmetric parameter with default value False for backward compatibility
  • Modified normalization factor to use reference length for asymmetric mode
  • Updated documentation with detailed mathematical explanations
  • Added example showing asymmetric normalization behavior
audmetric/core/api.py
Enhanced test coverage for Word Error Rate calculations
  • Split existing test into symmetric-specific test case
  • Added new test cases for asymmetric WER calculation
  • Added temporary jiwer WER implementation wrapper for reference
  • Included edge cases with empty sequences and None values
tests/test_api.py

Possibly linked issues


Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

codecov bot commented Dec 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.0%. Comparing base (49f0e4c) to head (5016444).
Report is 3 commits behind head on main.

Additional details and impacted files
Files with missing lines Coverage Δ
audmetric/core/api.py 100.0% <100.0%> (ø)

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ChristianGeng - I've reviewed your changes - here's some feedback:

Overall Comments:

  • The _wer_jiwer() function appears to be unused. Consider either removing it or adding tests that utilize it.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@ChristianGeng
Copy link
Member Author

Hey @ChristianGeng - I've reviewed your changes - here's some feedback:

Overall Comments:

  • The _wer_jiwer() function appears to be unused. Consider either removing it or adding tests that utilize it.

Here's what I looked at during the review
Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

I has been removed.

@ChristianGeng ChristianGeng requested a review from hagenw December 4, 2024 12:38
@ChristianGeng ChristianGeng self-assigned this Dec 4, 2024
Copy link
Member

@hagenw hagenw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the documentation part of the pull request. The actual implementation is better review by @ureichel. I will assign him as well.

The current update sets symmetric=False as default, which will change the current behavior of audmetric.word_error_rate(). But as this is in line with the default definition in Wikipedia, it seems to make sense to me.

audmetric/core/api.py Outdated Show resolved Hide resolved
audmetric/core/api.py Outdated Show resolved Hide resolved
audmetric/core/api.py Outdated Show resolved Hide resolved
@hagenw hagenw requested a review from ureichel December 4, 2024 13:20
ChristianGeng and others added 3 commits December 4, 2024 14:41
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
@reichelu
Copy link

reichelu commented Dec 4, 2024

hi @hagenw @ChristianGeng

thanks a lot for these updates! I'd have 2 comments:

meaning of "mean edit distance" in docstring

In the docstring it says:

    The mean edit distance of each (truth, prediction)-pair is computed
    as an average of the edit distance over a normalization factor n.

To my understanding I would not speak of "mean edit distance" for single sequence pairs, but rather of "normalized edit distance". This normalized edit distance can be seen as the mean editing cost per sequence item. Thus the mean is calculated for edit costs rather than for the edit distance, which is a single value only.

"symmetric" argument

Further it says:

    The value of n depends on the symmetric parameter.
    If ``symmetric`` is ``False`` (default):
    n is set to the reference (truth) length, following the Wikipedia formulation
    where n is the number of words in the reference (N=S+D+C). This means WER can
    be greater than 1 if the prediction sequence is longer than the reference:

Here I have to apologize: if I remember correctly the choice of this name is based on my suggestion, which on second thought was not well chosen. Actually, all non zero-substitution edit costs been equal, the WER is always symmetric, regardless of the normalization. Thus "symmetric" as an argument name is a bit misleading.

Don't yet have a good alternative naming, maybe something like

  • maxlennorm=False
  • norm="reference" | "longest"

or similar.

@hagenw
Copy link
Member

hagenw commented Dec 4, 2024

norm="reference" | "longest"

looks good to me.

@reichelu
Copy link

reichelu commented Dec 4, 2024

norm="reference" | "longest"

looks good to me.

cool. Evtl. even better norm = "truth" | "longest" (default "truth") since you are using "truth" as an argument name.

@hagenw
Copy link
Member

hagenw commented Dec 4, 2024

Yes, truth seems also better to me, as we use the name already.

@ChristianGeng
Copy link
Member Author

Good question, I thought word_error_rate() would use event_error_rate() under the hood, but this is not the case.

Looking at the previous implementation, the only difference seems to be that word_error_rate() checks which words are unique before, whereas event_error_rate() does not. But I couldn't find an example yet, where they differ.

Should this become a follow up issue that deals with this clarification.

There is one final question: the changes are api-breaking (the default truth now follows the wikipedia = jiwer conventions). How is that taken care of?

@hagenw
Copy link
Member

hagenw commented Dec 5, 2024

There is one final question: the changes are api-breaking (the default truth now follows the wikipedia = jiwer conventions). How is that taken care of?

The normal way would be to start with norm="longest" and add a warning that the default will change in a future version of audmetric. But as we expected word_error_rate() would follow the standard definition, I would say the current behavior is a bug, and I would vote to directly swtich to norm="truth" as you did here. In the changelog I would then write that this is fixed, but users could get the old behavior with using norm="longest".

@hagenw
Copy link
Member

hagenw commented Dec 5, 2024

Good question, I thought word_error_rate() would use event_error_rate() under the hood, but this is not the case.

Looking at the previous implementation, the only difference seems to be that word_error_rate() checks which words are unique before, whereas event_error_rate() does not. But I couldn't find an example yet, where they differ.

Should this become a follow up issue that deals with this clarification.

Yes, please open an issue on this.

Copy link
Member

@hagenw hagenw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added two comments suggesting to remove (default) from the docstring as it is better to have a single source of truth (the function signature), t indicate default values.

In addition, I made a few suggestions to use semantic line breaks.

Besides those, everything looks good to me now.

audmetric/core/api.py Outdated Show resolved Hide resolved
audmetric/core/api.py Outdated Show resolved Hide resolved
audmetric/core/api.py Outdated Show resolved Hide resolved
audmetric/core/api.py Outdated Show resolved Hide resolved
tests/test_api.py Outdated Show resolved Hide resolved
audmetric/core/api.py Outdated Show resolved Hide resolved
audmetric/core/api.py Outdated Show resolved Hide resolved
@reichelu
Copy link

reichelu commented Dec 9, 2024

I reviewed the documentation part of the pull request. The actual implementation is better review by @ureichel. I will assign him as well.

the implementation looks good! Tested it against jiwer.wer and got the same results for norm=truth. All fine!

Copy link

@reichelu reichelu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implementation looks good! Tested it against jiwer.wer and got the same results for norm=truth. All fine here!
Only thing I'd still change is to skip or explain (N=S+D+C) from the word_error_rate docstring. But I don't insist.

audmetric/core/api.py Outdated Show resolved Hide resolved
ChristianGeng and others added 4 commits December 9, 2024 13:57
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
@reichelu
Copy link

btw, this fellow ureichel of whom we are awaiting requested review, could be removed from the list of reviewers. I cannot do it it seems.

ChristianGeng and others added 3 commits December 11, 2024 09:51
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
@ChristianGeng ChristianGeng removed the request for review from ureichel December 11, 2024 09:09
@ChristianGeng
Copy link
Member Author

btw, this fellow ureichel of whom we are awaiting requested review, could be removed from the list of reviewers. I cannot do it it seems.

I have removed him. This guy is entirely unresponsive anyway.

audmetric/core/api.py Outdated Show resolved Hide resolved
audmetric/core/api.py Outdated Show resolved Hide resolved
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
audmetric/core/api.py Outdated Show resolved Hide resolved
ChristianGeng and others added 2 commits December 11, 2024 10:34
Co-authored-by: Hagen Wierstorf <hwierstorf@audeering.com>
@hagenw
Copy link
Member

hagenw commented Dec 17, 2024

@ChristianGeng I know you wanted to add error handling, but I would propose to first merge this pull request and address error handling in another pull request.

@ChristianGeng
Copy link
Member Author

@ChristianGeng I know you wanted to add error handling, but I would propose to first merge this pull request and address error handling in another pull request.

D'accord. I didn't receive a notification that this is approved. So I will merge first.

@ChristianGeng ChristianGeng merged commit 2b09acb into main Dec 19, 2024
10 checks passed
@ChristianGeng ChristianGeng deleted the update-wer-to-standard-definition branch December 19, 2024 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update error_word_rate() to standard definition
3 participants