Resolve Inference Selection Bug Affecting Transcription Quality #1377

Open · wants to merge 3 commits into main

Conversation

TheoBoyer
Contributor

Currently, when none of the inferences made at the different temperatures can satisfy all of the "non-fallback" conditions, the last one is returned (the one with the highest temperature, with default args).

This can lead to weird behaviors, for example:

# I want to transcribe this piece of audio
print(model.transcribe(audio)["text"]) # Works fine

# I want to transcribe this piece of audio, and I want it to be very good, so I raise the minimum log prob
print(model.transcribe(audio, logprob_threshold=-0.2)["text"]) # Result is bad because the inference that is returned is the last one, with the highest temperature

This PR doesn't change the behaviour when one of the inferences satisfies all of the conditions.
When that is not the case, the result that is returned is the one with the highest avg_logprob.
In other words, when the avg_logprob condition isn't satisfied and the result is re-computed with a greater temperature, the best option is returned.
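For illustration, here is a minimal sketch of the selection rule described above, not the exact diff of this PR: Result is a hypothetical stand-in exposing only the fields the checks use, the attempts at increasing temperatures are assumed to have been collected already, and the real transcribe loop has additional conditions.

from typing import List, NamedTuple


class Result(NamedTuple):
    # Hypothetical stand-in for whisper's DecodingResult with only the fields used here.
    text: str
    avg_logprob: float
    compression_ratio: float


def pick_result(
    results: List[Result],
    logprob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> Result:
    # Unchanged behaviour: return the first attempt that satisfies every condition.
    for result in results:
        if (
            result.avg_logprob >= logprob_threshold
            and result.compression_ratio <= compression_ratio_threshold
        ):
            return result
    # Behaviour proposed by the PR: if no attempt satisfies the conditions, return
    # the one with the highest avg_logprob instead of the last (highest-temperature) one.
    return max(results, key=lambda r: r.avg_logprob)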

@TheoBoyer changed the title from "Return best text" to "Resolve Inference Selection Bug Affecting Transcription Quality" on May 21, 2023
@hoonlight

hoonlight commented Jul 17, 2023

I've been testing with this PR, and the improvement is bigger than I thought.
In the majority of the samples I tested, I was able to clearly observe an improvement in transcription accuracy.

@jongwook, could you please review this PR?

@guillaumekln
Contributor

This change makes sense but I think it should take into account compression_ratio_threshold.

For example, only consider the results where the compression ratio is below compression_ratio_threshold and select the best log prob among them. If all compression ratios are above compression_ratio_threshold, then it can pick the best log prob from all results.
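One possible reading of this suggestion, as a hedged sketch (select_result is an illustrative helper, not code from this PR; the avg_logprob and compression_ratio attribute names follow whisper's decoding results):

def select_result(results, compression_ratio_threshold=2.4):
    # Prefer attempts whose compression ratio is under the threshold...
    acceptable = [r for r in results if r.compression_ratio <= compression_ratio_threshold]
    # ...and take the best average log probability among them; if none qualifies,
    # fall back to the best log probability over all attempts.
    candidates = acceptable if acceptable else results
    return max(candidates, key=lambda r: r.avg_logprob)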

@TheoBoyer
Contributor Author

This PR was merged into faster-whisper (SYSTRAN/faster-whisper#356) and seems to somewhat improve transcription quality. Is there anything I can do to help with the review process?

@pacaklu

pacaklu commented Jan 16, 2024

Hello guys,

I have also encountered this bug recently, so let me share my opinion on how to solve it.

So let's assume that none of the predictions meets the criteria for compression_ratio and logprob.

  1. I agree with @guillaumekln that, in the case where one or more of the predictions meets the criteria for compression_ratio, we should select the one with the best logprob.
  2. If none of the predictions meets the criteria for compression_ratio, it can be a bit tricky to just select the one with the best logprob, because for a small improvement in logprob you can get a huge increase in compression_ratio.

Let me show you a real-data example that I found with whisper small, version 2 (it would be easy to create lots of mock examples, but I did not want to do that):

temperature: 0.0
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.311024
compression: 5.588235294
---------------------------
temperature: 0.2
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.34232
compression: 5.588235294
---------------------------
temperature: 0.4
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.343599
compression: 5.588235294
---------------------------
temperature: 0.6
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.338365
compression: 5.588235294
---------------------------
temperature: 0.8
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.39118579
compression: 5.588235294
---------------------------
temperature: 1.0
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.399336
compression: 2.411764705

So in this case, the first result would be selected even though it has a very bad compression_ratio, whereas the better choice is clearly the last one, where the model's hallucination is at least less severe.

So I suggest calculating something called a tradeoff_factor, which takes into account both compression_ratio and logprob:
(logprob of the prediction / logprob threshold) * (compression_ratio of the prediction / compression_ratio threshold).
And afterwards select the prediction with the lowest value of this tradeoff_factor.

In the current case, the factors are:
[0.72419, 0.79707, 0.800047, 0.7878599, 0.9106927, 0.40129]

Possible implementation:

from typing import List, Optional

# DecodingResult is openai-whisper's decoding result dataclass
from whisper.decoding import DecodingResult


def _select_best_prediction(
    decoded_results: List[DecodingResult],
    logprob_threshold: Optional[float] = -1,
    compression_ratio_threshold: Optional[float] = 2.4,
) -> DecodingResult:
    """Select the best prediction from results decoded at various temperatures."""
    assert len(decoded_results) > 0
    predictions_meeting_compression = []
    for pred in decoded_results:
        if pred.compression_ratio <= compression_ratio_threshold:
            predictions_meeting_compression.append(pred)

    # Case 1: there exists at least one prediction with a compression ratio
    # below the threshold.
    # Then select the prediction with the best log prob among those.
    if len(predictions_meeting_compression) > 0:
        return max(predictions_meeting_compression, key=lambda x: x.avg_logprob)

    # Case 2: no prediction has a compression ratio below the threshold.
    # Then calculate a tradeoff_factor between log prob and compression ratio as
    # (logprob of the prediction / logprob threshold) *
    # (compression_ratio of the prediction / compression_ratio threshold)
    # and select the prediction with the lowest value of this factor.
    else:
        tradeoff_factors = []
        for pred in decoded_results:
            factor = (pred.avg_logprob / logprob_threshold) * (
                pred.compression_ratio / compression_ratio_threshold
            )
            tradeoff_factors.append(factor)
        best_index = tradeoff_factors.index(min(tradeoff_factors))
        return decoded_results[best_index]
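As a quick sanity check, feeding the numbers from the example above into this function does pick the temperature-1.0 result (Result here is a hypothetical stand-in object with only the two fields the selection uses, not whisper's actual DecodingResult):

from typing import NamedTuple


class Result(NamedTuple):
    avg_logprob: float
    compression_ratio: float


results = [
    Result(-0.311024, 5.588235294),    # temperature 0.0
    Result(-0.34232, 5.588235294),     # temperature 0.2
    Result(-0.343599, 5.588235294),    # temperature 0.4
    Result(-0.338365, 5.588235294),    # temperature 0.6
    Result(-0.39118579, 5.588235294),  # temperature 0.8
    Result(-0.399336, 2.411764705),    # temperature 1.0
]

# No result is under the compression threshold of 2.4, so the tradeoff factor
# decides; the temperature-1.0 result has the lowest factor and is returned.
print(_select_best_prediction(results))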
