fix: suppress common whisper hallucinations during silence #3884
+41
−10
Summary
Expands the Whisper hallucination filter in `crates/whisper-local/src/model/actual.rs` to suppress more of the phrases Whisper is known to hallucinate when a channel (mic or speaker) is silent.
The previous filter only did exact-match checks against a handful of strings (`"you"`, `"thank you"`, `"you."`, `"thank you."`, `"♪"`). The new `is_hallucination` method:
- Normalizes the text (stripping punctuation) before matching, so variants such as `"Thank you!"` or `"Thank you,"` are now caught.
- Adds more exact-match phrases: `"the"`, `"thanks"`, `"bye"`, `"goodbye"`, `"bye bye"`, `"so"`, `"oh"`, `"uh"`, `"hmm"`, `"ah"`, `"music"`, and empty strings.
- Uses `starts_with` to catch common hallucinations inherited from YouTube training data, such as `"thank you for watching"`, `"thanks for listening"`, `"please subscribe"`, `"subtitles by"`, etc.
The hallucination list is informed by the sachaarbonel/whisper-hallucinations dataset of Whisper outputs on noise-only audio.
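For reviewers, a minimal Rust sketch of the matching logic described above. It is not the code in `actual.rs`: the `normalize` helper, the `EXACT` and `PREFIXES` constants, and the normalization rule (lowercase, replace every non-alphanumeric character with a space, collapse whitespace) are assumptions, and the phrase lists only contain examples named in this PR.

```rust
// Hypothetical sketch of the filter described in this PR, not the code in
// `actual.rs`. Phrase lists are limited to examples named in the description.

/// Segments discarded when they match exactly after normalization.
/// The empty string also covers "♪", which normalizes to "".
const EXACT: &[&str] = &[
    "", "you", "the", "thanks", "bye", "goodbye", "bye bye", "so", "oh", "uh", "hmm", "ah",
    "music",
];

/// Segments discarded when they start with one of these phrases.
/// The broad "thank you" prefix also covers "thank you" on its own and
/// "thank you for watching".
const PREFIXES: &[&str] = &[
    "thank you",
    "thanks for listening",
    "please subscribe",
    "subtitles by",
];

/// Lowercases, turns every non-alphanumeric character into a space, and
/// collapses whitespace: "Thank you!" -> "thank you", "♪" -> "".
fn normalize(text: &str) -> String {
    text.to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns true when a transcribed segment looks like a silence hallucination.
fn is_hallucination(text: &str) -> bool {
    let t = normalize(text);
    EXACT.contains(&t.as_str()) || PREFIXES.iter().any(|p| t.starts_with(*p))
}
```

With these lists, `"Thank you!"`, `"♪"`, `"Thanks for listening."` and `"So."` are filtered, while `"Let's get started."` is kept. Because `"thank you"` is used as a prefix here, `"Thank you, John, for joining us"` is filtered too, which is the trade-off raised in the review checklist.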
Review & Testing Checklist for Human
- [ ] `starts_with("thank you")`: this filters any segment beginning with "thank you", including legitimate speech such as "Thank you, John, for joining us". Verify that this is acceptable given that segments are typically short chunks, or consider tightening the match.
- [ ] Exact matches such as `"so"`, `"oh"`, and `"the"` apply to the entire segment text after punctuation is stripped. Confirm that real speech segments are unlikely to consist of only these words (this should hold given VAD chunking, but is worth verifying).
- [ ] The `is_hallucination` function has no test coverage. Consider adding tests for edge cases (e.g., `"Thank you, Sarah"` should NOT be filtered, while `"Thank you for watching"` should).
Notes
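One possible way to address the first and third checklist items together, sketched under assumptions: `"thank you"` moves to exact-match only, prefix matching keeps only the longer phrases listed in the summary, and unit tests pin down the edge cases the checklist mentions. The names (`normalize`, `EXACT`, `PREFIXES`) and the normalization rule are hypothetical, not taken from `actual.rs`.

```rust
// Hypothetical tightened variant, not the code in `actual.rs`:
// "thank you" is filtered only when it is the whole segment, and prefix
// matching is limited to longer phrases unlikely to open real conversation.

const EXACT: &[&str] = &[
    "", "you", "thank you", "the", "thanks", "bye", "goodbye", "bye bye", "so", "oh", "uh",
    "hmm", "ah", "music",
];

const PREFIXES: &[&str] = &[
    "thank you for watching",
    "thanks for listening",
    "please subscribe",
    "subtitles by",
];

/// Lowercases, turns every non-alphanumeric character into a space, and
/// collapses whitespace.
fn normalize(text: &str) -> String {
    text.to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

fn is_hallucination(text: &str) -> bool {
    let t = normalize(text);
    EXACT.contains(&t.as_str()) || PREFIXES.iter().any(|p| t.starts_with(*p))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn filters_silence_hallucinations() {
        assert!(is_hallucination("Thank you for watching"));
        assert!(is_hallucination("Thank you!"));
        assert!(is_hallucination("♪"));
    }

    #[test]
    fn keeps_real_speech() {
        assert!(!is_hallucination("Thank you, Sarah"));
        assert!(!is_hallucination("Thank you, John, for joining us"));
        // Short words are only filtered when they are the whole segment.
        assert!(!is_hallucination("So, where were we?"));
    }
}
```

The tests run with `cargo test`. The cost of this design is that only the listed long phrases are caught by prefix, so any other hallucination that merely starts with "thank you" is kept.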
Requested by: @ComputelessComputer
Link to Devin run