feat: expose no-speech probability in segment #2654

Merged

sachaarbonel (Contributor) commented Dec 21, 2024

This PR adds support for exposing no-speech probability at the segment level in Whisper transcriptions, which helps identify potential hallucinations and non-speech segments in the output.

Key Changes

  • Added no_speech_prob field to whisper_segment struct
  • Exposed segment-level no-speech probability through new API: whisper_full_get_segment_no_speech_prob
  • Added configurable no_speech_thold parameter (default: 0.6) via CLI arguments
  • Included no_speech_prob in JSON output for each segment

Why This Matters

  • Helps identify potential hallucinations by detecting segments with high no-speech probability
  • Enables filtering of non-speech segments (background noise, silence, etc.)
  • Provides more granular control over transcription quality
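The filtering use case above can be sketched as a post-processing step over the transcribed segments. This is a minimal illustration, not code from the PR: the Segment struct and the filter_speech_segments helper are hypothetical names, and the sketch assumes each segment's probability has already been read (for example via whisper_full_get_segment_no_speech_prob) into a plain struct. The 0.6 default mirrors the no_speech_thold default mentioned in the description; whisper.cpp's own internal use of that threshold may differ from this simple comparison.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one transcribed segment. In a real program the
// fields would presumably be filled from the whisper.cpp API, e.g. the
// probability from whisper_full_get_segment_no_speech_prob (assumption:
// called once per segment index after transcription has finished).
struct Segment {
    std::string text;
    float       no_speech_prob;
};

// Keep only segments whose no-speech probability is at or below the
// threshold. Segments above it are treated as likely silence, noise, or
// hallucinated text and are dropped.
std::vector<Segment> filter_speech_segments(const std::vector<Segment> & segments,
                                            float no_speech_thold = 0.6f) {
    // A probability threshold outside [0, 1] is almost certainly a caller bug.
    assert(no_speech_thold >= 0.0f && no_speech_thold <= 1.0f);

    std::vector<Segment> kept;
    for (const auto & seg : segments) {
        if (seg.no_speech_prob <= no_speech_thold) {
            kept.push_back(seg);
        }
    }
    return kept;
}
```

A caller that wants stricter filtering can pass a lower threshold; the right value likely depends on the model and the audio, so it is worth checking against the no_speech_prob values in the JSON output before settling on one.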

@sachaarbonel sachaarbonel changed the title feat: implement no_speech_prob feat: add no_speech_prob to segment Dec 21, 2024
@sachaarbonel sachaarbonel changed the title feat: add no_speech_prob to segment feat: expose no-speech probability in segment Dec 21, 2024
@ggerganov ggerganov merged commit 4183517 into ggerganov:master Dec 21, 2024
42 checks passed
bygreencn added a commit to bygreencn/whisper.cpp that referenced this pull request Dec 26, 2024
* ggerganov/master: (49 commits)
  cli : add --suppress_nst support (ggerganov#2664)
  cli : add no_speech_thold (ggerganov#2663)
  cmake : remove hardcoded install rpath
  server : fix help print
  ruby : bug fix on callbacks and no_speech_prob (ggerganov#2656)
  server : add no-speech threshold parameter and functionality (ggerganov#2654)
  whisper : rename suppress_non_speech_tokens to suppress_nst (ggerganov#2653)
  server : add option to suppress non-speech tokens (ggerganov#2649)
  whisper : rename binaries + fix install (ggerganov#2648)
  ruby : update gem version to v1.3.1
  release : v1.7.3
  ci : msys enable SDL2 build (ggerganov#2635)
  ruby : sync ggml (ggerganov#2643)
  android : try to fix build
  files : remove old sources
  sync : ggml
  talk-llama : sync llama.cpp
  sync : ggml
  ggml : update ggml_backend_cpu_device_supports_op (llama/10867)
  vulkan: bugfixes for small subgroup size systems + llvmpipe test (llama/10809)
  ...