no_speech_prob always returns 0.0. #1128

Open
giaoyyds opened this issue Nov 12, 2024 · 1 comment

@giaoyyds

giaoyyds commented Nov 12, 2024

I'm using large-v3. When I convert the audio to a NumPy array and pass it to the model for transcription, no_speech_prob is 0.0 every time, whereas large-v2 returns sensible values. I haven't been able to fix this. Here's my sample code:

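    import io
    import time

    import librosa
    import numpy as np
    import soundfile
    from faster_whisper import WhisperModel

    # Methods of my class; removewavhead() is my own helper and is not shown here.

    def transcribe_audio(self, audio_numpy):
        try:
            # Note: the model is reloaded on every call, i.e. once per audio chunk.
            model = WhisperModel("large-v3", device="cuda", compute_type="float16", local_files_only=False)

            result, info = model.transcribe(
                audio_numpy,
                initial_prompt="",
                language="en",
                task="transcribe",
                vad_filter=self.vad,
                vad_parameters={"threshold": 0.5}
            )

            # transcribe() returns a lazy generator; list() runs the actual decoding.
            # Each printed Segment includes its no_speech_prob field.
            all_segments = list(result)
            print(all_segments)

        except Exception as e:
            print(f"An error occurred during transcription: {e}")


    def send_audio_file(self, audio_file):

        print("do me.....")
        with open(audio_file, 'rb') as f:
            audio_data = f.read()
            # Strip the WAV header, leaving raw PCM bytes.
            audio_data = self.removewavhead(audio_data)

            # 32000 bytes = 8000 stereo 16-bit frames = 1 second at 8 kHz.
            for i in range(0, len(audio_data), 32000):
                chunk = audio_data[i:i + 32000]
                sf = soundfile.SoundFile(io.BytesIO(chunk), channels=2, endian="LITTLE", samplerate=8000, subtype="PCM_16", format="RAW")
                # librosa downmixes to mono and resamples 8 kHz -> 16 kHz float32.
                resampled_audio, _ = librosa.load(sf, sr=16000, dtype=np.float32)
                self.transcribe_audio(resampled_audio)
                time.sleep(0.1)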
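For reference, here is a dependency-light sketch of the conversion that send_audio_file performs with soundfile and librosa. It decodes one raw chunk into the mono float32 16 kHz array that WhisperModel.transcribe accepts, assuming little-endian 16-bit PCM, 2 channels, 8 kHz (matching the SoundFile arguments above). The function name pcm16_stereo_to_mono_16k is hypothetical, and it uses plain linear interpolation instead of librosa's resampler, so treat it only as a way to sanity-check the input array (shape, dtype, value range), not as a drop-in replacement.

```python
import numpy as np

# Assumed input format, taken from the SoundFile arguments in the sample:
# raw little-endian 16-bit PCM, 2 interleaved channels, 8000 Hz.
SRC_RATE = 8000
TARGET_RATE = 16000  # faster-whisper expects 16 kHz mono float32 arrays
CHANNELS = 2
BYTES_PER_SAMPLE = 2


def pcm16_stereo_to_mono_16k(chunk: bytes) -> np.ndarray:
    """Hypothetical helper: raw PCM chunk -> mono float32 at 16 kHz."""
    # Drop a trailing partial frame so the reshape below cannot fail.
    frame_bytes = CHANNELS * BYTES_PER_SAMPLE
    usable = len(chunk) - len(chunk) % frame_bytes
    samples = np.frombuffer(chunk[:usable], dtype="<i2")
    if samples.size == 0:
        return np.zeros(0, dtype=np.float32)

    # Interleaved L/R samples -> (frames, channels), average to mono,
    # then scale int16 values into [-1.0, 1.0).
    frames = samples.reshape(-1, CHANNELS).astype(np.float32)
    mono = frames.mean(axis=1) / 32768.0

    # Simple linear interpolation from 8 kHz to 16 kHz (a simplification,
    # not the band-limited resampling librosa performs).
    n_out = len(mono) * TARGET_RATE // SRC_RATE
    src_t = np.arange(len(mono)) / SRC_RATE
    dst_t = np.arange(n_out) / TARGET_RATE
    return np.interp(dst_t, src_t, mono).astype(np.float32)
```

With these parameters each 32000-byte chunk holds 32000 / (2 channels × 2 bytes) = 8000 frames, i.e. 1 second of audio, which becomes 16000 samples at 16 kHz. So every transcribe() call in the sample above sees roughly one second of audio.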
@MahmoudAshraf97
Collaborator

I couldn't reproduce the issue with the sample audio files included in this repo.
