diff --git a/README.md b/README.md
index b1c65c38..e8ad422d 100644
--- a/README.md
+++ b/README.md
@@ -178,10 +178,12 @@ Here is a non exhaustive list of open-source projects using faster-whisper. Feel
 
 * [whisper-ctranslate2](https://github.com/Softcatala/whisper-ctranslate2) is a command line client based on faster-whisper and compatible with the original client from openai/whisper.
 * [whisper-diarize](https://github.com/MahmoudAshraf97/whisper-diarization) is a speaker diarization tool that is based on faster-whisper and NVIDIA NeMo.
-* [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) contains the portable ready to run binaries of faster-whisper for Windows.
+* [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) Standalone CLI executables of faster-whisper for Windows, Linux & macOS.
 * [asr-sd-pipeline](https://github.com/hedrergudene/asr-sd-pipeline) provides a scalable, modular, end to end multi-speaker speech to text solution implemented using AzureML pipelines.
 * [Open-Lyrics](https://github.com/zh-plus/Open-Lyrics) is a Python library that transcribes voice files using faster-whisper, and translates/polishes the resulting text into `.lrc` files in the desired language using OpenAI-GPT.
 * [wscribe](https://github.com/geekodour/wscribe) is a flexible transcript generation tool supporting faster-whisper, it can export word level transcript and the exported transcript then can be edited with [wscribe-editor](https://github.com/geekodour/wscribe-editor)
+* [aTrain](https://github.com/BANDAS-Center/aTrain) is a graphical user interface implementation of faster-whisper developed at the BANDAS-Center at the University of Graz for transcription and diarization in Windows ([Windows Store App](https://apps.microsoft.com/detail/atrain/9N15Q44SZNS2)) and Linux.
+* [WhisperLive](https://github.com/collabora/WhisperLive) is a nearly-live implementation of OpenAI's Whisper which uses faster-whisper as the backend to transcribe audio in real-time.
 
 ## Model conversion
 
diff --git a/faster_whisper/transcribe.py b/faster_whisper/transcribe.py
index e0525b9e..7996321e 100644
--- a/faster_whisper/transcribe.py
+++ b/faster_whisper/transcribe.py
@@ -731,6 +731,13 @@ def generate_with_fallback(
             decode_result = max(
                 below_cr_threshold_results or all_results, key=lambda x: x[1]
             )
+            # to pass final temperature for prompt_reset_on_temperature
+            decode_result = (
+                decode_result[0],
+                decode_result[1],
+                temperature,
+                decode_result[3],
+            )
 
         return decode_result
 
@@ -908,6 +915,13 @@ def find_alignment(
         words, word_tokens = tokenizer.split_to_word_tokens(
             text_tokens + [tokenizer.eot]
         )
+        if len(word_tokens) <= 1:
+            # return on eot only
+            # >>> np.pad([], (1, 0))
+            # array([0.])
+            # This results in crashes when we lookup jump_times with float, like
+            # IndexError: arrays used as indices must be of integer (or boolean) type
+            return []
         word_boundaries = np.pad(np.cumsum([len(t) for t in word_tokens[:-1]]), (1, 0))
         if len(word_boundaries) <= 1:
             return []
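
Note on the `generate_with_fallback` hunk: the following is a minimal standalone sketch of what the added lines appear to do, not the library's actual code. It assumes the result tuple is laid out as `(text, avg_logprob, temperature, compression_ratio)`, which the diff only partly confirms (index 1 is the sort key, index 2 is replaced by `temperature`). The helper names `pick_after_all_failed` and `should_reset_prompt`, and the comparison inside the latter, are hypothetical and exist only for illustration.

```python
from typing import List, Tuple

# Assumed tuple layout: (text, avg_logprob, temperature, compression_ratio).
DecodeResult = Tuple[str, float, float, float]


def pick_after_all_failed(
    all_results: List[DecodeResult],
    below_cr_threshold_results: List[DecodeResult],
    final_temperature: float,
) -> DecodeResult:
    """Pick the best-scoring attempt, but report the last temperature tried."""
    best = max(below_cr_threshold_results or all_results, key=lambda x: x[1])
    # Overwrite only the temperature slot; the other fields stay untouched.
    return (best[0], best[1], final_temperature, best[3])


def should_reset_prompt(temperature: float, prompt_reset_on_temperature: float) -> bool:
    """Assumed downstream check: drop the prompt when decoding ran too hot."""
    return temperature > prompt_reset_on_temperature


# Three failed attempts at increasing temperatures (made-up values).
attempts = [
    ("hello wrld", -0.4, 0.0, 3.1),
    ("hello world", -0.9, 0.6, 2.9),
    ("helo word", -1.3, 1.0, 3.3),
]
chosen = pick_after_all_failed(attempts, [], final_temperature=1.0)
```

In this sketch the attempt with the highest average log probability was decoded at temperature 0.0. Without the overwrite, a check like `should_reset_prompt` would see 0.0 and keep the prompt even though every temperature up to 1.0 had failed; with the overwrite it sees 1.0.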