add hotwords feature #2070
base: main
Conversation
@jongwook Hello, please check out this PR.
Would this be a duplicated effort, since there is already a parameter that serves the same purpose?
@James-Shared-Studios This isn't used to add context; it's used to add hotwords so that Whisper can recognize a new word or term when it comes up. For example, comfyUI is a new word (it is the most powerful and modular stable diffusion GUI and backend); if we don't add it as a hotword, it won't be recognized correctly.
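For reference, a minimal usage sketch, assuming the hotwords argument added by this PR is forwarded from transcribe into the decoding options (the model size, audio path, and hotword string are only illustrative):

```python
import whisper

model = whisper.load_model("base")

# "hotwords" is the option introduced by this PR (it does not exist in upstream Whisper);
# the audio path and the hotword string below are illustrative placeholders.
result = model.transcribe("comfyui_tutorial.mp3", hotwords="comfyUI")
print(result["text"])
```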
I have tried it with a video where the following words were misspelled.
And indeed, with the following args it worked to make these words no longer misspelled.
But it didn't work 100% of the time; sometimes they were still misspelled.
So, by inputting a series of proper nouns through the hotwords method, what is the maximum length that can actually be supported? @jax-explorer |
@JiweiZh It depends on the n_text_ctx value in the model's dims. |
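For reference, n_text_ctx can be read from a loaded model's dimensions; a quick check (the model size here is just an example):

```python
import whisper

model = whisper.load_model("base")

# n_text_ctx is the text decoder's context length; hotword tokens must fit
# within the portion of this budget reserved for the prompt.
print(model.dims.n_text_ctx)  # 448 for the released Whisper checkpoints
```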
@jax-explorer Hello, I find this commit very useful and I hope it gets merged soon. Currently, I'm using your forked repository to use this feature. BTW, I have some questions about your implementation.
```python
if (hotwords := self.options.hotwords) is not None:
    hotwords_tokens = self.tokenizer.encode(" " + hotwords.strip())
    hotwords_tokens = hotwords_tokens[: self.n_ctx]  # Use more hotwords
    tokens = (
        [self.tokenizer.sot_prev]
        + hotwords_tokens
        # + (prompt_tokens[-(self.n_ctx // 2 - 1) :] if self.options.prompt is not None else [])
        + tokens
    )
```

Thanks!
Hello!
During the transcription process, I often encounter specialized or newly coined vocabulary, and Whisper does not handle it well. I searched for solutions, and the community offered two options:
1. Fine-tuning the model: this approach is costly, and it's not practical to fine-tune the model every time a new term emerges.
2. Using initial_prompt: however, initial_prompt only applies to the first window. If specialized terms don't appear at the beginning, this method is ineffective.
Upon reviewing other transcription models, I found that using hotwords is common practice, so I implemented this feature. My approach is to add a hotword-related prompt before each transcription window. Since there is a maximum length limit, the hotwords occupy the space previously used by the prefix, and they take effect when the prefix isn't set. In my testing, this indeed resolved the issue of specialized vocabulary in my scenario.
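As an illustration of the approach described above (a hedged sketch, not the exact PR diff), the initial tokens for a window could be assembled roughly like this, reusing the attribute names from the snippet quoted earlier in this thread; the exact gating condition and length budget in the PR may differ:

```python
def build_initial_tokens(tokenizer, options, sot_sequence, n_ctx):
    """Sketch: prepend hotword tokens in the slot normally used by the previous prompt."""
    tokens = list(sot_sequence)
    if options.hotwords is not None and options.prefix is None:
        # encode the hotwords with a leading space, as in the quoted snippet
        hotwords_tokens = tokenizer.encode(" " + options.hotwords.strip())
        # keep the hotword prompt within the context budget reserved for the prompt
        hotwords_tokens = hotwords_tokens[: n_ctx // 2 - 1]
        tokens = [tokenizer.sot_prev] + hotwords_tokens + tokens
    return tokens
```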
The following are the community discussions on this issue:
#1477
https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311
https://stackoverflow.com/questions/73833916/how-can-i-give-some-hint-phrases-to-openais-whisper-asr