llms: Add support for using the whisper model to transcribe audio #696
Conversation
llms/openai/multicontent_test.go (outdated)

_, err := os.Stat(audioFilePath)
require.NoError(t, err)

rsp, err := llm.TranscribeAudio(context.Background(), audioFilePath)
Does it make sense to think of "transcribe audio" in the context of LLMs? AFAIU Whisper is a distinct model from LLMs like the GPT family.
Is this intended as a one-off method only for openai, or as some general audio transcription interface?
Only for OpenAI for now, but I think it's worth including in the general context, since other models also do this; at the moment, though, it's only implemented for OpenAI.
I haven't tried it yet, but this seems interesting as a locally running alternative: https://github.com/JigsawStack/insanely-fast-whisper-api
It would be cool if we could support something like that, so you can combine it with Ollama to build local-only tools.
(So far I've been using https://github.com/Purfview/whisper-standalone-win locally, which is a single-binary wrapper around https://github.com/SYSTRAN/faster-whisper)
We could implement it for other LLM providers; the problem is that so far I have only found the standalone version for Windows, but I am looking into other alternatives.
I agree with @eliben's intuition here; I'm not sure audio transcription as a concept fits into our llm namespace. I'm open to exposing this and generalizing over providers, but I think it belongs in a different namespace.
What would be the implementation idea for this functionality? Maybe expose it as openai.TranscribeAudio, keeping it only within the openai package and out of the LLM namespace? In terms of how it would be used, it could work as a loader.
@tmc any update?
PR Checklist
- PR title follows the "package: concise description" convention, e.g. memory: add interfaces for X, Y or util: add whizzbang helpers
- References the issue it fixes, if applicable (Fixes #123)
- Passes golangci-lint checks