
llms: Add support for using the whisper model to transcribe audio #696

Open: wants to merge 12 commits into base: main

devalexandre (Contributor)

PR Checklist

  • Read the Contributing documentation.
  • Read the Code of conduct documentation.
  • Name your Pull Request title clearly, concisely, and prefixed with the name of the primarily affected package you changed according to Good commit messages (such as memory: add interfaces for X, Y or util: add whizzbang helpers).
  • Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. Fixes #123).
  • Describes the source of new concepts.
  • References existing implementations as appropriate.
  • Contains test coverage for new functions.
  • Passes all golangci-lint checks.

llms/openai/openaillm.go (outdated)
devalexandre requested a review from tmc on March 20, 2024 20:49
llms/openai/openaillm.go (outdated)
devalexandre requested a review from tmc on March 20, 2024 22:09
```go
_, err := os.Stat(audioFilePath)
require.NoError(t, err)

rsp, err := llm.TranscribeAudio(context.Background(), audioFilePath)
```
Collaborator

Does it make sense to think of "transcribe audio" in the context of LLMs? AFAIU Whisper is a distinct model from LLMs like the GPT family.

Is this intended as a one-off method only for openai, or as some general audio transcription interface?

Contributor Author

For now it's only for OpenAI, but I think it would be worth including it in a more general interface, since other models can do transcription too.

Contributor

I haven't tried it yet, but this seems interesting as a locally running alternative: https://github.com/JigsawStack/insanely-fast-whisper-api

It would be cool if we could support something like that, so you can combine it with Ollama to build some local-only tools.

(So far I've been using https://github.com/Purfview/whisper-standalone-win locally, which is a single-binary wrapper around https://github.com/SYSTRAN/faster-whisper)

Contributor Author

We could implement it for other LLMs too. The problem is that so far I have only found the standalone version for Windows, but I'm looking for alternatives.

tmc (Owner) commented Mar 26, 2024

I agree with @eliben's intuition here, I'm not sure if audio transcription as a concept fits right into our llm namespace. I'm open to exposing this and generalizing over providers but I think it belongs in a different namespace.

devalexandre (Contributor Author) commented Mar 27, 2024

@tmc, @eliben

What would be the preferred way to implement this functionality? Maybe expose it as openai.TranscribeAudio, keeping it only within the openai package and out of the llms namespace?

Alternatively, I think it could be implemented as a document loader, the way LangChain.js does it:
https://js.langchain.com/docs/integrations/document_loaders/file_loaders/openai_whisper_audio

devalexandre reopened this on Mar 27, 2024
devalexandre requested a review from eliben on March 27, 2024 13:41
devalexandre requested a review from corani on March 30, 2024 02:30
documentloaders/whisper.go
documentloaders/whisper.go (outdated)
devalexandre requested a review from corani on April 1, 2024 11:23
devalexandre (Contributor Author)

@tmc, any updates?

Labels: None yet
Projects: None yet
4 participants