Add AudioQuestionAnswering pipeline #33782
Comments
I think this is quite an interesting idea, and I'd support it as a pipeline (even though we don't have a matching Hub spec for it yet). cc @sanchit-gandhi who I think worked on diarization as well. Overall though, I'd be happy to accept and review the PR, unless anyone else has objections!
Hey @Rocketknight1, thanks for the willingness to help! I've implemented a working version and iterated on it a bunch, but I'm at a point where I think it would be best to get the opinions of maintainers. There are a few undecided things I would love some input on:
Hmm, I see! I didn't realize when you first proposed this that it combined two separate models that weren't trained together. That is unusual for pipelines - is there a reason to use a single pipeline for this task, instead of just calling a STT pipeline and then passing output to an
Feature request
A new AudioQuestionAnswering pipeline, modeled on the document question answering (DQA) pipeline. Where DQA takes a document, applies OCR, and does QA over the extracted text, this pipeline would take an audio file, apply speech-to-text (STT), and do QA over the transcript. A more advanced version would combine diarization with STT, since speaker annotations provide important context and should improve QA/understanding.
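A minimal sketch of the flow described above, not the proposed implementation. It assumes the existing `automatic-speech-recognition` and `question-answering` pipelines in `transformers`, whose outputs contain a `"text"` key and an `"answer"` key respectively. The function names, the `(start, end, speaker)` segment format, and the max-overlap speaker assignment are all hypothetical choices made for illustration; the diarization step itself is left to whatever tool produces the segments.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical segment format for this sketch: (start_s, end_s, speaker_label).
Segment = Tuple[float, float, str]


def annotate_speakers(chunks: List[Dict[str, Any]], segments: List[Segment]) -> str:
    """Prefix each timestamped ASR chunk with the speaker that overlaps it most."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        # Some ASR models leave the final end timestamp as None.
        end = float("inf") if end is None else end
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for seg_start, seg_end, speaker in segments:
            overlap = min(end, seg_end) - max(start, seg_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        lines.append(f"{best_speaker}: {chunk['text'].strip()}")
    return "\n".join(lines)


def audio_question_answering(
    audio: Any,
    question: str,
    transcribe: Callable[[Any], Dict[str, Any]],
    answer: Callable[..., Dict[str, Any]],
    segments: Optional[List[Segment]] = None,
) -> Dict[str, Any]:
    """Run STT on the audio, then extractive QA over the transcript."""
    asr_out = transcribe(audio)
    if segments and "chunks" in asr_out:
        # Advanced variant: speaker-annotated transcript as the QA context.
        context = annotate_speakers(asr_out["chunks"], segments)
    else:
        context = asr_out["text"]
    result = dict(answer(question=question, context=context))
    result["transcript"] = context  # keep the context so answers can be inspected
    return result


def run_with_transformers(audio_path: str, question: str) -> Dict[str, Any]:
    """Wire the sketch to the default transformers pipelines (downloads models)."""
    from transformers import pipeline  # imported lazily; heavy dependency

    asr = pipeline("automatic-speech-recognition")
    qa = pipeline("question-answering")
    return audio_question_answering(audio_path, question, asr, qa)
```

Passing the two stages in as callables keeps the STT and QA models independently swappable, which matters here because the two models are not trained together.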
Motivation
I have had to build this kind of pipeline on multiple occasions for processing audio, specifically phone call recordings. Like the other pipelines, which make applied ML workflows quick and easy to use, this one would provide the same convenience for a modality that is not currently covered.
Your contribution
I plan to contribute the entire pipeline. My inspiration, and what I plan to base much of the PR on, is #18414.
I'm mostly just posting this issue to get feedback from the HF team. Tagging @Narsil @NielsRogge as they also provided feedback on the DQA PR.