Multimodal dataprep #575
Signed-off-by: Tiep Le <tiep.le@intel.com>
Codecov Report: All modified and coverable lines are covered by tests ✅
@lvliang-intel, here is another PR for multimodal data prep with Redis for Multimodal RAG. This PR includes the changes from PR #555 because #555 has not yet been reviewed and merged into the GenAIComps main branch.
Resolved review threads (outdated):
comps/dataprep/redis/multimodal_langchain/prepare_videodoc_redis.py (3 threads)
comps/embeddings/multimodal_embeddings/multimodal_langchain/local_mm_embedding.py (1 thread)
It would be better to maintain a single PR. Would you consider merging #555 into this one?
Thanks, @XuhuiRen, for your suggestion. I have merged PR #555 into this one. I cannot move the review suggestions from PR #555 here, but I have addressed all of the conversations from #555 in this PR.
@BaoHuiling @tileintel I did not see the retriever for multimodal; there is only a reranker PR. Is there a PR I missed?
I recommend adding the code for image embedding to keep the service comprehensive.
You are correct. We haven't submitted the retriever microservice yet. We are waiting for #555 and #575 to be merged before we submit the last PRs for retrieval and LVM.
Hi Xuhui. To clarify, we are contributing different use cases for MMRAG. I'm working on VideoRAGQnA, which covers PRs #495, #496, #538, and #539, and we are going to contribute another PR for dataprep.
But it seems that PR #538 has a name conflict with this PR?
Yes. #538 was developed from a previous commit of this PR. I suggest that we review and merge #575 first; @BaoHuiling and I will resolve the conflict in #538 right after. I have mentioned this in the related Issue-538.
Yes, there should be some conflicts. We will align on this tomorrow; please hold those PRs until we resolve the conflicts. Thanks!
@srinarayan-srikanthan @s-gobriel please take a look at this PR and check the conflicts; we need to update the code.
I'm okay with this. Please do it.
@XuhuiRen Given that @XuhuiRen and @lvliang-intel have already approved this PR, would you please help merge it? This will help us resolve conflicts in another related PR more quickly. Thanks.
* multimodal embedding for MM RAG for videos
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* develop data prep first commit
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* develop dataprep microservice for multimodal data
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* multimodal langchain for dataprep
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update README
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update README
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update README
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update README
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* cosmetic
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* test for multimodal dataprep
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update test
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update test
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update test
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* cosmetic update
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* remove langsmith
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update API to remove /dataprep from API names and remove langsmith
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update test
  Signed-off-by: Tiep Le <tiep.le@intel.com>
* update the error message per PR reviewer
  Signed-off-by: Tiep Le <tiep.le@intel.com>
---------
Signed-off-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>
Description
This PR introduces a multimodal dataprep microservice, which is required for the Multimodal RAG on Videos application. It allows users to upload mp4 videos and, optionally, their associated transcripts, and ingests them into a Redis vector store.
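As a rough illustration of the upload contract described above, the sketch below pairs each uploaded mp4 with an optional transcript before ingestion. The helper name `pair_uploads`, the `.vtt` transcript extension, and the rule of matching files by stem are assumptions made for illustration; they are not taken from this PR's code.

```python
from pathlib import PurePath
from typing import Dict, List, Optional


def pair_uploads(filenames: List[str]) -> Dict[str, Optional[str]]:
    """Map each .mp4 file to a transcript with the same stem, or None.

    Hypothetical helper: the .vtt extension and the stem-matching rule
    are assumptions, not the microservice's documented behaviour.
    """
    videos = [f for f in filenames if PurePath(f).suffix.lower() == ".mp4"]
    transcripts = {
        PurePath(f).stem: f
        for f in filenames
        if PurePath(f).suffix.lower() == ".vtt"
    }
    if not videos:
        raise ValueError("at least one .mp4 file is required")
    # Transcripts are optional, so a video without one maps to None.
    return {video: transcripts.get(PurePath(video).stem) for video in videos}
```

A video that arrives without a transcript is still accepted here, which mirrors the "optional" wording in the description.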
This microservice provides three APIs that let users upload and ingest videos for three use cases:
This microservice also provides an API for users to list all videos ingested under the current index name, and an API to delete all videos from local storage and from the Redis vector store under the current index name.
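The index-scoped semantics of the list and delete operations can be sketched with a small in-memory stand-in. The class `VideoRegistry` and its method names are hypothetical; the real service would additionally remove the files from local storage and drop the matching entries from Redis, which this sketch does not attempt.

```python
from typing import Dict, List, Set


class VideoRegistry:
    """In-memory stand-in for per-index video bookkeeping (hypothetical)."""

    def __init__(self) -> None:
        # index name -> set of ingested video file names
        self._videos: Dict[str, Set[str]] = {}

    def ingest(self, index_name: str, video: str) -> None:
        """Record a video as ingested under the given index name."""
        self._videos.setdefault(index_name, set()).add(video)

    def list_videos(self, index_name: str) -> List[str]:
        """Return the videos under one index only, sorted by name."""
        return sorted(self._videos.get(index_name, set()))

    def delete_all(self, index_name: str) -> int:
        """Drop every video under index_name; return how many were removed."""
        return len(self._videos.pop(index_name, set()))
```

Keying everything by index name means that deleting under one index leaves videos ingested under any other index untouched.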
Issues
RFC: https://github.com/opea-project/docs/pull/49/files
Issue: opea-project/GenAIExamples#358
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
docarray[full]
fastapi
langchain==0.1.12
langchain_benchmarks
langsmith
moviepy
opencv-python
openai-whisper
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
Pillow
prometheus-fastapi-instrumentator
pydantic==2.8.2
python-multipart
redis
transformers
shortuuid
uvicorn
webvtt-py
llava-hf/llava-1.5-7b-hf
Tests
We have provided one test for this microservice.