adding lancedb to langchain vectorstores #291

Merged (7 commits) on Aug 28, 2024

Conversation

sharanshirodkar7
Contributor

Description

This PR introduces LanceDB as a new vector store for LangChain. LanceDB is an open-source embedded vector database for AI applications, distributed under the Apache-2.0 license. It persists datasets to disk, so a dataset can be reopened later and shared from Python.

Issues

This PR adds LanceDB support to the LangChain ecosystem so that users can use it as a vector store in their AI applications. No issue or RFC is associated with this PR (n/a).

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

This PR introduces a new third-party dependency: vectordb.

Tests

The following tests were performed to verify the changes:

  • Created a new index from texts using LanceDB and performed a similarity search.
  • Created a new index from a loader using LanceDB and performed a similarity search.
  • Opened an existing LanceDB dataset and performed a similarity search.

All tests confirmed the correct functionality of LanceDB within the LangChain vector stores ecosystem.
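
To make the three test scenarios above concrete, here is a minimal, self-contained sketch of the same workflow: build an index from texts, persist it, reopen it, and run a similarity search. It deliberately does not use LanceDB or LangChain. `ToyVectorStore`, `toy_embed`, and the JSON file layout are hypothetical stand-ins invented for illustration, and the word-hashing "embedding" is only a placeholder for a real embedding model.

```python
import json
import math
import tempfile
from pathlib import Path


def toy_embed(text, dim=32):
    """Hypothetical embedding: count words into buckets chosen by character sum."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Illustrative in-memory store; not the LanceDB or LangChain API."""

    def __init__(self, rows):
        self.rows = rows  # list of {"text": str, "vector": list[float]}

    @classmethod
    def from_texts(cls, texts):
        # Scenario 1: create a new index from raw texts.
        return cls([{"text": t, "vector": toy_embed(t)} for t in texts])

    def save(self, path):
        # Persist the dataset to disk so it can be reopened later.
        Path(path).write_text(json.dumps(self.rows))

    @classmethod
    def open(cls, path):
        # Scenario 3: open an existing dataset from disk.
        return cls(json.loads(Path(path).read_text()))

    def similarity_search(self, query, k=1):
        # Rank stored rows by cosine similarity to the query embedding.
        q = toy_embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r["vector"]), reverse=True)
        return [r["text"] for r in ranked[:k]]


texts = ["hello world", "bye bye", "hello nice world"]
store = ToyVectorStore.from_texts(texts)

with tempfile.TemporaryDirectory() as tmp:
    dataset_path = Path(tmp) / "toy_dataset.json"
    store.save(dataset_path)
    reopened = ToyVectorStore.open(dataset_path)
    hits = reopened.similarity_search("hello world", k=2)

print(hits)
```

The second scenario (creating an index from a loader) would differ only in where the texts come from: a document loader would supply them instead of a literal list. A real integration would replace the JSON file with a LanceDB table and `toy_embed` with an actual embedding model.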

sharanshirodkar7 and others added 2 commits July 10, 2024 12:01
Signed-off-by: sharanshirodkar7 <ssharanshirodkar7@gmail.com>
@sharanshirodkar7
Contributor Author

@kevinintel this is the new PR.

@chensuyue chensuyue requested a review from letonghan July 18, 2024 14:03
@letonghan
Collaborator

Hi @sharanshirodkar7,
Please add an __init__.py file to the folder comps/vectorstores/langchain/lancedb.
Could you also explain how to start a LanceDB database? Is there a Docker image for starting the service?

@lvliang-intel lvliang-intel merged commit 2360e5a into opea-project:main Aug 28, 2024
4 checks passed
a32543254 pushed a commit to a32543254/GenAIComps that referenced this pull request Sep 3, 2024
* adding lancedb to langchain vectorstores

Signed-off-by: sharanshirodkar7 <ssharanshirodkar7@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sharanshirodkar7 <ssharanshirodkar7@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>
sharanshirodkar7 added a commit to predictionguard/pg-GenAIComps that referenced this pull request Sep 3, 2024
* adding lancedb to langchain vectorstores

Signed-off-by: sharanshirodkar7 <ssharanshirodkar7@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sharanshirodkar7 <ssharanshirodkar7@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
lvliang-intel added a commit that referenced this pull request Sep 10, 2024
* add rerank with neural speed

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* add the code

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* add the code

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* fix mismatched response format w/wo streaming guardrails (#568)

* fix mismatched response format w/wo streaming  guardrails

* fix & debug

* fix & rm debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* Fix guardrails out handle logics for space linebreak and quote (#571)

* fix mismatched response format w/wo streaming  guardrails

* fix & debug

* fix & rm debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* debug

* debug

* debug

* fix pre-space and linebreak

* fix pre-space and linebreak

* fix single/double quote

* fix single/double quote

* remove debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* BUG FIX: LVM security fix (#572)

* add url validator

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add validation for video_url

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

---------

Signed-off-by: BaoHuiling <huiling.bao@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* Modify output messages. (#569)

* Reduced output.

Signed-off-by: zepan <ze.pan@intel.com>

* Output the location where the modified Dockerfile file is referenced.

Signed-off-by: zepan <ze.pan@intel.com>

* for test

Signed-off-by: zepan <ze.pan@intel.com>

* Restore test file.

Signed-off-by: zepan <ze.pan@intel.com>

---------

Signed-off-by: zepan <ze.pan@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* refine logging code. (#559)

* add ut and refine logging code.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update microservice port.

---------

Co-authored-by: root <root@idc708073.jf.intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* adding lancedb to langchain vectorstores (#291)

* adding lancedb to langchain vectorstores

Signed-off-by: sharanshirodkar7 <ssharanshirodkar7@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sharanshirodkar7 <ssharanshirodkar7@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* Refine Dataprep Milvus MS (#570)

Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* final version

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* update the readme

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* add the sign

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* fix error for pre ci

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* add the ut

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* update docker file

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* update CI test log achieve (#577)

Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* Multimodal dataprep (#575)

* multimodal embedding for MM RAG for videos

Signed-off-by: Tiep Le <tiep.le@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* develop data prep first commit

Signed-off-by: Tiep Le <tiep.le@intel.com>

* develop dataprep microservice for multimodal data

Signed-off-by: Tiep Le <tiep.le@intel.com>

* multimodal langchain for dataprep

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update README

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update README

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update README

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update README

Signed-off-by: Tiep Le <tiep.le@intel.com>

* cosmetic

Signed-off-by: Tiep Le <tiep.le@intel.com>

* test for multimodal dataprep

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update test

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update test

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update test

Signed-off-by: Tiep Le <tiep.le@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cosmetic update

Signed-off-by: Tiep Le <tiep.le@intel.com>

* remove langsmith

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update API to remove /dataprep from API names and remove langsmith

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update test

Signed-off-by: Tiep Le <tiep.le@intel.com>

* update the error message per PR reviewer

Signed-off-by: Tiep Le <tiep.le@intel.com>

---------

Signed-off-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* add: Pathway vector store and retriever as LangChain component (#342)

* nb

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* init changes

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* docker

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* example data

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* docs(readme): update, add commands

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: formatting, data sources

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* docs(readme): update instructions, add comments

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: rm unused parts

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: image name, compose env vars

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: rm unused part

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: logging name

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: env var

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: rename pw docker

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* docs(readme): update input sources

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* nb

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* init changes

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: formatting, data sources

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* docs(readme): update instructions, add comments

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: rm unused part

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* fix: rename pw docker

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat: mv vector store, naming, clarify instructions, improve ingestion components

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tests: add pw retriever test
fix: update docker to include libmagic

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* implement suggestions from review, entrypoint, reqs, comments, https_proxy.

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: update docker tags in test and readme

Signed-off-by: Berke <berkecanrizai1@gmail.com>

* tests: add separate pathway vectorstore test

Signed-off-by: Berke <berkecanrizai1@gmail.com>

---------

Signed-off-by: Berke <berkecanrizai1@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sihan Chen <39623753+Spycsh@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* Add local Rerank microservice for VideoRAGQnA (#496)

* initial commit

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* save

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* add readme, test script, fix bug

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* update video URL

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* use default

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update core dependency

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use p 5000

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* use 5037

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* update ctnr name

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* remove langsmith

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add rerank algo desc in readme

Signed-off-by: BaoHuiling <huiling.bao@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: BaoHuiling <huiling.bao@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* Add Scan Container. (#560)

Signed-off-by: zepan <ze.pan@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* fix SearchedMultimodalDoc in docarray (#583)

Signed-off-by: BaoHuiling <huiling.bao@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* update image build yaml (#529)

Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: zepan <ze.pan@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* add microservice for intent detection (#131)

* add microservice for intent detection

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update license copyright

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* add ut

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* refine

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update folder

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

---------

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* Make the scanning method optional. (#580)

Signed-off-by: zepan <ze.pan@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* add code owners (#586)

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* remove revision for tei (#584)

Signed-off-by: letonghan <letong.han@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* Bug fix (#591)

* Check if the document exists.

Signed-off-by: zepan <ze.pan@intel.com>

* Add flag output.

Signed-off-by: zepan <ze.pan@intel.com>

* Modify nginx readme.

Signed-off-by: zepan <ze.pan@intel.com>

* Modify document detection logic

Signed-off-by: zepan <ze.pan@intel.com>

---------

Signed-off-by: zepan <ze.pan@intel.com>
Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* fix ut issue

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* merge the main

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* align with new pipeline

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* align with newest pipeline

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* upload code

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* update the ut

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* add docker path

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* add the docker path

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

---------

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>
Signed-off-by: BaoHuiling <huiling.bao@intel.com>
Signed-off-by: zepan <ze.pan@intel.com>
Signed-off-by: sharanshirodkar7 <ssharanshirodkar7@gmail.com>
Signed-off-by: letonghan <letong.han@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: Tiep Le <tiep.le@intel.com>
Signed-off-by: Berke <berkecanrizai1@gmail.com>
Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sihan Chen <39623753+Spycsh@users.noreply.github.com>
Co-authored-by: Huiling Bao <huiling.bao@intel.com>
Co-authored-by: ZePan110 <ze.pan@intel.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
Co-authored-by: root <root@idc708073.jf.intel.com>
Co-authored-by: Sharan Shirodkar <91109427+sharanshirodkar7@users.noreply.github.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Letong Han <106566639+letonghan@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com>
Co-authored-by: Liangyx2 <yuxiang.liang@intel.com>
Co-authored-by: kevinintel <hanwen.chang@intel.com>
4 participants