Merge pull request #1 from myscale/feature/vectorstore_myscale
Feature/vectorstore myscale
Showing 8 changed files with 884 additions and 9 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# MyScale | ||
|
||
This page covers how to use the MyScale vector database within LangChain. | ||
It is broken into two parts: installation and setup, followed by references to specific MyScale wrappers. | ||
|
||
With MyScale, you can manage both structured and unstructured (vectorized) data, and perform joint queries and analytics on both types of data using SQL. Plus, MyScale's cloud-native OLAP architecture, built on top of ClickHouse, enables lightning-fast data processing even on massive datasets. | ||
|
||
## Introduction | ||
|
||
[Overview of MyScale and high-performance vector search](https://docs.myscale.com/en/overview/) | ||
|
||
You can register on our SaaS and [start a cluster now!](https://docs.myscale.com/en/quickstart/) | ||
|
||
If you are interested in how we integrated SQL and vector search, please refer to [this document](https://docs.myscale.com/en/vector-reference/) for a syntax reference. | ||
|
||
We also host live demos on Hugging Face! Please check out our [Hugging Face space](https://huggingface.co/myscale), where the demos search millions of vectors in the blink of an eye. | ||
|
||
## Installation and Setup | ||
- Install the Python SDK with `pip install clickhouse-connect` | ||
|
||
### Setting up environments | ||
|
||
There are two ways to set up parameters for the MyScale index. | ||
|
||
1. Environment Variables | ||
|
||
Before you run the app, set the environment variables with `export`: | ||
`export MYSCALE_HOST='<your-endpoints-url>' MYSCALE_PORT=<your-endpoints-port> MYSCALE_USERNAME=<your-username> MYSCALE_PASSWORD=<your-password> ...` | ||
|
||
You can find your account, password, and other info on our SaaS. For details, please refer to [this document](https://docs.myscale.com/en/cluster-management/). | ||
Every attribute under `MyScaleSettings` can be set with the prefix `MYSCALE_`, and names are case-insensitive. | ||
|
||
2. Create a `MyScaleSettings` object with parameters | ||
|
||
|
||
```python | ||
from langchain.vectorstores import MyScale, MyScaleSettings | ||
config = MyScaleSettings(host="<your-backend-url>", port=8443, ...) | ||
index = MyScale(embedding_function, config) | ||
index.add_documents(...) | ||
``` | ||
|
||
## Wrappers | ||
Supported functions: | ||
- `add_texts` | ||
- `add_documents` | ||
- `from_texts` | ||
- `from_documents` | ||
- `similarity_search` | ||
- `asimilarity_search` | ||
- `similarity_search_by_vector` | ||
- `asimilarity_search_by_vector` | ||
- `similarity_search_with_relevance_scores` | ||
|
||
### VectorStore | ||
|
||
A wrapper around the MyScale database allows you to use it as a vectorstore, | ||
whether for semantic search or similar-example retrieval. | ||
|
||
To import this vectorstore: | ||
```python | ||
from langchain.vectorstores import MyScale | ||
``` | ||
|
||
For a more detailed walkthrough of the MyScale wrapper, see [this notebook](../modules/indexes/vectorstores/examples/myscale.ipynb) |
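The note under "Setting up environments" says that every `MyScaleSettings` attribute can be supplied through a `MYSCALE_`-prefixed, case-insensitive environment variable. The sketch below only illustrates that naming rule in plain Python; `settings_from_env` and `example_env` are hypothetical names introduced here and are not part of LangChain or MyScale, and the real settings class may resolve variables differently.

```python
# Hypothetical helper illustrating the MYSCALE_ prefix rule; not LangChain's implementation.
def settings_from_env(environ, prefix="MYSCALE_"):
    """Collect prefixed variables into a dict keyed by lowercased attribute name."""
    settings = {}
    for name, value in environ.items():
        # Compare in upper case so that `myscale_port` and `MYSCALE_PORT` both match.
        if name.upper().startswith(prefix):
            settings[name[len(prefix):].lower()] = value
    return settings


# A stand-in for os.environ, so the example does not depend on the real environment.
example_env = {
    "MYSCALE_PORT": "8443",
    "myscale_username": "<your-username>",
    "PATH": "/usr/bin",  # unrelated variables are ignored
}

kwargs = settings_from_env(example_env)
kwargs["port"] = int(kwargs["port"])  # environment values are strings; the port is numeric
print(kwargs)
```

In practice you would pass `os.environ` instead of `example_env`; the stand-in dictionary just keeps the sketch reproducible.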
docs/modules/indexes/vectorstores/examples/myscale.ipynb (267 changes: 267 additions & 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,267 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "683953b3", | ||
"metadata": {}, | ||
"source": [ | ||
"# MyScale\n", | ||
"\n", | ||
"This notebook shows how to use functionality related to the MyScale vector database." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "aac9563e", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.embeddings.openai import OpenAIEmbeddings\n", | ||
"from langchain.text_splitter import CharacterTextSplitter\n", | ||
"from langchain.vectorstores import MyScale\n", | ||
"from langchain.document_loaders import TextLoader" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "a9d16fa3", | ||
"metadata": {}, | ||
"source": [ | ||
"## Setting up environments\n", | ||
"\n", | ||
"There are two ways to set up parameters for the MyScale index.\n", | ||
"\n", | ||
"1. Environment Variables\n", | ||
"\n", | ||
" Before you run the app, set the environment variables with `export`:\n", | ||
" `export MYSCALE_HOST='<your-endpoints-url>' MYSCALE_PORT=<your-endpoints-port> MYSCALE_USERNAME=<your-username> MYSCALE_PASSWORD=<your-password> ...`\n", | ||
"\n", | ||
" You can find your account, password, and other info on our SaaS. For details, please refer to [this document](https://docs.myscale.com/en/cluster-management/).\n", | ||
"\n", | ||
" Every attribute under `MyScaleSettings` can be set with the prefix `MYSCALE_`, and names are case-insensitive.\n", | ||
"\n", | ||
"2. Create a `MyScaleSettings` object with parameters\n", | ||
"\n", | ||
"\n", | ||
" ```python\n", | ||
" from langchain.vectorstores import MyScale, MyScaleSettings\n", | ||
" config = MyScaleSettings(host=\"<your-backend-url>\", port=8443, ...)\n", | ||
" index = MyScale(embedding_function, config)\n", | ||
" index.add_documents(...)\n", | ||
" ```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "a3c3999a", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.document_loaders import TextLoader\n", | ||
"loader = TextLoader('../../../state_of_the_union.txt')\n", | ||
"documents = loader.load()\n", | ||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", | ||
"docs = text_splitter.split_documents(documents)\n", | ||
"\n", | ||
"embeddings = OpenAIEmbeddings()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"id": "6e104aee", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"Inserting data...: 100%|██████████| 42/42 [00:18<00:00, 2.21it/s]\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"for d in docs:\n", | ||
" d.metadata = {'some': 'metadata'}\n", | ||
"docsearch = MyScale.from_documents(docs, embeddings)\n", | ||
"\n", | ||
"query = \"What did the president say about Ketanji Brown Jackson\"\n", | ||
"docs = docsearch.similarity_search(query)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"id": "9c608226", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. \n", | ||
"\n", | ||
"It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. \n", | ||
"\n", | ||
"And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care. \n", | ||
"\n", | ||
"Third, support our veterans. \n", | ||
"\n", | ||
"Veterans are the best of us. \n", | ||
"\n", | ||
"I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. \n", | ||
"\n", | ||
"My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free. \n", | ||
"\n", | ||
"Our troops in Iraq and Afghanistan faced many dangers.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(docs[0].page_content)" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "e3a8b105", | ||
"metadata": {}, | ||
"source": [ | ||
"## Get connection info and data schema" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "69996818", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"print(str(docsearch))" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "f59360c0", | ||
"metadata": {}, | ||
"source": [ | ||
"## Filtering\n", | ||
"\n", | ||
"You have direct access to the MyScale SQL `WHERE` statement, so you can write a `WHERE` clause following standard SQL.\n", | ||
"\n", | ||
"**NOTE**: Beware of SQL injection; this interface must not be called directly by end users.\n", | ||
"\n", | ||
"If you customized your `column_map` in your settings, you can search with a filter like this:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"id": "232055f6", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"Inserting data...: 100%|██████████| 42/42 [00:15<00:00, 2.69it/s]\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from langchain.vectorstores import MyScale, MyScaleSettings\n", | ||
"from langchain.document_loaders import TextLoader\n", | ||
"\n", | ||
"loader = TextLoader('../../../state_of_the_union.txt')\n", | ||
"documents = loader.load()\n", | ||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", | ||
"docs = text_splitter.split_documents(documents)\n", | ||
"\n", | ||
"embeddings = OpenAIEmbeddings()\n", | ||
"\n", | ||
"for i, d in enumerate(docs):\n", | ||
" d.metadata = {'doc_id': i}\n", | ||
"\n", | ||
"docsearch = MyScale.from_documents(docs, embeddings)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 16, | ||
"id": "ddbcee77", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"0.252379834651947 {'doc_id': 6, 'some': ''} And I’m taking robus...\n", | ||
"0.25022566318511963 {'doc_id': 1, 'some': ''} Groups of citizens b...\n", | ||
"0.2469480037689209 {'doc_id': 8, 'some': ''} And so many families...\n", | ||
"0.2428302764892578 {'doc_id': 0, 'some': 'metadata'} As Frances Haugen, w...\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"meta = docsearch.metadata_column\n", | ||
"output = docsearch.similarity_search_with_relevance_scores('What did the president say about Ketanji Brown Jackson?', \n", | ||
" k=4, where_str=f\"{meta}.doc_id<10\")\n", | ||
"for d, dist in output:\n", | ||
" print(dist, d.metadata, d.page_content[:20] + '...')" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "a359ed74", | ||
"metadata": {}, | ||
"source": [ | ||
"## Deleting your data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "fb6a9d36", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"docsearch.drop()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "48dbd8e0", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.8" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
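The notebook's "Filtering" section warns that `where_str` is passed through as raw SQL and must not be exposed to end users. One way to respect that warning is to build the clause from validated values instead of accepting free-form text. The following is a minimal sketch under that assumption; `build_doc_id_filter` is a hypothetical helper, and the column name `"metadata"` stands in for whatever `docsearch.metadata_column` returns in your setup.

```python
# Hypothetical helper: build a `where_str` from validated inputs only.
def build_doc_id_filter(metadata_column, max_doc_id):
    """Return a clause such as "metadata.doc_id<10", rejecting anything unexpected."""
    # Accept only a plain identifier for the column name (no spaces, quotes, or semicolons).
    if not isinstance(metadata_column, str) or not metadata_column.isidentifier():
        raise ValueError(f"unexpected column name: {metadata_column!r}")
    # Accept only a real integer, so strings like "10 OR 1=1" are refused (bool is excluded too).
    if isinstance(max_doc_id, bool) or not isinstance(max_doc_id, int):
        raise ValueError(f"doc_id bound must be an int, got {max_doc_id!r}")
    return f"{metadata_column}.doc_id<{max_doc_id}"


where_str = build_doc_id_filter("metadata", 10)  # "metadata" is an assumed column name
print(where_str)
```

The resulting string could then be passed as the `where_str` argument shown in the notebook cell, for example `where_str=build_doc_id_filter(docsearch.metadata_column, 10)`.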