Skip to content

Commit

Permalink
Merge pull request #1 from jupyterjazz/docarray-vectorstore
Browse files Browse the repository at this point in the history
Docarray vectorstore
  • Loading branch information
jupyterjazz authored May 9, 2023
2 parents 65c95f9 + 8c0a611 commit a216ab4
Show file tree
Hide file tree
Showing 10 changed files with 1,381 additions and 199 deletions.
236 changes: 236 additions & 0 deletions docs/modules/indexes/vectorstores/examples/docarray_hnsw_search.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,236 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2ce41f46-5711-4311-b04d-2fe233ac5b1b",
"metadata": {},
"source": [
"# DocArrayHnswSearch\n",
"\n",
">[DocArrayHnswSearch](https://docs.docarray.org/user_guide/storing/index_hnswlib/) is a lightweight Document Index implementation provided by [Docarray](https://docs.docarray.org/) that runs fully locally and is best suited for small- to medium-sized datasets. It stores vectors on disk in [hnswlib](https://github.com/nmslib/hnswlib), and stores all other data in [SQLite](https://www.sqlite.org/index.html).\n",
"\n",
"This notebook shows how to use functionality related to the `DocArrayHnswSearch`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ce1b8cb-dbf0-40c3-99ee-04f28143331b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install \"docarray[hnswlib]\""
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "878f17df-100f-4854-9e87-472cf36d51f3",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"# get a token: https://platform.openai.com/account/api-keys\n",
"\n",
"from getpass import getpass\n",
"\n",
"OPENAI_API_KEY = getpass()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "82d9984a-6031-403d-a977-6bc98d6be23a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b757afef-ef0a-465d-8e8a-9aadb9c32b88",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/jinaai/Desktop/langchain/venv/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import DocArrayHnswSearch\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "605e200e-e711-486b-b36e-cbe5dd2512d7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"db = DocArrayHnswSearch.from_documents(docs, embeddings, work_dir='hnswlib_store/', n_dim=1536)"
]
},
{
"cell_type": "markdown",
"id": "ed6f905b-4853-4a44-9730-614aa8e22b78",
"metadata": {},
"source": [
"## Similarity search"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4d7e742f-2002-449d-a10e-16046890906c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0da9e26f-1fc2-48e6-95a7-f692c853bbd3",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "3febb987-e903-416f-af26-6897d84c8d61",
"metadata": {},
"source": [
"## Similarity search with score"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "40764fdd-357d-475a-8152-5f1979d61a45",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"docs = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a479fc46-b299-4330-89b9-e9b5a218ea03",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={}),\n",
" 0.36962226)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "4d3d4e97-5d2b-4571-8ff9-e3f6b6778714",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import shutil\n",
"# delete the dir\n",
"shutil.rmtree('hnswlib_store')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading

0 comments on commit a216ab4

Please sign in to comment.