DOCSP-49240: Add LangChain semantic cache and Local RAG notebooks #17

Status: Open. Wants to merge 7 commits into main.
2 changes: 1 addition & 1 deletion ai-integrations/langchain-graphrag.ipynb
@@ -5,7 +5,7 @@
"id": "b5dcbf95-9a30-416d-afed-d5b2bf0e8651",
"metadata": {},
"source": [
"# GraphRAG with MongoDB and LangChain\n",
"# LangChain MongoDB Integration - GraphRAG\n",
"\n",
"This notebook is a companion to the [GraphRAG with MongoDB and LangChain](https://www.mongodb.com/docs/atlas/atlas-vector-search/ai-integrations/langchain/graph-rag/) tutorial. Refer to the page for set-up instructions and detailed explanations.\n",
"\n",
2 changes: 1 addition & 1 deletion ai-integrations/langchain-hybrid-search.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Atlas Vector Search - LangChain Integration - Hybrid Search"
"# LangChain MongoDB Integration - Hybrid Search"
]
},
{
251 changes: 251 additions & 0 deletions ai-integrations/langchain-local-rag.ipynb
@@ -0,0 +1,251 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LangChain MongoDB Integration - Implement RAG Locally"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook is a companion to the [LangChain Local RAG](https://www.mongodb.com/docs/atlas/atlas-vector-search/ai-integrations/langchain/get-started/) tutorial. Refer to the page for set-up instructions and detailed explanations.\n",
"\n",
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/mongodb/docs-notebooks/blob/main/ai-integrations/langchain-local-rag.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"source": [
"## Create a local Atlas deployment\n",
"\n",
"Run the following commands in your terminal to set up your local Atlas deployment. \n",
"\n",
"```\n",
"atlas deployments setup\n",
"curl https://atlas-education.s3.amazonaws.com/sampledata.archive -o sampledata.archive\n",
"mongorestore --archive=sampledata.archive --port=<port-number>\n",
"```"
]
},
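{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check (not part of the original tutorial): ping the local\n",
"# deployment to confirm it is reachable before loading the sample data.\n",
"# Replace <port-number> with the port from `atlas deployments setup`.\n",
"from pymongo import MongoClient\n",
"check_client = MongoClient(\"mongodb://localhost:<port-number>/?directConnection=true\")\n",
"print(check_client.admin.command(\"ping\"))"
]
},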
{
"cell_type": "markdown",
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"source": [
"## Set up the environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"pip install --quiet --upgrade pymongo langchain gpt4all sentence_transformers langchain-huggingface"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"MONGODB_URI = (\"mongodb://localhost:64983/?directConnection=true\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create embeddings with local model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pymongo import MongoClient\n",
"from sentence_transformers import SentenceTransformer\n",
"\n",
"# Connect to your local Atlas deployment or Atlas Cluster\n",
"client = MongoClient(MONGODB_URI)\n",
"\n",
"# Select the sample_airbnb.listingsAndReviews collection\n",
"collection = client[\"sample_airbnb\"][\"listingsAndReviews\"]\n",
"\n",
"# Load the embedding model (https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)\n",
"model_path = \"<path-to-save-model>\"\n",
"model = SentenceTransformer('mixedbread-ai/mxbai-embed-large-v1')\n",
"model.save(model_path)\n",
"model = SentenceTransformer(model_path)\n",
"\n",
"# Define function to generate embeddings\n",
"def get_embedding(text):\n",
" return model.encode(text).tolist()\n",
"\n",
"# Filters for only documents with a summary field and without an embeddings field\n",
"filter = { '$and': [ { 'summary': { '$exists': True, \"$nin\": [ None, \"\" ] } }, { 'embeddings': { '$exists': False } } ] }\n",
"\n",
"# Creates embeddings for subset of the collection\n",
"updated_doc_count = 0\n",
"for document in collection.find(filter).limit(50):\n",
" text = document['summary']\n",
" embedding = get_embedding(text)\n",
" collection.update_one({ '_id': document['_id'] }, { \"$set\": { 'embeddings': embedding } }, upsert=True)\n",
" updated_doc_count += 1\n",
"\n",
"print(\"Documents updated: {}\".format(updated_doc_count))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure the vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_huggingface import HuggingFaceEmbeddings\n",
"\n",
"embedding_model = HuggingFaceEmbeddings(model_name=\"mixedbread-ai/mxbai-embed-large-v1\")\n",
"\n",
"from langchain_mongodb import MongoDBAtlasVectorSearch\n",
"# Instantiate vector store\n",
"vector_store = MongoDBAtlasVectorSearch.from_connection_string(\n",
" connection_string = MONGODB_URI,\n",
" namespace = \"sample_airbnb.listingsAndReviews\",\n",
" embedding=embedding_model,\n",
" index_name=\"vector_index\",\n",
" embedding_key=\"embeddings\",\n",
" text_key=\"summary\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vector_store.create_vector_search_index(\n",
" dimensions = 1024, # The dimensions of the vector embeddings to be indexed\n",
" wait_until_complete = 60 # Number of seconds to wait for the index to build (can take around a minute)\n",
")"
]
},
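{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check (not part of the original tutorial): run a raw\n",
"# similarity search to confirm the index returns results before\n",
"# building the full RAG chain.\n",
"results = vector_store.similarity_search(\"beach house\", k=3)\n",
"for doc in results:\n",
"    print(doc.page_content[:100])"
]
},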
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Implement RAG with a local LLM\n",
"Before running the following code, [download the local model](https://gpt4all.io/models/gguf/mistral-7b-openorca.gguf2.Q4_0.gguf)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain_community.llms import GPT4All\n",
"\n",
"# Configure the LLM\n",
"local_path = \"<path-to-model>\"\n",
"\n",
"# Callbacks support token-wise streaming\n",
"callbacks = [StreamingStdOutCallbackHandler()]\n",
"\n",
"# Verbose is required to pass to the callback manager\n",
"llm = GPT4All(model=local_path, callbacks=callbacks, verbose=True)"
]
},
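{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check (not part of the original tutorial): invoke the local\n",
"# LLM directly to confirm the model file loads and streams tokens.\n",
"print(llm.invoke(\"What is MongoDB?\"))"
]
},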
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import PromptTemplate\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"import pprint\n",
"\n",
"# Instantiate Atlas Vector Search as a retriever\n",
"retriever = vector_store.as_retriever()\n",
"\n",
"# Define prompt template\n",
"template = \"\"\"\n",
"Use the following pieces of context to answer the question at the end.\n",
"{context}\n",
"Question: {question}\n",
"\"\"\"\n",
"custom_rag_prompt = PromptTemplate.from_template(template)\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"# Create chain \n",
"rag_chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | custom_rag_prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")\n",
"\n",
"# Prompt the chain\n",
"question = \"Can you recommend me a few AirBnBs that are beach houses?\"\n",
"answer = rag_chain.invoke(question)\n",
"\n",
"# Return source documents\n",
"documents = retriever.invoke(question)\n",
"print(\"\\nSource documents:\")\n",
"pprint.pprint(documents)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}