diff --git a/docs/chat_message_history.ipynb b/docs/chat_message_history.ipynb index 23f055f2..06e488c1 100644 --- a/docs/chat_message_history.ipynb +++ b/docs/chat_message_history.ipynb @@ -4,11 +4,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Google AlloyDB\n", + "# Google AlloyDB for PostgreSQL\n", "\n", "> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed PostgreSQL compatible database service for your most demanding enterprise workloads. AlloyDB combines the best of Google with PostgreSQL, for superior performance, scale, and availability. Extend your database application to build AI-powered experiences leveraging AlloyDB Langchain integrations.\n", "\n", - "This notebook goes over how to use `AlloyDB for PostgreSQL` to store chat message history with the `AlloyDBChatMessageHistory` class." + "This notebook goes over how to use `AlloyDB for PostgreSQL` to store chat message history with the `AlloyDBChatMessageHistory` class.\n", + "\n", + "Learn more about the package on [GitHub](https://github.com/googleapis/langchain-google-alloydb-pg-python/).\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-alloydb-pg-python/blob/main/samples/langchain_quick_start.ipynb)" ] }, { @@ -18,7 +22,9 @@ "## Before You Begin\n", "\n", "To run this notebook, you will need to do the following:\n", + "\n", " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", + " * [Enable the AlloyDB API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)\n", " * [Create a AlloyDB instance](https://cloud.google.com/alloydb/docs/instance-primary-create)\n", " * [Create a AlloyDB database](https://cloud.google.com/alloydb/docs/database-create)\n", " * [Add an IAM database user to the database](https://cloud.google.com/alloydb/docs/manage-iam-authn) (Optional)" @@ -111,24 +117,6 @@ 
"!gcloud config set project {PROJECT_ID}" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 💡 API Enablement\n", - "The `langchain-google-alloydb-pg` package requires that you [enable the AlloyDB Admin API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com) in your Google Cloud Project." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# enable AlloyDB API\n", - "!gcloud services enable alloydb.googleapis.com" - ] - }, { "cell_type": "markdown", "metadata": {}, diff --git a/docs/document_loader.ipynb b/docs/document_loader.ipynb index 1c6a3b6a..cb7403ed 100644 --- a/docs/document_loader.ipynb +++ b/docs/document_loader.ipynb @@ -6,11 +6,15 @@ "id": "E_RJy7C1bpCT" }, "source": [ - "# AlloyDB for PostgreSQL\n", + "# Google AlloyDB for PostgreSQL\n", "\n", "> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. AlloyDB is 100% compatible with PostgreSQL. Extend your database application to build AI-powered experiences leveraging AlloyDB's Langchain integrations.\n", "\n", - "This notebook goes over how to use `AlloyDB for PostgreSQL` to load Documents with the `AlloyDBLoader` class." 
+ "This notebook goes over how to use `AlloyDB for PostgreSQL` to load Documents with the `AlloyDBLoader` class.\n", + "\n", + "Learn more about the package on [GitHub](https://github.com/googleapis/langchain-google-alloydb-pg-python/).\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-alloydb-pg-python/blob/main/samples/langchain_quick_start.ipynb)" ] }, { @@ -22,8 +26,9 @@ "## Before you begin\n", "\n", "To run this notebook, you will need to do the following:\n", + "\n", " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", - " * [Enable the AlloyDB Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)\n", + " * [Enable the AlloyDB API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)\n", " * [Create a AlloyDB cluster and instance.](https://cloud.google.com/alloydb/docs/cluster-create)\n", " * [Create a AlloyDB database.](https://cloud.google.com/alloydb/docs/quickstart/create-and-connect)\n", " * [Add a User to the database.](https://cloud.google.com/alloydb/docs/database-users/about)" @@ -138,30 +143,6 @@ "! gcloud config set project {PROJECT_ID}" ] }, - { - "cell_type": "markdown", - "id": "rEWWNoNnKOgq", - "metadata": { - "id": "rEWWNoNnKOgq" - }, - "source": [ - "### 💡 API Enablement\n", - "The `langchain-google-alloydb-pg` package requires that you [enable the AlloyDB Admin API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com) in your Google Cloud Project." 
- ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "5utKIdq7KYi5", - "metadata": { - "id": "5utKIdq7KYi5" - }, - "outputs": [], - "source": [ - "# enable AlloyDB Admin API\n", - "!gcloud services enable alloydb.googleapis.com" - ] - }, { "cell_type": "markdown", "id": "f8f2830ee9ca1e01", diff --git a/docs/vector_store.ipynb b/docs/vector_store.ipynb index 060c5a53..5a9e2673 100644 --- a/docs/vector_store.ipynb +++ b/docs/vector_store.ipynb @@ -4,11 +4,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# AlloyDB for PostgreSQL\n", + "# Google AlloyDB for PostgreSQL\n", "\n", "> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. AlloyDB is 100% compatible with PostgreSQL. Extend your database application to build AI-powered experiences leveraging AlloyDB's Langchain integrations.\n", "\n", - "This notebook goes over how to use `AlloyDB for PostgreSQL` to store vector embeddings with the `AlloyDBVectorStore` class." 
+ "This notebook goes over how to use `AlloyDB for PostgreSQL` to store vector embeddings with the `AlloyDBVectorStore` class.\n", + "\n", + "Learn more about the package on [GitHub](https://github.com/googleapis/langchain-google-alloydb-pg-python/).\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-alloydb-pg-python/blob/main/samples/langchain_quick_start.ipynb)" ] }, { @@ -18,8 +22,9 @@ "## Before you begin\n", "\n", "To run this notebook, you will need to do the following:\n", + "\n", " * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n", - " * [Enable the AlloyDB Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)\n", + " * [Enable the AlloyDB API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)\n", " * [Create a AlloyDB cluster and instance.](https://cloud.google.com/alloydb/docs/cluster-create)\n", " * [Create a AlloyDB database.](https://cloud.google.com/alloydb/docs/quickstart/create-and-connect)\n", " * [Add a User to the database.](https://cloud.google.com/alloydb/docs/database-users/about)" @@ -139,30 +144,6 @@ "!gcloud config set project {PROJECT_ID}" ] }, - { - "cell_type": "markdown", - "id": "rEWWNoNnKOgq", - "metadata": { - "id": "rEWWNoNnKOgq" - }, - "source": [ - "### 💡 API Enablement\n", - "The `langchain-google-alloydb-pg` package requires that you [enable the AlloyDB Admin API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com) in your Google Cloud Project." 
- ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "5utKIdq7KYi5", - "metadata": { - "id": "5utKIdq7KYi5" - }, - "outputs": [], - "source": [ - "# enable AlloyDB Admin API\n", - "!gcloud services enable alloydb.googleapis.com" - ] - }, { "cell_type": "markdown", "id": "f8f2830ee9ca1e01", diff --git a/samples/langchain_quick_start.ipynb b/samples/langchain_quick_start.ipynb index 1ae697b4..73fcd53e 100644 --- a/samples/langchain_quick_start.ipynb +++ b/samples/langchain_quick_start.ipynb @@ -1,18 +1,4 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, "cells": [ { "cell_type": "code", @@ -46,7 +32,7 @@ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-alloydb-pg-python/blob/main/samples/langchain_quick_start.ipynb)\n", "\n", "---\n", - "# **Introduction**\n", + "# Introduction\n", "\n", "In this codelab, you'll learn how to create a powerful interactive generative AI application using Retrieval Augmented Generation powered by [AlloyDB for PostgreSQL](https://cloud.google.com/alloydb) and [LangChain](https://www.langchain.com/). We will be creating an application grounded in a [Netflix Movie dataset](https://www.kaggle.com/datasets/shivamb/netflix-shows), allowing you to interact with movie data in exciting new ways." ] @@ -85,6 +71,7 @@ }, "source": [ "## What you'll need\n", + "\n", "* A Google Cloud Account and Google Cloud Project\n", "* A web browser such as [Chrome](https://www.google.com/chrome/)" ] @@ -95,9 +82,10 @@ "id": "vHdR4fF3vLWA" }, "source": [ - "# **Setup and Requirements**\n", + "# Setup and Requirements\n", "\n", "In the following instructions you will learn to:\n", + "\n", "1. Install required dependencies for our application\n", "2. 
Set up authentication for our project\n", "3. Set up a AlloyDB for PostgreSQL Instance\n", @@ -122,22 +110,9 @@ }, "outputs": [], "source": [ - "%pip install langchain_google_alloydb_pg\n", - "\n", - "%pip install langchain langchain-google-vertexai" + "%pip install langchain-google-alloydb-pg langchain langchain-google-vertexai \"google-cloud-alloydb-connector[pg8000]\"" ] }, - { - "cell_type": "code", - "source": [ - "%pip install \"google-cloud-alloydb-connector[pg8000]\"" - ], - "metadata": { - "id": "iXGq1naviLry" - }, - "execution_count": null, - "outputs": [] - }, { "cell_type": "markdown", "metadata": { @@ -150,16 +125,16 @@ }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_Q9hyqdyEx6l" + }, + "outputs": [], "source": [ "from google.colab import auth\n", "\n", "auth.authenticate_user()" - ], - "metadata": { - "id": "_Q9hyqdyEx6l" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -167,12 +142,16 @@ "id": "UCiNGP1Qxd6x" }, "source": [ - "## Connect Your Google Cloud Project\n", - "Time to connect your Google Cloud Project to this notebook so that you can leverage Google Cloud from within Colab." + "## Connect Your Google Cloud Project" ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "SLUGlG6UE2CK" + }, + "outputs": [], "source": [ "# @markdown Please fill in the value below with your GCP project ID and then run the cell.\n", "\n", "# Please fill in these values.\n", "\n", "project_id = \"\" # @param {type:\"string\"}\n", "\n", "# Quick input validations.\n", "assert project_id, \"⚠️ Please provide a Google Cloud project ID\"\n", "\n", "# Configure gcloud.\n", "!gcloud config set project {project_id}" - ], - "metadata": { - "id": "SLUGlG6UE2CK" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -202,12 +176,12 @@ }, { "cell_type": "markdown", - "source": [ - "You will need to enable these APIs in order to create an AlloyDB database and utilize Vertex AI as an embeddings service!" 
- ], "metadata": { "id": "X-bzfFb4A-xK" - } + }, + "source": [ + "You will need to enable these APIs in order to create an AlloyDB database and utilize Vertex AI as an embeddings service!" + ] }, { "cell_type": "code", @@ -218,8 +192,7 @@ "outputs": [], "source": [ "# enable GCP services\n", - "!gcloud services enable alloydb.googleapis.com\n", - "!gcloud services enable aiplatform.googleapis.com" + "!gcloud services enable alloydb.googleapis.com aiplatform.googleapis.com" ] }, { @@ -229,26 +202,26 @@ }, "source": [ "## Set up AlloyDB\n", - "You will need a **Postgres** AlloyDB instance for the following stages of this notebook. Please set the following variables." + "You will need a Postgres AlloyDB instance for the following stages of this notebook. Please set the following variables." ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8q2lc-Po1mPv" + }, + "outputs": [], "source": [ "# @markdown Please fill in the both the Google Cloud region and name of your AlloyDB instance. 
Once filled in, run the cell.\n", "\n", "# Please fill in these values.\n", - "region = \"\" # @param {type:\"string\"}\n", - "instance_name = \"\" # @param {type:\"string\"}\n", - "database_name = \"\" # @param {type:\"string\"}\n", - "password = input(\"Please provide a password to be used for 'postgres' database user: \")\n", - "cluster_name = \"\" # @param {type:\"string\"}" - ], - "metadata": { - "id": "8q2lc-Po1mPv" - }, - "execution_count": null, - "outputs": [] + "region = \"us-central1\" # @param {type:\"string\"}\n", + "cluster_name = \"my-cluster\" # @param {type:\"string\"}\n", + "instance_name = \"my-primary\" # @param {type:\"string\"}\n", + "database_name = \"langchain\" # @param {type:\"string\"}\n", + "password = input(\"Please provide a password to be used for 'postgres' database user: \")" + ] }, { "cell_type": "markdown", @@ -262,13 +235,12 @@ }, { "cell_type": "markdown", - "source": [ - "First let's create an AlloyDB Cluster.\n", - "> ⏳ - Creating an AlloyDB cluster may take a few minutes.\n" - ], "metadata": { "id": "xyZYX4Jo1vfh" - } + }, + "source": [ + "> ⏳ - Creating an AlloyDB cluster may take a few minutes." + ] }, { "cell_type": "code", @@ -283,81 +255,73 @@ "assert instance_name, \"⚠️ Please provide the name of your instance\"\n", "assert database_name, \"⚠️ Please provide the name of your database_name\"\n", "\n", - "#create the AlloyDB Cluster\n", - "!gcloud beta alloydb clusters create {cluster_name} --password={password} --region={region}\n", - "\n" + "# create the AlloyDB Cluster\n", + "!gcloud beta alloydb clusters create {cluster_name} --password={password} --region={region}" ] }, { "cell_type": "markdown", - "source": [ - "Now that we have created our AlloyDB Cluster, we can create an instance attached to our cluster with the following command.\n", - "> ⏳ - Creating an AlloyDB instance may take a few minutes." 
- ], "metadata": { "id": "o8LkscYH5Vfp" - } + }, + "source": [ + "Create an instance attached to our cluster with the following command.\n", + "> ⏳ - Creating an AlloyDB instance may take a few minutes." + ] }, { "cell_type": "code", - "source": [ - "!gcloud beta alloydb instances create {instance_name} --instance-type=PRIMARY --cpu-count=2 --region={region} --cluster={cluster_name}" - ], + "execution_count": null, "metadata": { "id": "TkqQSWoY5Kab" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!gcloud beta alloydb instances create {instance_name} --instance-type=PRIMARY --cpu-count=2 --region={region} --cluster={cluster_name}" + ] }, { "cell_type": "markdown", - "source": [ - "In order to connect to your newly created AlloyDB instance from this notebook, you will need to enable public IP on your instance. Alternatively, you can follow [these instructions](https://cloud.google.com/alloydb/docs/connect-external) to connect to an AlloyDB for PostgreSQL instance with Private IP from outside your VPC." - ], "metadata": { "id": "BXsQ1UJv4ZVJ" - } + }, + "source": [ + "To connect to your AlloyDB instance from this notebook, you will need to enable public IP on your instance. Alternatively, you can follow [these instructions](https://cloud.google.com/alloydb/docs/connect-external) to connect to an AlloyDB for PostgreSQL instance with Private IP from outside your VPC." 
+ ] }, { "cell_type": "code", - "source": [ - "!gcloud beta alloydb instances update {instance_name} --region={region} --cluster={cluster_name} --assign-inbound-public-ip=ASSIGN_IPV4" - ], + "execution_count": null, "metadata": { "id": "OPVWsQB04Yyl" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!gcloud beta alloydb instances update {instance_name} --region={region} --cluster={cluster_name} --assign-inbound-public-ip=ASSIGN_IPV4" + ] }, { "cell_type": "markdown", - "source": [ - "Now let's set the connection string that we will use to connect to our instance." - ], "metadata": { "id": "mjA6AiAzB2Du" - } + }, + "source": [ + "Now create a connection pool to connect to our instance." + ] }, { "cell_type": "code", - "source": [ - "connection_string = \"projects/{0}/locations/{1}/clusters/{2}/instances/{3}\".format(\n", - " project_id, region, cluster_name, instance_name\n", - ")\n", - "print(connection_string)" - ], + "execution_count": null, "metadata": { - "id": "acF-nMPGKJVQ" + "id": "zsLi0a7FIYjT" }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", + "outputs": [], "source": [ "from google.cloud.alloydb.connector import Connector, IPTypes\n", "import sqlalchemy\n", "\n", + "\n", + "connection_string = f\"projects/{project_id}/locations/{region}/clusters/{cluster_name}/instances/{instance_name}\"\n", "# initialize Connector object\n", "connector = Connector()\n", "\n", @@ -380,46 +344,31 @@ "pool = sqlalchemy.create_engine(\n", " \"postgresql+pg8000://\", creator=getconn, isolation_level=\"AUTOCOMMIT\"\n", ")" - ], - "metadata": { - "id": "zsLi0a7FIYjT" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "##Create a Database" - ], "metadata": { "id": "i_yNN1MnJpTR" - } - }, - { - "cell_type": "markdown", + }, "source": [ - "Now that we can connect to our AlloyDB instance from this notebook, let's create a database we will use to store the data for this 
application. You may get an error that there is no public ip address, this is because assigning a public ip address takes a few minutes. Please wait and retry this step if you hit an error!" - ], - "metadata": { - "id": "6xOPJoVsdOvZ" - } + "source": [ + "### Create a Database\n", + "\n", + "Next you will create a database to store the data for this application using the connection pool. Because enabling public IP takes a few minutes, you may get an error that there is no public IP address. Please wait and retry this step if you hit an error!" + ] }, { "cell_type": "code", - "source": [ - "create_db_cmd = sqlalchemy.text(\n", - " f\"CREATE DATABASE {database_name}\",\n", - ")\n", - "with pool.connect() as db_conn:\n", - " db_conn.execute(create_db_cmd)\n", - "connector.close()" - ], + "execution_count": null, "metadata": { "id": "hPE6tt5eJqhq" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "with pool.connect() as db_conn:\n", + " db_conn.execute(sqlalchemy.text(f\"CREATE DATABASE {database_name}\"))\n", + "connector.close()" + ] }, { "cell_type": "markdown", @@ -427,7 +376,7 @@ "id": "HdolCWyatZmG" }, "source": [ - "## Import data to your database\n", + "### Import data to your database\n", "\n", "Now that you have your database, you will need to import data! We will be using a [Netflix Dataset from Kaggle](https://www.kaggle.com/datasets/shivamb/netflix-shows). Here is what the data looks like:" ] }, @@ -453,106 +402,68 @@ "id": "kQ2KWsYI_Msa" }, "source": [ - "Instead of leaving it to you to figure out how to insert these documents, we have prepared code to help you insert the csv into your AlloyDB for PostgreSQL database." + "The following code has been prepared to help insert the CSV data into your AlloyDB for PostgreSQL database." ] }, { "cell_type": "markdown", - "source": [ - "First lets download the csv file and add it to our notebook." 
- ], "metadata": { "id": "Dzr-2VZIkvtY" - } + }, + "source": [ + "Download the CSV file:" + ] }, { "cell_type": "code", - "source": [ - "!gsutil cp gs://cloud-samples-data/langchain/common/first_five_netflix_titles.csv ." - ], + "execution_count": null, "metadata": { "id": "5KkIQ2zSvQkN" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!gsutil cp gs://cloud-samples-data/langchain/common/first_five_netflix_titles.csv ." + ] }, { "cell_type": "markdown", - "source": [ - "We have downloaded the csv file to our home directory, to see it run ls. You should also be able to see the csv file populate in the \"Files\" tab." - ], "metadata": { "id": "oFU13dCBlYHh" - } - }, - { - "cell_type": "code", - "source": [ - "!ls" - ], - "metadata": { - "id": "nQBs10I8vShh" }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", "source": [ - "Lets add a connection function to connect to our db using the AlloyDB sychronous connector." - ], - "metadata": { - "id": "BotJ5x-8DBhQ" - } + "The download can be verified by the following command or using the \"Files\" tab." 
+ ]
 }, { "cell_type": "code", - "source": [ - "from google.cloud.alloydb.connector import Connector, IPTypes\n", - "import sqlalchemy\n", - "\n", - "# initialize Connector object\n", - "connector = Connector()\n", - "\n", - "\n", - "# function to return the database connection\n", - "def getconn():\n", - " conn = connector.connect(\n", - " connection_string,\n", - " \"pg8000\",\n", - " user=\"postgres\",\n", - " password=password,\n", - " db=database_name,\n", - " enable_iam_auth=False,\n", - " ip_type=IPTypes.PUBLIC,\n", - " )\n", - " return conn\n", - "\n", - "\n", - "# create connection pool\n", - "pool = sqlalchemy.create_engine(\n", - " \"postgresql+pg8000://\", creator=getconn, isolation_level=\"AUTOCOMMIT\"\n", - ")" - ], + "execution_count": null, "metadata": { - "id": "n88BbOvDlXtx" + "id": "nQBs10I8vShh" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!ls" + ] }, { "cell_type": "markdown", - "source": [ - "In this next step we will\n", - "1. Create the table into which we will insert the data.\n", - "2. Map over the columns of our csv file to the columns of our datatable and insert the data!" - ], "metadata": { "id": "2H7rorG9Ivur" - } + }, + "source": [ + "In this next step you will:\n", + "\n", + "1. Create the table to store the data\n", + "2. Insert the data from the CSV into the database table" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qCsM2KXbdYiv" + }, + "outputs": [], "source": [ "import pandas as pd\n", "\n", @@ -575,7 +486,7 @@ "\n", "netflix_data = \"/content/first_five_netflix_titles.csv\"\n", "\n", "df = pd.read_csv(netflix_data)\n", "insert_data_cmd = sqlalchemy.text(\n", " \"\"\"\n", " INSERT INTO netflix_titles VALUES (:show_id, :type, :title, :director,\n", @@ -610,12 +521,7 @@ " )\n", " db_conn.commit()\n", "connector.close()" - ], - "metadata": { - "id": "qCsM2KXbdYiv" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -623,13 +529,13 @@ "id": "SsGS80H04bDN" }, "source": [ - "# **Use case 1: AlloyDB for Postgres as a document loader**\n", + "## Use case 1: AlloyDB for Postgres as a document loader\n", "\n", "---\n", "\n", "\n", "\n", - "Now that you have data in your database, you are ready to use AlloyDB for PostgreSQL as a document loader. This means we will pull data from the database and load it into memory as documents. We can then feed these documents into the vector store." + "Now that you have data in your database, you are ready to use AlloyDB for PostgreSQL as a document loader. This means you will pull data from the database and load it into memory as documents. These documents can be used to create a vector store." ] }, { "cell_type": "markdown", "metadata": { "id": "-CQgPON8dwSK" }, "source": [ - "Next let's connect to our AlloyDB for PostgreSQL instance using the AlloyDBEngine class." + "First, create a connection to your AlloyDB for PostgreSQL instance using the `AlloyDBEngine` class." ] }, { @@ -668,7 +574,7 @@ "id": "8s-C0P-Oee69" }, "source": [ - "Once we initialize an AlloyDBEngine object, we can pass it into the AlloyDBSQLLoader to connect to a specific database. As you can see we also pass in a query, table_name and a list of columns. 
The query tells the loader what query to use to pull data. The \"content_columns\" argument refers to the columns that will be used as \"content\" in the document object we will later construct. The rest of the columns in that table will become the \"metadata\" associated with the documents." + "The `AlloyDBLoader` requires an `AlloyDBEngine` object to define the database connection and a `query` to define which data is to be retrieved. The `content_columns` argument can be used to define the columns that will be used as \"content\" in the document object we will later construct. The rest of the columns in that table will become the \"metadata\" associated with the documents." ] }, { @@ -688,38 +594,13 @@ ")" ] }, - { - "cell_type": "markdown", - "metadata": { - "id": "xvCEAp97fXRt" - }, - "source": [ - "Next let's define a function \"collect_async_items\" to asynchronously pull documents from our database." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "-p9UiHFbdyYa" - }, - "outputs": [], - "source": [ - "async def collect_async_items(docs_generator):\n", - " \"\"\"Collects items from an async generator.\"\"\"\n", - " docs = []\n", - " async for doc in docs_generator:\n", - " docs.append(doc)\n", - " return docs" - ] - }, { "cell_type": "markdown", "metadata": { "id": "dsL-KFrmfuS1" }, "source": [ - "Then let's run the function to pull our documents from out database using our document loader. You can see the first 5 documents from the database here. Nice, you just used AlloyDB Postgres as a document loader!" + "Use the `aload()` method to pull documents from the database. You can see the first 5 documents from the database here." ] }, { @@ -730,19 +611,26 @@ }, "outputs": [], "source": [ - "documents = await collect_async_items(loader.alazy_load())\n", + "documents = await loader.aload()\n", "print(f\"Loaded {len(documents)} from the database. 
5 Examples:\")\n", "for doc in documents[:5]:\n", " print(doc)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Nice, you just used AlloyDB for Postgres as a document loader!" + ] + }, { "cell_type": "markdown", "metadata": { "id": "z9uLV3bs4noo" }, "source": [ - "# **Use case 2: AlloyDB for PostgreSQL as Vector Store**" + "## Use case 2: AlloyDB for PostgreSQL as Vector Store" ] }, { "cell_type": "markdown", "metadata": { "id": "duVsSeMcgEWl" }, "source": [ - "Now, let's learn how to put all of the documents we just loaded into a vector store so that we can use vector search to answer our user's questions!" + "Now, you will learn how to put all of the documents into a vector store so that you can perform a vector search." ] }, { @@ -762,7 +650,7 @@ "source": [ "### Create Your Vector Store table\n", "\n", - "Based on the documents that we loaded before, we want to create a table with a vector column as our vector store. We will start it by intializing a vector table by calling the `init_vectorstore_table` function from our `engine`. As you can see we list all of the columns for our metadata. We also specify a vector size, 768, that corresponds with the length of the vectors computed by the model our embeddings service uses, Vertex AI's textembedding-gecko.\n" + "Create a vector store table that preserves the Document's metadata by using the method `init_vectorstore_table` and defining specific metadata columns. The vector size is required. The example uses a vector size of `768`, which corresponds to the length of the vectors computed by the model our embeddings service uses, Vertex AI's `textembedding-gecko`." ] }, { @@ -812,7 +700,7 @@ "source": [ "### Try inserting the documents into the vector table\n", "\n", - "Now we will create a vector_store object backed by our vector table in the AlloyDB database. Let's load the data from the documents to the vector table. 
Note that for each row, the embedding service will be called to compute the embeddings to store in the vector table. Pricing details can be found [here](https://cloud.google.com/vertex-ai/pricing)." + "Next, you will create an `AlloyDBVectorStore` object that connects to the new AlloyDB database table to store the data from the documents. Note that for each row, the embedding service will be called to compute the embeddings to store in the vector table. Pricing details can be found [here](https://cloud.google.com/vertex-ai/pricing)." ] }, { @@ -863,7 +751,7 @@ "id": "fr1rP6KQ-8ag" }, "source": [ - "Now let's try to put the documents data into the vector table. Here is a code example to load the first 5 documents in the list." + "Now, add the document data into the vector table. Here is a code example to load the first 5 documents in the list." ] }, { @@ -893,34 +781,39 @@ "source": [ "### Import the rest of your data into your vector table\n", "\n", - "You don't have to call the embedding service 8,800 times to load all the documents for the demo. Instead, we have prepared a csv with the all 8,800+ rows with pre-computed embeddings in a csv file. Again, let's import the csv using gsutil." + "You don't have to call the embedding service 8,800 times to load all the documents for the demo. Instead, we have prepared a CSV file with all 8,800+ rows and pre-computed embeddings. You can import the CSV using `gsutil`." ] }, { "cell_type": "code", - "source": [ - "!gsutil cp gs://cloud-samples-data/langchain/alloydb/netflix_titles_embeddings.csv ." - ], + "execution_count": null, "metadata": { "id": "4dE4oQiNyWdC" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!gsutil cp gs://cloud-samples-data/langchain/alloydb/netflix_titles_embeddings.csv ." + ] }, { "cell_type": "markdown", - "source": [ - "And now let's insert the csv data into the table containing our vectors." - ], "metadata": { "id": "T1TXnKU_DznX" - } + }, + "source": [ + "Use the following code to insert the pregenerated embeddings into your vector store." + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PDIRYfUyyVyI" + }, + "outputs": [], "source": [ "netflix_data = \"/content/langchain_alloydb_netflix_computed_embeddings.csv\"\n", "df = pd.read_csv(netflix_data)\n", "insert_data_cmd = sqlalchemy.text(\n", " \"\"\"\n", " INSERT INTO movie_vector_table_samples VALUES (:langchain_id, :content, :embedding, :show_id,\n", @@ -954,12 +847,7 @@ " )\n", " db_conn.commit()\n", "connector.close()" - ], - "metadata": { - "id": "PDIRYfUyyVyI" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -967,7 +855,7 @@ "id": "ZM_OFzZrQEPs" }, "source": [ - "# **Use case 3: AlloyDB for PostgreSQL as Chat Memory**" + "## Use case 3: AlloyDB for PostgreSQL as Chat Memory" ] }, { "cell_type": "markdown", "metadata": { "id": "dxqIPQtjDquk" }, "source": [ - "Next we will add chat history (called “memory” in the context of LangChain) to our application so the LLM can retain context and information across multiple interactions, leading to more coherent and sophisticated conversations or text generation. We can use AlloyDB for PostgreSQL as “memory” storage in our application so that the LLM can use context from prior conversations to better answer the user’s prompts. First let's initialize AlloyDB for PostgreSQL as memory storage." + "Next you will add chat history (called “memory” in the context of LangChain) to our application so the LLM can retain context and information across multiple interactions, leading to more coherent and sophisticated conversations or text generation. You can use AlloyDB for PostgreSQL as “memory” storage in our application so that the LLM can use context from prior conversations to better answer the user’s prompts." 
] }, { @@ -1038,7 +926,7 @@ "id": "k0O9mta8RQ0v" }, "source": [ - "# **Conversational RAG Chain backed by AlloyDB**" + "## Conversational RAG Chain backed by AlloyDB" ] }, { "cell_type": "markdown", "metadata": { "id": "j2OxF3JoNA7J" }, "source": [ - "So far we've tested out using AlloyDB for PostgreSQL as document loader, Vector Store and Chat Memory. Now let's use it in the `ConversationalRetrievalChain`.\n", - "\n", - "We will build a chat bot that can answer movie related questions based on the vector search results.\n", "\n", - "First let's initialize all of our AlloyDBEngine object to use as a connection in our vector store and chat_history." + "Try using all 3 integrations with the `ConversationalRetrievalChain`.\n", + "\n", + "You will build a chatbot that can answer movie-related questions based on the vector search results." ] }, { @@ -1118,7 +1004,7 @@ "id": "Ytlz9D3LmcU7" }, "source": [ - "Let's create a prompt for the LLM. Here we can add instructions specific to our application, such as \"Don't make things up\"." + "Create a prompt for the LLM. Here we can add instructions specific to our application, such as \"Don't make things up\"." ] }, { @@ -1163,7 +1049,7 @@ "id": "rsGe-bW5m0H1" }, "source": [ - "Now let's use our vector store as a retreiver. Retreiver's in Langchain allow us to literally \"retrieve\" documents." + "Next, create a retriever from the vector store in order to retrieve similar documents via a vector search." ] }, { @@ -1174,7 +1060,7 @@ }, "outputs": [], "source": [ - "# Intialize retriever, llm and memory for the chain\n", + "# Initialize retriever, llm and memory for the chain\n", "retriever = vector_store.as_retriever(\n", " search_type=\"mmr\", search_kwargs={\"k\": 5, \"lambda_mult\": 0.8}\n", ")" ] }, { @@ -1186,7 +1072,7 @@ "id": "3maZ8SLlneYJ" }, "source": [ - "Now let's intialize our LLM, in this case we are using Vertex AI's \"gemini-pro\"." + "Next, initialize the LLM. In this case, we are using Vertex AI's \"gemini-pro\"." 
] }, { @@ -1206,7 +1092,7 @@ "id": "hN8mpXdtnocg" }, "source": [ - "We clear our chat history, so that our application starts without any prior context to other conversations we have had with the application." + "Clear your chat history so that the application starts without any prior context from other conversations." ] }, { @@ -1234,7 +1120,7 @@ "id": "BDAT2koSn8Mz" }, "source": [ - "Now let's create a conversational retrieval chain. This will allow the LLM to use chat history in it's responses, meaning we can ask it follow up questions to our questions instead of having to start from scratch after each inquiry." + "Now, create a conversational retrieval chain. This will allow the LLM to use chat history in its responses, meaning we can ask it follow-up questions instead of having to start from scratch after each inquiry." ] }, { @@ -1272,5 +1158,19 @@ "chat_history.messages" ] } - ] + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 }
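
Reviewer note: the quick-start refactor above replaces the `.format()` call with a single f-string when building the AlloyDB instance URI that is later handed to `Connector.connect()`. A minimal sketch of that construction, using hypothetical placeholder values (in the notebook they come from the Colab form inputs):

```python
# Hypothetical placeholder values; in the notebook these come from Colab form inputs.
project_id = "my-project"
region = "us-central1"
cluster_name = "my-cluster"
instance_name = "my-primary"

# Same f-string shape as the updated notebook cell; the result is passed to
# google.cloud.alloydb.connector.Connector.connect() as the instance URI.
connection_string = f"projects/{project_id}/locations/{region}/clusters/{cluster_name}/instances/{instance_name}"
print(connection_string)
```

Keeping the URI as one f-string avoids the positional-index bookkeeping of the old `"projects/{0}/..."` form and makes a mismatch between variables and slots impossible.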