diff --git a/notebooks/astradb_haystack_integration.ipynb b/notebooks/astradb_haystack_integration.ipynb index 4d78f64..1cf71a8 100644 --- a/notebooks/astradb_haystack_integration.ipynb +++ b/notebooks/astradb_haystack_integration.ipynb @@ -1,1028 +1,1028 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "qFBDARhK0dn5" - }, - "source": [ - "## Introduction\n", - "\n", - "In this notebook, you'll learn how to use [AstraDB](https://docs.datastax.com/en/astra-serverless/docs/) as a data source in your Haystack pipelines." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "qFBDARhK0dn5" + }, + "source": [ + "## Introduction\n", + "\n", + "In this notebook, you'll learn how to use [AstraDB](https://docs.datastax.com/en/astra-serverless/docs/) as a data source in your Haystack pipelines." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JT4_BeDjZcCI" + }, + "source": [ + "# Prerequisites\n", + "\n", + "You'll need an [OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key) to follow along. (Haystack is model-agnostic, so feel free to use a different model if you'd prefer!)\n", + "\n", + "You'll need the following variables in order to use the Haystack extension. The following tutorials will show you how to create an AstraDB database and save these pieces of information.\n", + "\n", + "- API Endpoint\n", + "- Token\n", + "- Astra keyspace\n", + "- Astra collection name\n", + "\n", + "Follow the first step in [this tutorial to create a free AstraDB database](https://docs.datastax.com/en/astra-serverless/docs/manage/db/manage-create.html) and save your database ID, application token, keyspace, and database region.\n", + "\n", + "[Follow these steps to create a collection](https://docs.datastax.com/en/astra/astra-db-vector/databases/manage-collections.html). 
Save the name of your collection.\n", + "\n", + "Choose the number of dimensions that matches the [embedding model](https://haystack.deepset.ai/blog/what-is-text-vectorization-in-nlp) you plan on using. For this example we'll use a 384-dimension model, [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9qpgbYEB787y" + }, + "source": [ + "Next, install our dependencies." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, - { - "cell_type": "markdown", - "metadata": { - "id": "JT4_BeDjZcCI" - }, - "source": [ - "# Prerequisites\n", - "\n", - "You'll need an [OpenAPI key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key) to follow along. (Haystack is model-agnostic so feel free to use a different one if you'd prefer!)\n", - "\n", - "You'll need the following variables in order to use the Haystack extension. The following tutorials will show you how to create an AstraDB database, and save these pieces of information.\n", - "\n", - "- API Endpoint\n", - "- Token\n", - "- Astra keyspace\n", - "- Astra collection name\n", - "\n", - "Follow the first step in this [this tutorial to create a free AstraDB database](https://docs.datastax.com/en/astra-serverless/docs/manage/db/manage-create.html) and save your database ID, application token, keyspace, and database region.\n", - "\n", - "[Follow these steps to create a collection](https://docs.datastax.com/en/astra/astra-db-vector/databases/manage-collections.html). Save the name of your collection.\n", - "\n", - "Choose the number of dimensions that matches the [embedding model](https://haystack.deepset.ai/blog/what-is-text-vectorization-in-nlp) you plan on using. 
For this example we'll use a 384-dimension model, [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)." - ] + "id": "BtIZl9jGm6QP", + "outputId": "2940fb89-b935-4dd1-b66f-c7a106cc06ec" + }, + "outputs": [], + "source": [ + "!pip install astra-haystack sentence-transformers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ipzOS2-O5G9b" + }, + "source": [ + "Enter your credentials here. In production code, you'd want to use environment variables for sensitive credentials such as the application token to avoid committing those to source control.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, - { - "cell_type": "markdown", - "metadata": { - "id": "9qpgbYEB787y" - }, - "source": [ - "Next, install our dependencies." - ] + "id": "_EKlRTHucc2j", + "outputId": "402df0d2-a4cd-4e05-f3ab-51c376147e74" + }, + "outputs": [], + "source": [ + "from getpass import getpass\n", + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key:\")\n", + "os.environ[\"ASTRA_DB_API_ENDPOINT\"] = getpass(\"Enter your Astra API Endpoint:\")\n", + "os.environ[\"ASTRA_DB_APPLICATION_TOKEN\"] = getpass(\"Enter your Astra application token (e.g. AstraCS:xxx):\")\n", + "ASTRA_DB_COLLECTION_NAME = getpass(\"Enter your Astra collection name:\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d3WEbADaAfzI" + }, + "source": [ + "Next, we'll create a Haystack pipeline that creates the embeddings and writes them to the `AstraDocumentStore`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 85, + "referenced_widgets": [ + "d441033160e4421bb9af18d941d50d6b", + "fb58298ecdf543389e615ccdf7e6a37a", + "cc332ffcf5e5401ab26efef1b35fc34c", + "3aa85f749d99477da52d8c3693046e6a", + "ce61f00f6ff944d695002f7a8b599294", + "a9d1bb3132a54b52a3f2f16aec5b94a7", + "8208352647db4305bec1c2ab9499471b", + "1518245edc554442bd0ad6804273309c", + "dadf5298941e41ad93e19aea51e3008f", + "b4199cf8555a44cb80d22645750d0d32", + "9971ba728ebf44b2a2644c3930981ef4" + ] }, + "id": "A1iA_0dmXV3L", + "outputId": "0506858c-83f8-4bb5-c77f-bfc0b22a6589" + }, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "BtIZl9jGm6QP", - "outputId": "2940fb89-b935-4dd1-b66f-c7a106cc06ec" + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d441033160e4421bb9af18d941d50d6b", + "version_major": 2, + "version_minor": 0 }, - "outputs": [], - "source": [ - "!pip install astra-haystack sentence-transformers" + "text/plain": [ + "Batches: 0%| | 0/1 [00:00