@@ -11,8 +11,10 @@
"This module contains:\n",
"1. [Overview](#1-Overview)\n",
"2. [Pre-requisites](#2-Pre-requisites)\n",
"3. [How to leverage maximum number of results](#3-how-to-leverage-the-maximum-number-of-results-feature)\n",
"4. [How to use custom prompting](#4-how-to-use-the-custom-prompting-feature)"
"3. [Understanding RetrieveAndGenerate API](#understanding-retrieveandgenerate-api)\n",
"4. [Sreaming response using RetrieveAndGenerate API](#streaming-response-with-retrieveandgenerate-api)\n",
"5. [Adjust 'maximum number of results' retrieval parameter](#3-how-to-leverage-the-maximum-number-of-results-feature)\n",
"6. [How to use custom prompting](#4-how-to-use-the-custom-prompting-feature)"
]
},
{
@@ -107,6 +109,7 @@
"import json\n",
"import boto3\n",
"import pprint\n",
"import sys\n",
"from botocore.exceptions import ClientError\n",
"from botocore.client import Config\n",
"\n",
@@ -134,8 +137,8 @@
},
"outputs": [],
"source": [
"%store -r kb_id\n",
"# kb_id = \"<<knowledge_base_id>>\" # Replace with your knowledge base id here."
"# %store -r kb_id\n",
"kb_id = \"<<knowledge_base_id>>\" # Replace with your knowledge base id here."
]
},
{
@@ -159,7 +162,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ca915234",
"id": "bf0243f5",
"metadata": {},
"outputs": [],
"source": [
@@ -174,9 +177,17 @@
"$search_results$\n",
"\n",
"$output_format_instructions$\n",
"\"\"\"\n",
"\n",
"def retrieve_and_generate(query, kb_id, model_arn, max_results, prompt_template = default_prompt):\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca915234",
"metadata": {},
"outputs": [],
"source": [
"def retrieve_and_generate(query, kb_id, model_arn, max_results=5, prompt_template = default_prompt):\n",
" response = bedrock_agent_client.retrieve_and_generate(\n",
" input={\n",
" 'text': query\n",
@@ -202,24 +213,10 @@
" return response\n"
]
},
{
"cell_type": "markdown",
"id": "a58b7808",
"metadata": {},
"source": [
"### How to leverage the maximum number of results feature\n",
"\n",
"In some use cases; the FM responses might be lacking enough context to provide relevant answers or relying that it couldn't find the requested info. Which could be fixed by modifying the maximum number of retrieved results.\n",
"\n",
"In the following example, we are going to run the following query with a few number of results (5):\n",
"\\\n",
"```Provide a list of risks for Octank financial in bulleted points.```\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2918161",
"id": "ccd657e6",
"metadata": {},
"outputs": [],
"source": [
@@ -241,6 +238,104 @@
" pprint.pp(contexts)\n"
]
},
{
"cell_type": "markdown",
"id": "5f1d6784",
"metadata": {},
"source": [
"### Test RetrieveAndGenerate API"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dbefffdd",
"metadata": {},
"outputs": [],
"source": [
"query = \"\"\"Provide a list of risks for Octank financial in numbered list without description.\"\"\"\n",
"\n",
"results = retrieve_and_generate(query = query, kb_id = kb_id, model_arn = model_arn)\n",
"\n",
"print_generation_results(results)"
]
},
{
"cell_type": "markdown",
"id": "f6d8439e",
"metadata": {},
"source": [
"### Streaming response with RetrieveAndGenerate API\n",
"\n",
"Using new [streaming API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_RetrieveAndGenerateStream.html) customers can use `retrieve_and_generate_stream` API from Amazon Bedrock Knowledge Bases to receive the response as it is being generated by the Foundation Model (FM), rather than waiting for the complete response. This will help customers to reduce the time to first token in case of latency sensitive applications."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86a3a94a",
"metadata": {},
"outputs": [],
"source": [
"def retrieve_and_generate_stream(query, kb_id, model_arn, max_results=5, prompt_template = default_prompt):\n",
" response = bedrock_agent_client.retrieve_and_generate_stream(\n",
" input={\n",
" 'text': query\n",
" },\n",
" retrieveAndGenerateConfiguration={\n",
" 'type': 'KNOWLEDGE_BASE',\n",
" 'knowledgeBaseConfiguration': {\n",
" 'knowledgeBaseId': kb_id,\n",
" 'modelArn': model_arn, \n",
" 'retrievalConfiguration': {\n",
" 'vectorSearchConfiguration': {\n",
" 'numberOfResults': max_results # will fetch top N documents which closely match the query\n",
" }\n",
" },\n",
" 'generationConfiguration': {\n",
" 'promptTemplate': {\n",
" 'textPromptTemplate': prompt_template\n",
" }\n",
" }\n",
" }\n",
" }\n",
" )\n",
"\n",
" for event in response['stream']:\n",
" if 'output' in event:\n",
" chunk = event['output']\n",
" sys.stdout.write(chunk['text'])\n",
" sys.stdout.flush()\n",
"\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a55d95ce",
"metadata": {},
"outputs": [],
"source": [
"query = \"\"\"Provide a list of risks for Octank financial in numbered list without description.\"\"\"\n",
"\n",
"retrieve_and_generate_stream(query = query, kb_id = kb_id, model_arn = model_arn)"
]
},
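The stream-consuming loop above can also be instrumented to measure the time-to-first-token benefit mentioned earlier. A minimal sketch, assuming only the event shape used in this notebook (`{'output': {'text': ...}}`) and a simulated stream in place of a live `response['stream']`:

```python
import sys
import time

def consume_stream(events):
    """Print streamed text as it arrives; return (full_text, seconds_to_first_token).

    `events` is any iterable yielding dicts shaped like RetrieveAndGenerateStream
    response events, i.e. {'output': {'text': '...'}} for text chunks.
    """
    start = time.monotonic()
    ttft = None
    parts = []
    for event in events:
        if 'output' in event:
            if ttft is None:
                ttft = time.monotonic() - start  # latency until the first chunk arrived
            chunk = event['output']['text']
            parts.append(chunk)
            sys.stdout.write(chunk)
            sys.stdout.flush()
    return ''.join(parts), ttft

# Simulated stream standing in for response['stream'] from the real API call.
fake_stream = [{'output': {'text': '1. Market risk\n'}},
               {'output': {'text': '2. Credit risk\n'}}]
text, ttft = consume_stream(fake_stream)
```

With a live call, pass `response['stream']` instead of `fake_stream`; `ttft` then reflects genuine model latency rather than iteration overhead.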
{
"cell_type": "markdown",
"id": "a58b7808",
"metadata": {},
"source": [
"### Adjust 'maximum number of results' retrieval parameter\n",
"\n",
"In some use cases; the FM responses might be lacking enough context to provide relevant answers or relying that it couldn't find the requested info. Which could be fixed by modifying the maximum number of retrieved results.\n",
"\n",
"In the following example, we are going to run the following query with a few number of results (3):\n",
"\\\n",
"```Provide a list of risks for Octank financial in bulleted points.```\n"
]
},
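Under the hood, this parameter is the `numberOfResults` field inside `vectorSearchConfiguration`, exactly as set by the `retrieve_and_generate` helper defined earlier. A minimal sketch of just that request fragment (no API call is made; the helper name is ours):

```python
def build_retrieval_config(max_results):
    """Build the retrievalConfiguration fragment used by retrieve_and_generate,
    fetching the top `max_results` documents that most closely match the query."""
    return {
        'vectorSearchConfiguration': {
            'numberOfResults': max_results
        }
    }

# A small value (3) may starve the model of context; a larger one (10)
# gives it more passages to ground its answer in.
few = build_retrieval_config(3)
many = build_retrieval_config(10)
```

The two configs can then be passed through `max_results` in the helper to compare how the answer changes with more retrieved context.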
{
"cell_type": "code",
"execution_count": null,
@@ -990,9 +1085,9 @@
],
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (Data Science 3.0)",
"display_name": "Python 3",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -1004,7 +1099,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.13"
}
},
"nbformat": 4,