From c6cb7b8f17bb257ea43c21b7506bbaf49d0f9321 Mon Sep 17 00:00:00 2001
From: Ayman Farhat
Date: Sat, 6 Sep 2025 17:43:36 +0200
Subject: [PATCH 1/5] Initial write up for the DuckDB cookbook

---
 examples/vector_databases/duckdb/.gitignore   |   1 +
 examples/vector_databases/duckdb/README.md    |   0
 .../using-duckdb-with-openai-embeddings.ipynb | 912 ++++++++++++++++++
 3 files changed, 913 insertions(+)
 create mode 100644 examples/vector_databases/duckdb/.gitignore
 create mode 100644 examples/vector_databases/duckdb/README.md
 create mode 100644 examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb

diff --git a/examples/vector_databases/duckdb/.gitignore b/examples/vector_databases/duckdb/.gitignore
new file mode 100644
index 0000000000..f6b9cd9231
--- /dev/null
+++ b/examples/vector_databases/duckdb/.gitignore
@@ -0,0 +1 @@
+arxiv_data.db
\ No newline at end of file
diff --git a/examples/vector_databases/duckdb/README.md b/examples/vector_databases/duckdb/README.md
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb b/examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb
new file mode 100644
index 0000000000..dfd8728fb3
--- /dev/null
+++ b/examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb
@@ -0,0 +1,912 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0434d61f",
+   "metadata": {},
+   "source": [
+    "# Semantic Search using DuckDB SQL and OpenAI Embeddings\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "785ee6d1",
+   "metadata": {},
+   "source": [
+    "DuckDB is an increasingly popular analytical database, known for its speed, simplicity, and ability to handle large-scale data analysis directly from your laptop or server. Its lightweight design and SQL compatibility make it a great choice for modern data science workflows.\n",
+    "\n",
+    "In this cookbook, we integrate DuckDB with the OpenAI API to perform semantic search on the arXiv paper abstracts dataset: loading the data, generating embeddings, and running similarity queries in SQL.\n",
+    "\n",
+    "This notebook demonstrates how to:\n",
+    "\n",
+    "- Load the [arXiv](https://www.kaggle.com/datasets/spsayakpaul/arxiv-paper-abstracts) paper abstracts dataset into DuckDB\n",
+    "- Generate OpenAI embeddings and store them in DuckDB\n",
+    "- Embed a search query with the OpenAI embeddings endpoint\n",
+    "- Perform semantic search in DuckDB using the embedded query"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "aadf2202",
+   "metadata": {},
+   "source": [
+    "## Install dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "ad752660",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Successfully installed annotated-types-0.7.0 anyio-4.10.0 certifi-2025.8.3 charset_normalizer-3.4.3 distro-1.9.0 duckdb-1.3.2 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.10 jiter-0.10.0 kagglehub-0.3.13 numpy-2.3.2 openai-1.106.1 pandas-2.3.2 pydantic-2.11.7 pydantic-core-2.33.2 pytz-2025.2 pyyaml-6.0.2 requests-2.32.5 sniffio-1.3.1 tqdm-4.67.1 typing-extensions-4.15.0 typing-inspection-0.4.1 tzdata-2025.2 urllib3-2.5.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "!pip install numpy kagglehub duckdb pandas openai"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3058bd5a",
+   "metadata": {},
+   "source": [
+    "## Extract the dataset and load into DuckDB\n",
+    "In this example, we'll use the arXiv paper abstracts dataset from Kaggle. It's a simple CSV with titles, summaries, and category terms. Let's download it and load it into DuckDB."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "6ae41715",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/ayman/.pyenv/versions/3.13.7/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "/Users/ayman/.cache/kagglehub/datasets/spsayakpaul/arxiv-paper-abstracts/versions/2/arxiv_data.csv\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "                                              titles  \\\n",
+       "0  Survey on Semantic Stereo Matching / Semantic ...   \n",
+       "1  FUTURE-AI: Guiding Principles and Consensus Re...   \n",
+       "2  Enforcing Mutual Consistency of Hard Regions f...   \n",
+       "3  Parameter Decoupling Strategy for Semi-supervi...   \n",
+       "4  Background-Foreground Segmentation for Interio...   \n",
+       "\n",
+       "                                           summaries  \\\n",
+       "0  Stereo matching is one of the widely used tech...   \n",
+       "1  The recent advancements in artificial intellig...   \n",
+       "2  In this paper, we proposed a novel mutual cons...   \n",
+       "3  Consistency training has proven to be an advan...   \n",
+       "4  To ensure safety in automated driving, the cor...   \n",
+       "\n",
+       "                         terms  \n",
+       "0           ['cs.CV', 'cs.LG']  \n",
+       "1  ['cs.CV', 'cs.AI', 'cs.LG']  \n",
+       "2           ['cs.CV', 'cs.AI']  \n",
+       "3                    ['cs.CV']  \n",
+       "4           ['cs.CV', 'cs.LG']  "
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import kagglehub\n",
+    "import pandas as pd\n",
+    "\n",
+    "path = kagglehub.dataset_download(\"spsayakpaul/arxiv-paper-abstracts\")\n",
+    "\n",
+    "path = path+\"/arxiv_data.csv\"\n",
+    "print(path)\n",
+    "\n",
+    "# Load the dataset into DuckDB\n",
+    "import duckdb\n",
+    "\n",
+    "# Open (or create) the on-disk database file.\n",
+    "# Note: the duckdb.sql() calls below run on DuckDB's default in-memory connection, not on conn.\n",
+    "conn = duckdb.connect('arxiv_data.db')\n",
+    "\n",
+    "# Load the dataset into DuckDB, limiting to 400 rows for testing\n",
+    "duckdb.sql(f\"\"\"\n",
+    "    CREATE OR REPLACE TABLE papers AS \n",
+    "    SELECT * FROM read_csv('{path}', header=true, parallel=false)\n",
+    "    LIMIT 400\n",
+    "\"\"\")\n",
+    "\n",
+    "# Inspect the first 5 rows of the dataset\n",
+    "result = duckdb.sql(\"SELECT * FROM papers LIMIT 5\").df()\n",
+    "\n",
+    "result.head()\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32a2a22e",
+   "metadata": {},
+   "source": [
+    "### Add an embeddings column to the schema"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "5da323b3",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "┌───────┬────────────┬─────────────┬─────────┬────────────┬─────────┐\n",
+       "│  cid  │    name    │    type     │ notnull │ dflt_value │   pk    │\n",
+       "│ int32 │  varchar   │   varchar   │ boolean │  varchar   │ boolean │\n",
+       "├───────┼────────────┼─────────────┼─────────┼────────────┼─────────┤\n",
+       "│     0 │ titles     │ VARCHAR     │ false   │ NULL       │ false   │\n",
+       "│     1 │ summaries  │ VARCHAR     │ false   │ NULL       │ false   │\n",
+       "│     2 │ terms      │ VARCHAR     │ false   │ NULL       │ false   │\n",
+       "│     3 │ embeddings │ FLOAT[1024] │ false   │ NULL       │ false   │\n",
+       "└───────┴────────────┴─────────────┴─────────┴────────────┴─────────┘"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "duckdb.sql(\"ALTER TABLE papers ADD COLUMN IF NOT EXISTS embeddings FLOAT[1024]\")\n",
+    "\n",
+    "# Verify the new column has been added by inspecting the schema\n",
+    "duckdb.sql(\"PRAGMA table_info(papers)\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d366692b",
+   "metadata": {},
+   "source": [
+    "## Generate embeddings for the dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1f361c07",
+   "metadata": {},
+   "source": [
+    "There are multiple options for creating embeddings in DuckDB. We could either:\n",
+    "\n",
+    "1. Loop through batches of inputs in Python, call the embedding model, and store each batch in the database.\n",
+    "\n",
+    "2. Create a custom DuckDB function (UDF) that calls the model, and write the embeddings in a single SQL statement.\n",
+    "\n",
+    "In this notebook, I'll go with option 2 to keep an \"SQL-first\" experience: it defines a reusable SQL embedding function that can serve other use cases as well."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5763a058",
+   "metadata": {},
+   "source": [
+    "### Defining an OpenAI embeddings UDF for DuckDB\n",
+    "\n",
+    "The function below requests the \"float\" encoding format and sets the embedding dimensions to 1024, matching the `FLOAT[1024]` embeddings column defined above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bf1dcf16",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       ""
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "from duckdb.typing import VARCHAR\n",
+    "import openai\n",
+    "\n",
+    "# The client reads the API key from the OPENAI_API_KEY environment variable.\n",
+    "client = openai.OpenAI()\n",
+    "\n",
+    "# Define the UDF for embedding a text input using the OpenAI API.\n",
+    "def embed_openai(text: str) -> list[float]:\n",
+    "    \"\"\"\n",
+    "    DuckDB UDF for embedding a text input using the OpenAI API.\n",
+    "    \"\"\"\n",
+    "    model = \"text-embedding-3-small\"\n",
+    "    response = client.embeddings.create(\n",
+    "        model=model,\n",
+    "        input=text,\n",
+    "        encoding_format=\"float\",\n",
+    "        dimensions=1024\n",
+    "    )\n",
+    "\n",
+    "    # The API returns the embedding as a plain list of floats.\n",
+    "    return response.data[0].embedding\n",
+    "\n",
+    "# Register the UDF with DuckDB.\n",
+    "duckdb.create_function(\"embed_openai\", embed_openai, [VARCHAR], \"FLOAT[1024]\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "222d1038",
+   "metadata": {},
+   "source": [
+    "*Note on performance:* The function above calls OpenAI's embeddings API once per row, which can be quite slow depending on your dataset size. For larger datasets, consider [upgrading this function](https://lukaszrogalski.substack.com/p/python-udfs-in-duckdb) to work with aggregated data and pass multiple sentences (batches) to each OpenAI embeddings call."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f0b2d7c9",
+   "metadata": {},
+   "source": [
+    "Now that we’ve registered the function with DuckDB, we can use it like any native function as part of our SQL query:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "0619fbf5",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "┌─────────────────┐\n",
+       "│ query_embedding │\n",
+       "│   float[1024]   │\n",
+       "├─────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤\n", + "│ [-0.011518722, -0.010231336, 0.043689836, 0.01666827, 0.008659369, -0.012934848, -0.006480975, 0.066835694, -0.03431224, -0.043771144, 0.03317392, -0.010658206, -0.06461326, 0.033987008, -0.00097570353, 0.031141205, 0.0007499874, 0.031981394, 0.09415539, 0.047077697, -0.029352415, -0.039136555, 0.042930957, -0.027428111, -0.03845898, -0.04284965, 0.01622107, -0.0019124467, 0.054151546, -0.024717823, 0.041521605, -0.031683262, -0.029921575, -0.025517358, -0.07854413, 0.060331002, -0.049354337, 0.015841631, -0.05024873, 0.032252423, -0.0164921, -0.020001922, -0.024419691, 0.019852856, -0.040871136, -0.039136555, 0.007846283, -0.07561702, -0.03404121, 0.025002403, 0.044340305, 0.02393184, -0.008442546, -0.018158928, -0.015814528, -0.01639724, -5.9816895e-05, -0.013293961, 0.040464595, -0.018836498, 0.005932142, -0.008835537, -0.045885168, -0.006609714, -0.03656178, 0.00617268, -0.013354942, 0.012494426, 0.010380401, 0.050438453, -0.01780659, 0.06857028, 0.07512917, 0.044123482, 0.037998233, -0.017481355, -0.051278643, 0.018457059, 0.02271221, 0.04664405, -0.013598868, 0.0032354058, 0.055073045, -0.03268607, -0.05946371, -0.0023664199, 0.014581348, 0.016356586, -0.093342304, -0.041521605, 0.021885572, 0.024392588, -0.005854221, -0.0032201605, 0.04173843, -0.011105403, 0.0801161, 0.028891666, -0.025300534, 0.012054004, -0.07951984, -0.060547825, -0.002286805, -0.015421537, 0.10602645, 0.004692185, -0.028512226, -0.015502845, -0.021844918, 0.02063884, -0.031114101, -0.0040349406, -0.07025065, 0.022658005, -0.030951485, 0.012230173, -0.02902718, 
-0.063908584, -0.0046447553, 0.035206635, -0.06152353, -0.00090540544, 0.025612218, 0.047430035, 0.0090930145, -0.001903977, -0.00052003644, -0.10109373, 0.016478548, 0.01941921, 0.037727203, 0.025951004, -0.005423963, -0.006704574, 0.019459866, -0.050059013, 0.023118753, 0.02760428, 0.007643011, -0.023023894, 0.07588805, -0.0036046826, 0.04165712, 0.055777717, 0.0085645085, 0.028376712, 0.021384168, -0.044909466, 0.025043057, -0.024392588, -0.040247772, -0.051495466, 0.030002885, 0.034176726, 0.00084654137, -0.015692566, 0.018443508, 0.04157581, -0.009655399, -0.062607646, -0.010061943, 0.026533715, -0.028376712, -0.015706116, 0.06071044, -0.032902893, -0.009777362, -0.0037808511, -0.04390666, -0.009614745, 0.018050516, -0.0068468642, 0.018118273, -0.0019954492, 0.050330043, -0.030138398, -0.020015474, -0.037347764, -0.021790713, -0.017481355, 0.02318651, 0.021844918, 0.003835057, -0.029271107, 0.0017396659, -0.008239274, 0.03469168, 0.010400728, -0.016641166, 0.010861478, 0.011491619, 0.044963673, 0.020950524, 0.036995426, 0.029542135, -0.020056129, 0.007615908, -0.0036283976, 0.011762648, -0.0894937, 0.019093975, 0.010583674, 0.0027102877, 0.05626557, 0.0014313706, -0.06916654, 0.06461326, -0.0442861, -0.037862718, -0.036101032, -0.02290193, 0.059138477, 0.03883842, 0.040383287, -0.0038655477, -0.05718707, -0.0135175595, -0.028566431, -0.01894491, 0.008842314, 0.029569238, 0.005955857, 0.035369255, 0.065751575, 0.02044912, 0.054449677, -0.029433724, -0.0047599426, 0.03149354, -0.039055243, -0.00096299907, 0.020598186, -0.037456173, 0.01714257, -0.016126212, 0.01876874, 0.020503325, -0.011450965, -0.07073851, -0.101635784, 0.0032963874, -0.0013966451, -0.022197256, 0.05379921, 0.015367331, -0.0021546786, -0.031141205, -0.0030372161, -0.007324552, -0.061035678, -0.015665462, 0.041142166, 0.0018853438, 0.013382046, 0.005911815, -0.0035606404, 0.002673021, -0.005959245, -0.018213132, 0.03138513, -0.030734662, -0.03753748, 0.013287185, -0.02591035, -0.013111017, 
0.06531793, 0.0045905495, 0.02959634, -0.069003925, -0.011240918, 0.01997482, 0.024392588, 0.02317296, -0.036697295, -0.05680763, -0.036317855, -0.03902814, -0.017711729, -0.01782014, 0.04062721, 0.020340709, 0.05201042, 0.02902718, 0.03873001, 0.011105403, 0.061794557, 0.0017337371, 0.023823429, -0.022861276, -0.0034827197, 0.074858144, 0.0069518876, 0.046318814, -0.03363467, -0.002916947, -0.040952444, 0.016044904, -0.021275757, -0.019608932, -0.0059761843, -0.000892701, -0.019093975, 0.009675727, -0.018023413, -0.045559935, 0.015069199, 0.045614142, -0.006027002, 0.0070535233, -0.011993023, -0.026113622, -0.040925343, 0.030544942, 0.030111296, -0.08396471, -0.09946755, -0.031032793, 0.044367407, 0.002437565, -0.064125404, 0.066293634, -0.011525498, 0.022508938, 0.030273912, -0.022536041, 0.055723514, -0.021506133, 0.018606124, 0.011234142, -0.025503807, 0.053365562, 0.028349608, -0.0048819054, 0.023349129, -0.04881228, -0.015069199, -0.0035775797, 0.084560975, -0.07176842, -0.038919732, -0.064179614, 0.026831847, -0.017589767, 0.025110815, -0.006247213, 0.031710364, -0.0011950674, 0.035206635, 0.056102954, 0.026696334, -0.046318814, -0.020842113, -0.014174804, -0.047457136, 0.06851607, -0.011146057, -0.018213132, -0.034800094, -0.015299574, -0.0072161406, -0.019757997, 0.016654717, 0.032171115, 0.041060857, 0.0706843, -0.02902718, -0.036426265, -0.030788867, 0.022915483, 0.007920816, -0.021004729, 0.052281447, 0.024446795, -0.03157485, -0.029487928, -0.012819661, -0.02290193, -0.0009477537, 0.026438856, 0.01432387, -0.0006932407, 0.015977146, -0.010136476, -0.0049530505, 0.0077988524, -0.0040213894, -0.008124087, -0.032957096, -0.03534215, 0.012372463, -0.006799434, -0.01922949, -0.023715017, 0.069003925, -0.022834172, -0.0037571362, 0.03618234, -0.006287867, 0.005647562, 0.018050516, 0.04257862, -0.014445833, 0.014879479, -0.021086037, -0.0036792154, 0.011566153, 0.057891745, 0.031358026, 0.024351934, -0.018904256, 0.028186992, 0.0064708116, 0.009357268, 
-0.027116427, -0.04645433, -0.04634592, 0.016234623, 0.042822544, 0.026357546, 0.035586078, 0.079194605, -0.0020242461, -0.05425996, 0.01310424, -0.008998155, -0.023322025, 0.019812202, -0.0858077, 0.017074812, -0.0895479, -0.0268454, -0.004438096, -0.003206609, 0.04702349, -0.0145542445, -0.0052173035, -0.019365005, 0.04127768, -0.015773874, -0.011315451, -0.034176726, 0.00905236, 0.009723157, 0.016993504, -0.016817335, 0.010861478, 0.0138495695, -0.035775796, 0.043120675, -0.053880516, 0.06190297, -0.04550573, 0.045641243, -0.05247117, -0.031330924, -0.020313606, 0.057891745, -0.025679976, 0.010658206, -0.034176726, 0.009045585, 0.015136956, -0.030084193, 0.0282683, 0.03818795, 0.018240236, -0.0013644604, 0.053392667, -0.021072486, 0.011545825, 0.01413415, -0.019107528, 0.013524335, -0.023498194, 0.013998635, -0.007670114, 0.009533437, 0.014025738, 0.030111296, -0.01809117, 0.0067079617, -0.017657524, 0.004427932, 0.06754037, -0.017738832, 0.03032812, -0.051116023, 0.027915962, 0.038296364, 0.041494504, -0.02459586, -0.023633707, 0.030870177, -0.012779006, 0.06136091, 0.06190297, -0.012386015, -0.033499155, 0.0106988605, -0.009404698, 0.009709605, 0.017318739, 0.036914118, 0.0254496, 0.02714353, 0.011837181, -0.026777642, 0.046969283, 0.042416003, -0.0028864562, 0.07128056, -0.042930957, -0.010089045, -0.030707559, -0.021804264, 0.019148182, 0.017671075, -0.0045905495, 0.006098147, -0.011803303, 0.020367812, -0.041060857, -0.002750942, -0.01582808, -0.010861478, -0.0058101793, -0.038377672, -0.08087498, 0.0025493642, -0.06857028, 0.031520646, -0.033797286, -0.026249135, -0.03753748, -0.0122640515, -0.021031832, 0.010807272, 0.012623165, 0.017860796, -0.049977705, -0.013585317, -0.021736506, 0.030165501, -0.0012179355, -0.027753346, 0.032902893, 0.03428514, 0.0069654393, -0.012785782, 0.03723935, 0.014865927, 0.011085076, 0.015380883, 0.036724396, 0.026249135, 0.0029880921, 0.017034158, -0.019744445, -0.0009325083, -0.026547268, 0.007832731, 0.0014610144, 
-0.0052410187, 0.011525498, 0.033688877, 0.013998635, -0.028810358, -0.029379517, 0.0049598264, 0.018606124, 0.0075142724, 0.0038519963, -0.012731576, 0.006243825, 0.01809117, 0.012663819, -0.015773874, -0.019080425, 0.0029830104, 0.0059423055, 0.009032033, -0.04474685, -0.0051495465, -0.023145856, -0.028430916, 0.022400526, 0.029731855, 0.014649104, 0.040003844, 0.035830002, -0.018633228, 0.008273153, -0.018321544, -0.037835617, -0.0064979144, 0.0071483837, -0.011003768, 0.02383698, -0.022115948, 0.004143352, 0.009675727, 0.047348723, -0.030463632, -0.016383689, -0.008550958, -0.014743964, -0.0050716256, -0.023593053, 0.050086115, -0.011789751, -0.041304782, -0.016519204, 0.008991379, -0.001061247, 0.0072229165, -0.020381363, 0.02714353, -0.0066673076, 0.00031316528, 0.020123886, 0.020489775, -0.028024374, 0.003641949, 0.0032709783, 0.06862448, 0.028674843, -0.005332491, -0.014161252, -0.029542135, 0.0026645516, 0.01885005, 0.0064504845, -0.011674564, 0.034474857, 0.014025738, 0.017779486, 0.008788108, 0.021736506, 0.057403892, -0.017887898, 0.013985084, 0.0053087757, 0.019798651, 0.03119541, 0.053284254, 0.037727203, -0.0097976895, -0.024663618, -0.024921095, 0.0057796882, -0.019730894, -0.048703868, 0.021343514, -0.015950043, 0.02214305, -0.022468284, 0.006809598, -0.021844918, -0.028214093, 0.012040453, -0.00937082, -0.015909389, -0.011044422, -0.029487928, 0.011633909, 0.012338584, 0.018443508, 0.03366177, -0.011464517, 0.011566153, -0.070846915, 0.005193589, 0.023999596, 0.020205194, -0.030382324, 0.016844438, -0.008259602, -0.022359872, 0.0068807425, 0.034122523, 0.01215564, 0.005857609, -0.005376533, -0.03306551, -0.016031351, -0.013510784, -0.009546988, -0.0129890535, 0.0061116987, -0.05051976, 0.02845802, 0.026357546, -0.008971052, -0.01970379, -0.035206635, -0.09242081, 0.025300534, -0.03526084, -0.015150508, -0.044150583, -0.024568757, 0.019446313, -0.02864774, -0.0041162493, -0.044828158, -0.03965151, 0.024284177, -0.015502845, 0.069491774, 
-0.014540693, 0.04165712, 0.005474781, -0.0027695752, -0.005356206, -0.022319218, 0.009858672, -0.01582808, -0.03233373, -0.016925747, -0.055344075, 0.012372463, 0.0164921, -0.039814126, -0.060276795, -0.028403815, -0.04515339, -0.011464517, 0.037456173, -0.039841227, 0.013944429, 0.028539328, 0.014540693, 0.004119637, 0.011105403, 0.042144973, -0.022292115, -0.037862718, -0.038106643, 0.028972974, 0.00079487654, -0.0053866967, -0.025165021, 0.0071890377, -0… │\n", + "└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "duckdb.sql(\"SELECT embed_openai('Which papers are related to quantum computing?') AS query_embedding;\")" + ] + }, + { + "cell_type": "markdown", + "id": "9f452192", + "metadata": {}, + "source": [ + "### Generating Embeddings\n", + "\n", + "With the embedding function in place, we can now use it to generate and write embeddings into our table via SQL. The below query should run on every row in the table, calling the openai embedding UDF we previously defined. On 400 rows, it should take around 2 minutes to complete." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "2d1d68ba", + "metadata": {}, + "outputs": [], + "source": [ + "duckdb.query(\"\"\"\n", + "UPDATE papers\n", + "SET embeddings = embed_openai(\n", + " COALESCE(titles, '') || ' ' || COALESCE(summaries, '')\n", + ")\n", + "WHERE embeddings IS NULL\n", + "\"\"\")" + ] + }, + { + "cell_type": "markdown", + "id": "4cc350f1", + "metadata": {}, + "source": [ + "Inspecting the first 5 rows of the dataset we can see that the embeddings have been created for every row." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5642bcf9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>titles</th>\n", + "      <th>summaries</th>\n", + "      <th>terms</th>\n", + "      <th>embeddings</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>0</th>\n", + "      <td>Survey on Semantic Stereo Matching / Semantic ...</td>\n", + "      <td>Stereo matching is one of the widely used tech...</td>\n", + "      <td>['cs.CV', 'cs.LG']</td>\n", + "      <td>[-0.018463377, -0.03012074, 0.010921418, -0.04...</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>1</th>\n", + "      <td>FUTURE-AI: Guiding Principles and Consensus Re...</td>\n", + "      <td>The recent advancements in artificial intellig...</td>\n", + "      <td>['cs.CV', 'cs.AI', 'cs.LG']</td>\n", + "      <td>[-0.015125522, -0.020882344, 0.042208467, 0.04...</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>2</th>\n", + "      <td>Enforcing Mutual Consistency of Hard Regions f...</td>\n", + "      <td>In this paper, we proposed a novel mutual cons...</td>\n", + "      <td>['cs.CV', 'cs.AI']</td>\n", + "      <td>[0.00833142, -0.021476267, 0.037161183, 0.0197...</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>3</th>\n", + "      <td>Parameter Decoupling Strategy for Semi-supervi...</td>\n", + "      <td>Consistency training has proven to be an advan...</td>\n", + "      <td>['cs.CV']</td>\n", + "      <td>[0.014294317, -0.020803811, 0.03544353, 0.0138...</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>4</th>\n", + "      <td>Background-Foreground Segmentation for Interio...</td>\n", + "      <td>To ensure safety in automated driving, the cor...</td>\n", + "      <td>['cs.CV', 'cs.LG']</td>\n", + "      <td>[-0.009169946, 0.0074990084, 0.011346209, -0.0...</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " titles \\\n", + "0 Survey on Semantic Stereo Matching / Semantic ... \n", + "1 FUTURE-AI: Guiding Principles and Consensus Re... \n", + "2 Enforcing Mutual Consistency of Hard Regions f... \n", + "3 Parameter Decoupling Strategy for Semi-supervi... \n", + "4 Background-Foreground Segmentation for Interio... \n", + "\n", + " summaries \\\n", + "0 Stereo matching is one of the widely used tech... \n", + "1 The recent advancements in artificial intellig... \n", + "2 In this paper, we proposed a novel mutual cons... \n", + "3 Consistency training has proven to be an advan... \n", + "4 To ensure safety in automated driving, the cor... \n", + "\n", + " terms \\\n", + "0 ['cs.CV', 'cs.LG'] \n", + "1 ['cs.CV', 'cs.AI', 'cs.LG'] \n", + "2 ['cs.CV', 'cs.AI'] \n", + "3 ['cs.CV'] \n", + "4 ['cs.CV', 'cs.LG'] \n", + "\n", + " embeddings \n", + "0 [-0.018463377, -0.03012074, 0.010921418, -0.04... \n", + "1 [-0.015125522, -0.020882344, 0.042208467, 0.04... \n", + "2 [0.00833142, -0.021476267, 0.037161183, 0.0197... \n", + "3 [0.014294317, -0.020803811, 0.03544353, 0.0138... \n", + "4 [-0.009169946, 0.0074990084, 0.011346209, -0.0... " + ] + }, + "execution_count": 126, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result = duckdb.sql(\"SELECT * FROM papers LIMIT 5\").df()\n", + "result.head()" + ] + }, + { + "cell_type": "markdown", + "id": "975954b4", + "metadata": {}, + "source": [ + "## Running a Similarity Search with SQL" + ] + }, + { + "cell_type": "markdown", + "id": "35ee611a", + "metadata": {}, + "source": [ + "Now that we have embeddings for each paper, we can use them to perform a semantic similarity search. 
\n", + "\n", + "To do this, we can use one of DuckDB's native array functions, such as `array_cosine_similarity`, which computes the cosine similarity between two vectors (a higher score means a closer match).\n", + "\n", + "Below, we define a function that uses our `embed_openai` UDF to embed the search query, then uses `array_cosine_similarity` to score the query embedding against each paper embedding and returns the top `k` matches.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "357d8fad", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>titles</th>\n", + "      <th>summaries</th>\n", + "      <th>score</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>0</th>\n", + "      <td>Medical Matting: A New Perspective on Medical ...</td>\n", + "      <td>In medical image segmentation, it is difficult...</td>\n", + "      <td>0.579598</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>1</th>\n", + "      <td>Self-Supervision with Superpixels: Training Fe...</td>\n", + "      <td>Few-shot semantic segmentation (FSS) has great...</td>\n", + "      <td>0.570959</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>2</th>\n", + "      <td>A Spatial Guided Self-supervised Clustering Ne...</td>\n", + "      <td>The segmentation of medical images is a fundam...</td>\n", + "      <td>0.562010</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>3</th>\n", + "      <td>Superpixel-Guided Label Softening for Medical ...</td>\n", + "      <td>Segmentation of objects of interest is one of ...</td>\n", + "      <td>0.561668</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>4</th>\n", + "      <td>Efficient and Generic Interactive Segmentation...</td>\n", + "      <td>Semantic segmentation of medical images is an ...</td>\n", + "      <td>0.560177</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + " titles \\\n", + "0 Medical Matting: A New Perspective on Medical ... \n", + "1 Self-Supervision with Superpixels: Training Fe... \n", + "2 A Spatial Guided Self-supervised Clustering Ne... \n", + "3 Superpixel-Guided Label Softening for Medical ... \n", + "4 Efficient and Generic Interactive Segmentation... \n", + "\n", + " summaries score \n", + "0 In medical image segmentation, it is difficult... 0.579598 \n", + "1 Few-shot semantic segmentation (FSS) has great... 0.570959 \n", + "2 The segmentation of medical images is a fundam... 0.562010 \n", + "3 Segmentation of objects of interest is one of ... 0.561668 \n", + "4 Semantic segmentation of medical images is an ... 0.560177 " + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def search_papers(query_text: str, k: int = 5):\n", + " return duckdb.execute(\"\"\"\n", + " WITH q AS (\n", + " SELECT embed_openai(?) AS qe\n", + " )\n", + " SELECT\n", + " titles,\n", + " summaries,\n", + " array_cosine_similarity(embeddings, q.qe) AS score\n", + " FROM papers, q\n", + " WHERE embeddings IS NOT NULL\n", + " ORDER BY score DESC\n", + " LIMIT ?\n", + " \"\"\", [query_text, k]).fetchdf()\n", + "\n", + "# Test the function\n", + "search_papers(\"What are the research papers on image segmentation for the medical field?\")" + ] + }, + { + "cell_type": "markdown", + "id": "0db28fc6", + "metadata": {}, + "source": [ + "### Optimizing queries with an index" + ] + }, + { + "cell_type": "markdown", + "id": "8fb6dbd9", + "metadata": {}, + "source": [ + "While the above search query works fine on 400 rows, it can eventually get much slower as the dataset grows into hundreds of thousands. 
Without an index, DuckDB will compare a query embedding with all document embeddings to find the most similar ones.\n", + "\n", + "To speed up vector search, we can use approximate nearest neighbor (ANN) search with an [HNSW (Hierarchical Navigable Small World)](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) index, supported via DuckDB's [vector similarity search extension](https://duckdb.org/2024/05/03/vector-similarity-search-vss.html).\n", + "\n", + "Let's try that out." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "887d3660", + "metadata": {}, + "outputs": [], + "source": [ + "# Install the extension\n", + "duckdb.sql(\"INSTALL vss;\")\n", + "duckdb.sql(\"LOAD vss;\")\n", + "duckdb.sql(\"SET GLOBAL hnsw_enable_experimental_persistence = true;\")\n", + "# Create an index on the embeddings column\n", + "duckdb.sql(\"CREATE INDEX idx_embeddings ON papers USING HNSW (embeddings);\")" + ] + }, + { + "cell_type": "markdown", + "id": "753b3740", + "metadata": {}, + "source": [ + "Now we can verify that the index has been created and run a quick test." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9d016af2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "┌───────────────┬──────────────┬─────────────┬────────────┬────────────────┬───────────┬────────────┬───────────┬─────────┬───────────────────────┬───────────┬────────────┬──────────────┬────────────────────────────────────────────────────────────────┐\n", + "│ database_name │ database_oid │ schema_name │ schema_oid │ index_name │ index_oid │ table_name │ table_oid │ comment │ tags │ is_unique │ is_primary │ expressions │ sql │\n", + "│ varchar │ int64 │ varchar │ int64 │ varchar │ int64 │ varchar │ int64 │ varchar │ map(varchar, varchar) │ boolean │ boolean │ varchar │ varchar │\n", + 
"├───────────────┼──────────────┼─────────────┼────────────┼────────────────┼───────────┼────────────┼───────────┼─────────┼───────────────────────┼───────────┼────────────┼──────────────┼────────────────────────────────────────────────────────────────┤\n", + "│ memory │ 570 │ main │ 572 │ idx_embeddings │ 1994 │ papers │ 1977 │ NULL │ {} │ false │ false │ [embeddings] │ CREATE INDEX idx_embeddings ON papers USING HNSW (embeddings); │\n", + "└───────────────┴──────────────┴─────────────┴────────────┴────────────────┴───────────┴────────────┴───────────┴─────────┴───────────────────────┴───────────┴────────────┴──────────────┴────────────────────────────────────────────────────────────────┘" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Verify the index has been created\n", + "duckdb.sql(\"SELECT * FROM duckdb_indexes();\")" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "682ce99c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titlessummariesscore
0Medical Matting: A New Perspective on Medical ...In medical image segmentation, it is difficult...0.579598
1Self-Supervision with Superpixels: Training Fe...Few-shot semantic segmentation (FSS) has great...0.570959
2A Spatial Guided Self-supervised Clustering Ne...The segmentation of medical images is a fundam...0.562010
3Superpixel-Guided Label Softening for Medical ...Segmentation of objects of interest is one of ...0.561668
4Efficient and Generic Interactive Segmentation...Semantic segmentation of medical images is an ...0.560177
\n", + "
" + ], + "text/plain": [ + " titles \\\n", + "0 Medical Matting: A New Perspective on Medical ... \n", + "1 Self-Supervision with Superpixels: Training Fe... \n", + "2 A Spatial Guided Self-supervised Clustering Ne... \n", + "3 Superpixel-Guided Label Softening for Medical ... \n", + "4 Efficient and Generic Interactive Segmentation... \n", + "\n", + " summaries score \n", + "0 In medical image segmentation, it is difficult... 0.579598 \n", + "1 Few-shot semantic segmentation (FSS) has great... 0.570959 \n", + "2 The segmentation of medical images is a fundam... 0.562010 \n", + "3 Segmentation of objects of interest is one of ... 0.561668 \n", + "4 Semantic segmentation of medical images is an ... 0.560177 " + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Test the function\n", + "search_papers(\"What are the research papers on image segmentation for the medical field?\")" + ] + }, + { + "cell_type": "markdown", + "id": "977a9595", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "In this cookbook, we explored how to integrate OpenAI’s embedding calls as a reusable UDF in DuckDB. This approach proves especially powerful when you want to store and query embeddings directly alongside your data. By doing so, you unlock new opportunities for combining advanced data analysis with retrieval tasks, all through DuckDB’s simple and familiar SQL interface." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 24bddc26f3b9fac388d33b5c61b4838a71f4acb8 Mon Sep 17 00:00:00 2001 From: Ayman Farhat Date: Sat, 6 Sep 2025 17:56:20 +0200 Subject: [PATCH 2/5] Clear unneeded outputs --- .../using-duckdb-with-openai-embeddings.ipynb | 392 ++---------------- 1 file changed, 34 insertions(+), 358 deletions(-) diff --git a/examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb b/examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb index dfd8728fb3..75d87b6f21 100644 --- a/examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb +++ b/examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb @@ -35,101 +35,10 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "ad752660", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Collecting numpy\n", - " Using cached numpy-2.3.2-cp313-cp313-macosx_14_0_arm64.whl.metadata (62 kB)\n", - "Collecting kagglehub\n", - " Using cached kagglehub-0.3.13-py3-none-any.whl.metadata (38 kB)\n", - "Collecting duckdb\n", - " Using cached duckdb-1.3.2-cp313-cp313-macosx_12_0_arm64.whl.metadata (7.0 kB)\n", - "Collecting pandas\n", - " Using cached pandas-2.3.2-cp313-cp313-macosx_11_0_arm64.whl.metadata (91 kB)\n", - "Collecting openai\n", - " Using cached openai-1.106.1-py3-none-any.whl.metadata (29 kB)\n", - "Requirement already satisfied: packaging in /Users/ayman/.pyenv/versions/3.13.7/lib/python3.13/site-packages (from kagglehub) (25.0)\n", - "Collecting pyyaml (from 
kagglehub)\n", - " Using cached PyYAML-6.0.2-cp313-cp313-macosx_11_0_arm64.whl.metadata (2.1 kB)\n", - "Collecting requests (from kagglehub)\n", - " Using cached requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)\n", - "Collecting tqdm (from kagglehub)\n", - " Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)\n", - "Requirement already satisfied: python-dateutil>=2.8.2 in /Users/ayman/.pyenv/versions/3.13.7/lib/python3.13/site-packages (from pandas) (2.9.0.post0)\n", - "Collecting pytz>=2020.1 (from pandas)\n", - " Using cached pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)\n", - "Collecting tzdata>=2022.7 (from pandas)\n", - " Using cached tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)\n", - "Collecting anyio<5,>=3.5.0 (from openai)\n", - " Using cached anyio-4.10.0-py3-none-any.whl.metadata (4.0 kB)\n", - "Collecting distro<2,>=1.7.0 (from openai)\n", - " Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)\n", - "Collecting httpx<1,>=0.23.0 (from openai)\n", - " Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)\n", - "Collecting jiter<1,>=0.4.0 (from openai)\n", - " Using cached jiter-0.10.0-cp313-cp313-macosx_11_0_arm64.whl.metadata (5.2 kB)\n", - "Collecting pydantic<3,>=1.9.0 (from openai)\n", - " Using cached pydantic-2.11.7-py3-none-any.whl.metadata (67 kB)\n", - "Collecting sniffio (from openai)\n", - " Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)\n", - "Collecting typing-extensions<5,>=4.11 (from openai)\n", - " Using cached typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB)\n", - "Collecting idna>=2.8 (from anyio<5,>=3.5.0->openai)\n", - " Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)\n", - "Collecting certifi (from httpx<1,>=0.23.0->openai)\n", - " Using cached certifi-2025.8.3-py3-none-any.whl.metadata (2.4 kB)\n", - "Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)\n", - " Using cached httpcore-1.0.9-py3-none-any.whl.metadata (21 kB)\n", - "Collecting h11>=0.16 (from 
httpcore==1.*->httpx<1,>=0.23.0->openai)\n", - " Using cached h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)\n", - "Collecting annotated-types>=0.6.0 (from pydantic<3,>=1.9.0->openai)\n", - " Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)\n", - "Collecting pydantic-core==2.33.2 (from pydantic<3,>=1.9.0->openai)\n", - " Using cached pydantic_core-2.33.2-cp313-cp313-macosx_11_0_arm64.whl.metadata (6.8 kB)\n", - "Collecting typing-inspection>=0.4.0 (from pydantic<3,>=1.9.0->openai)\n", - " Using cached typing_inspection-0.4.1-py3-none-any.whl.metadata (2.6 kB)\n", - "Requirement already satisfied: six>=1.5 in /Users/ayman/.pyenv/versions/3.13.7/lib/python3.13/site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)\n", - "Collecting charset_normalizer<4,>=2 (from requests->kagglehub)\n", - " Using cached charset_normalizer-3.4.3-cp313-cp313-macosx_10_13_universal2.whl.metadata (36 kB)\n", - "Collecting urllib3<3,>=1.21.1 (from requests->kagglehub)\n", - " Using cached urllib3-2.5.0-py3-none-any.whl.metadata (6.5 kB)\n", - "Using cached numpy-2.3.2-cp313-cp313-macosx_14_0_arm64.whl (5.1 MB)\n", - "Using cached kagglehub-0.3.13-py3-none-any.whl (68 kB)\n", - "Using cached duckdb-1.3.2-cp313-cp313-macosx_12_0_arm64.whl (15.5 MB)\n", - "Using cached pandas-2.3.2-cp313-cp313-macosx_11_0_arm64.whl (10.7 MB)\n", - "Using cached openai-1.106.1-py3-none-any.whl (930 kB)\n", - "Using cached anyio-4.10.0-py3-none-any.whl (107 kB)\n", - "Using cached distro-1.9.0-py3-none-any.whl (20 kB)\n", - "Using cached httpx-0.28.1-py3-none-any.whl (73 kB)\n", - "Using cached httpcore-1.0.9-py3-none-any.whl (78 kB)\n", - "Using cached jiter-0.10.0-cp313-cp313-macosx_11_0_arm64.whl (318 kB)\n", - "Using cached pydantic-2.11.7-py3-none-any.whl (444 kB)\n", - "Using cached pydantic_core-2.33.2-cp313-cp313-macosx_11_0_arm64.whl (1.8 MB)\n", - "Using cached typing_extensions-4.15.0-py3-none-any.whl (44 kB)\n", - "Using cached annotated_types-0.7.0-py3-none-any.whl (13 
kB)\n", - "Using cached h11-0.16.0-py3-none-any.whl (37 kB)\n", - "Using cached idna-3.10-py3-none-any.whl (70 kB)\n", - "Using cached pytz-2025.2-py2.py3-none-any.whl (509 kB)\n", - "Using cached sniffio-1.3.1-py3-none-any.whl (10 kB)\n", - "Using cached tqdm-4.67.1-py3-none-any.whl (78 kB)\n", - "Using cached typing_inspection-0.4.1-py3-none-any.whl (14 kB)\n", - "Using cached tzdata-2025.2-py2.py3-none-any.whl (347 kB)\n", - "Using cached certifi-2025.8.3-py3-none-any.whl (161 kB)\n", - "Using cached PyYAML-6.0.2-cp313-cp313-macosx_11_0_arm64.whl (171 kB)\n", - "Using cached requests-2.32.5-py3-none-any.whl (64 kB)\n", - "Using cached charset_normalizer-3.4.3-cp313-cp313-macosx_10_13_universal2.whl (205 kB)\n", - "Using cached urllib3-2.5.0-py3-none-any.whl (129 kB)\n", - "Installing collected packages: pytz, urllib3, tzdata, typing-extensions, tqdm, sniffio, pyyaml, numpy, jiter, idna, h11, duckdb, distro, charset_normalizer, certifi, annotated-types, typing-inspection, requests, pydantic-core, pandas, httpcore, anyio, pydantic, kagglehub, httpx, openai\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m26/26\u001b[0m [openai]25/26\u001b[0m [openai]c]core]\n", - "\u001b[1A\u001b[2KSuccessfully installed annotated-types-0.7.0 anyio-4.10.0 certifi-2025.8.3 charset_normalizer-3.4.3 distro-1.9.0 duckdb-1.3.2 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.10 jiter-0.10.0 kagglehub-0.3.13 numpy-2.3.2 openai-1.106.1 pandas-2.3.2 pydantic-2.11.7 pydantic-core-2.33.2 pytz-2025.2 pyyaml-6.0.2 requests-2.32.5 sniffio-1.3.1 tqdm-4.67.1 typing-extensions-4.15.0 typing-inspection-0.4.1 tzdata-2025.2 urllib3-2.5.0\n" - ] - } - ], + "outputs": [], "source": [ "!pip install numpy kagglehub duckdb pandas openai" ] @@ -145,114 +54,10 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "6ae41715", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - 
"/Users/ayman/.pyenv/versions/3.13.7/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/Users/ayman/.cache/kagglehub/datasets/spsayakpaul/arxiv-paper-abstracts/versions/2/arxiv_data.csv\n" - ] - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
titlessummariesterms
0Survey on Semantic Stereo Matching / Semantic ...Stereo matching is one of the widely used tech...['cs.CV', 'cs.LG']
1FUTURE-AI: Guiding Principles and Consensus Re...The recent advancements in artificial intellig...['cs.CV', 'cs.AI', 'cs.LG']
2Enforcing Mutual Consistency of Hard Regions f...In this paper, we proposed a novel mutual cons...['cs.CV', 'cs.AI']
3Parameter Decoupling Strategy for Semi-supervi...Consistency training has proven to be an advan...['cs.CV']
4Background-Foreground Segmentation for Interio...To ensure safety in automated driving, the cor...['cs.CV', 'cs.LG']
\n", - "
" - ], - "text/plain": [ - " titles \\\n", - "0 Survey on Semantic Stereo Matching / Semantic ... \n", - "1 FUTURE-AI: Guiding Principles and Consensus Re... \n", - "2 Enforcing Mutual Consistency of Hard Regions f... \n", - "3 Parameter Decoupling Strategy for Semi-supervi... \n", - "4 Background-Foreground Segmentation for Interio... \n", - "\n", - " summaries \\\n", - "0 Stereo matching is one of the widely used tech... \n", - "1 The recent advancements in artificial intellig... \n", - "2 In this paper, we proposed a novel mutual cons... \n", - "3 Consistency training has proven to be an advan... \n", - "4 To ensure safety in automated driving, the cor... \n", - "\n", - " terms \n", - "0 ['cs.CV', 'cs.LG'] \n", - "1 ['cs.CV', 'cs.AI', 'cs.LG'] \n", - "2 ['cs.CV', 'cs.AI'] \n", - "3 ['cs.CV'] \n", - "4 ['cs.CV', 'cs.LG'] " - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import kagglehub\n", "import pandas as pd\n", @@ -278,8 +83,7 @@ "# Inspect the first 5 rows of the dataset\n", "result = duckdb.sql(\"SELECT * FROM papers LIMIT 5\").df()\n", "\n", - "result.head()\n", - "\n" + "result.head()" ] }, { @@ -292,7 +96,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 24, "id": "5da323b3", "metadata": {}, "outputs": [ @@ -310,7 +114,7 @@ "└───────┴────────────┴─────────────┴─────────┴────────────┴─────────┘" ] }, - "execution_count": 3, + "execution_count": 24, "metadata": {}, "output_type": "execute_result" } @@ -359,18 +163,7 @@ "execution_count": null, "id": "bf1dcf16", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import numpy as np\n", "from duckdb.typing import VARCHAR\n", @@ -414,26 +207,10 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "0619fbf5", "metadata": {}, - "outputs": 
[ - { - "data": { - "text/plain": [ - "┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────┐\n", - "│ query_embedding │\n", - "│ float[1024] │\n", - "├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────┤\n", - "│ [-0.011518722, -0.010231336, 0.043689836, 0.01666827, 0.008659369, -0.012934848, -0.006480975, 0.066835694, -0.03431224, -0.043771144, 0.03317392, -0.010658206, -0.06461326, 0.033987008, -0.00097570353, 0.031141205, 0.0007499874, 0.031981394, 0.09415539, 0.047077697, -0.029352415, -0.039136555, 0.042930957, -0.027428111, -0.03845898, -0.04284965, 0.01622107, -0.0019124467, 0.054151546, -0.024717823, 0.041521605, -0.031683262, -0.029921575, -0.025517358, -0.07854413, 0.060331002, -0.049354337, 0.015841631, -0.05024873, 0.032252423, -0.0164921, -0.020001922, -0.024419691, 0.019852856, -0.040871136, -0.039136555, 0.007846283, -0.07561702, -0.03404121, 0.025002403, 0.044340305, 0.02393184, -0.008442546, -0.018158928, -0.015814528, -0.01639724, -5.9816895e-05, -0.013293961, 0.040464595, -0.018836498, 0.005932142, -0.008835537, -0.045885168, -0.006609714, -0.03656178, 0.00617268, -0.013354942, 0.012494426, 0.010380401, 0.050438453, -0.01780659, 0.06857028, 0.07512917, 0.044123482, 0.037998233, -0.017481355, -0.051278643, 0.018457059, 0.02271221, 0.04664405, -0.013598868, 0.0032354058, 0.055073045, -0.03268607, -0.05946371, -0.0023664199, 0.014581348, 0.016356586, -0.093342304, -0.041521605, 0.021885572, 0.024392588, -0.005854221, -0.0032201605, 0.04173843, -0.011105403, 0.0801161, 0.028891666, -0.025300534, 0.012054004, -0.07951984, -0.060547825, -0.002286805, -0.015421537, 0.10602645, 0.004692185, -0.028512226, -0.015502845, -0.021844918, 0.02063884, -0.031114101, -0.0040349406, -0.07025065, 0.022658005, -0.030951485, 0.012230173, -0.02902718, -0.063908584, -0.0046447553, 0.035206635, -0.06152353, -0.00090540544, 0.025612218, 0.047430035, 0.0090930145, -0.001903977, -0.00052003644, -0.10109373, 0.016478548, 0.01941921, 0.037727203, 0.025951004, -0.005423963, -0.006704574, 0.019459866, -0.050059013, 0.023118753, 0.02760428, 0.007643011, -0.023023894, 0.07588805, 
-0.0036046826, 0.04165712, 0.055777717, 0.0085645085, 0.028376712, 0.021384168, -0.044909466, 0.025043057, -0.024392588, -0.040247772, -0.051495466, 0.030002885, 0.034176726, 0.00084654137, -0.015692566, 0.018443508, 0.04157581, -0.009655399, -0.062607646, -0.010061943, 0.026533715, -0.028376712, -0.015706116, 0.06071044, -0.032902893, -0.009777362, -0.0037808511, -0.04390666, -0.009614745, 0.018050516, -0.0068468642, 0.018118273, -0.0019954492, 0.050330043, -0.030138398, -0.020015474, -0.037347764, -0.021790713, -0.017481355, 0.02318651, 0.021844918, 0.003835057, -0.029271107, 0.0017396659, -0.008239274, 0.03469168, 0.010400728, -0.016641166, 0.010861478, 0.011491619, 0.044963673, 0.020950524, 0.036995426, 0.029542135, -0.020056129, 0.007615908, -0.0036283976, 0.011762648, -0.0894937, 0.019093975, 0.010583674, 0.0027102877, 0.05626557, 0.0014313706, -0.06916654, 0.06461326, -0.0442861, -0.037862718, -0.036101032, -0.02290193, 0.059138477, 0.03883842, 0.040383287, -0.0038655477, -0.05718707, -0.0135175595, -0.028566431, -0.01894491, 0.008842314, 0.029569238, 0.005955857, 0.035369255, 0.065751575, 0.02044912, 0.054449677, -0.029433724, -0.0047599426, 0.03149354, -0.039055243, -0.00096299907, 0.020598186, -0.037456173, 0.01714257, -0.016126212, 0.01876874, 0.020503325, -0.011450965, -0.07073851, -0.101635784, 0.0032963874, -0.0013966451, -0.022197256, 0.05379921, 0.015367331, -0.0021546786, -0.031141205, -0.0030372161, -0.007324552, -0.061035678, -0.015665462, 0.041142166, 0.0018853438, 0.013382046, 0.005911815, -0.0035606404, 0.002673021, -0.005959245, -0.018213132, 0.03138513, -0.030734662, -0.03753748, 0.013287185, -0.02591035, -0.013111017, 0.06531793, 0.0045905495, 0.02959634, -0.069003925, -0.011240918, 0.01997482, 0.024392588, 0.02317296, -0.036697295, -0.05680763, -0.036317855, -0.03902814, -0.017711729, -0.01782014, 0.04062721, 0.020340709, 0.05201042, 0.02902718, 0.03873001, 0.011105403, 0.061794557, 0.0017337371, 0.023823429, -0.022861276, -0.0034827197, 
0.074858144, 0.0069518876, 0.046318814, -0.03363467, -0.002916947, -0.040952444, 0.016044904, -0.021275757, -0.019608932, -0.0059761843, -0.000892701, -0.019093975, 0.009675727, -0.018023413, -0.045559935, 0.015069199, 0.045614142, -0.006027002, 0.0070535233, -0.011993023, -0.026113622, -0.040925343, 0.030544942, 0.030111296, -0.08396471, -0.09946755, -0.031032793, 0.044367407, 0.002437565, -0.064125404, 0.066293634, -0.011525498, 0.022508938, 0.030273912, -0.022536041, 0.055723514, -0.021506133, 0.018606124, 0.011234142, -0.025503807, 0.053365562, 0.028349608, -0.0048819054, 0.023349129, -0.04881228, -0.015069199, -0.0035775797, 0.084560975, -0.07176842, -0.038919732, -0.064179614, 0.026831847, -0.017589767, 0.025110815, -0.006247213, 0.031710364, -0.0011950674, 0.035206635, 0.056102954, 0.026696334, -0.046318814, -0.020842113, -0.014174804, -0.047457136, 0.06851607, -0.011146057, -0.018213132, -0.034800094, -0.015299574, -0.0072161406, -0.019757997, 0.016654717, 0.032171115, 0.041060857, 0.0706843, -0.02902718, -0.036426265, -0.030788867, 0.022915483, 0.007920816, -0.021004729, 0.052281447, 0.024446795, -0.03157485, -0.029487928, -0.012819661, -0.02290193, -0.0009477537, 0.026438856, 0.01432387, -0.0006932407, 0.015977146, -0.010136476, -0.0049530505, 0.0077988524, -0.0040213894, -0.008124087, -0.032957096, -0.03534215, 0.012372463, -0.006799434, -0.01922949, -0.023715017, 0.069003925, -0.022834172, -0.0037571362, 0.03618234, -0.006287867, 0.005647562, 0.018050516, 0.04257862, -0.014445833, 0.014879479, -0.021086037, -0.0036792154, 0.011566153, 0.057891745, 0.031358026, 0.024351934, -0.018904256, 0.028186992, 0.0064708116, 0.009357268, -0.027116427, -0.04645433, -0.04634592, 0.016234623, 0.042822544, 0.026357546, 0.035586078, 0.079194605, -0.0020242461, -0.05425996, 0.01310424, -0.008998155, -0.023322025, 0.019812202, -0.0858077, 0.017074812, -0.0895479, -0.0268454, -0.004438096, -0.003206609, 0.04702349, -0.0145542445, -0.0052173035, -0.019365005, 0.04127768, 
-0.015773874, -0.011315451, -0.034176726, 0.00905236, 0.009723157, 0.016993504, -0.016817335, 0.010861478, 0.0138495695, -0.035775796, 0.043120675, -0.053880516, 0.06190297, -0.04550573, 0.045641243, -0.05247117, -0.031330924, -0.020313606, 0.057891745, -0.025679976, 0.010658206, -0.034176726, 0.009045585, 0.015136956, -0.030084193, 0.0282683, 0.03818795, 0.018240236, -0.0013644604, 0.053392667, -0.021072486, 0.011545825, 0.01413415, -0.019107528, 0.013524335, -0.023498194, 0.013998635, -0.007670114, 0.009533437, 0.014025738, 0.030111296, -0.01809117, 0.0067079617, -0.017657524, 0.004427932, 0.06754037, -0.017738832, 0.03032812, -0.051116023, 0.027915962, 0.038296364, 0.041494504, -0.02459586, -0.023633707, 0.030870177, -0.012779006, 0.06136091, 0.06190297, -0.012386015, -0.033499155, 0.0106988605, -0.009404698, 0.009709605, 0.017318739, 0.036914118, 0.0254496, 0.02714353, 0.011837181, -0.026777642, 0.046969283, 0.042416003, -0.0028864562, 0.07128056, -0.042930957, -0.010089045, -0.030707559, -0.021804264, 0.019148182, 0.017671075, -0.0045905495, 0.006098147, -0.011803303, 0.020367812, -0.041060857, -0.002750942, -0.01582808, -0.010861478, -0.0058101793, -0.038377672, -0.08087498, 0.0025493642, -0.06857028, 0.031520646, -0.033797286, -0.026249135, -0.03753748, -0.0122640515, -0.021031832, 0.010807272, 0.012623165, 0.017860796, -0.049977705, -0.013585317, -0.021736506, 0.030165501, -0.0012179355, -0.027753346, 0.032902893, 0.03428514, 0.0069654393, -0.012785782, 0.03723935, 0.014865927, 0.011085076, 0.015380883, 0.036724396, 0.026249135, 0.0029880921, 0.017034158, -0.019744445, -0.0009325083, -0.026547268, 0.007832731, 0.0014610144, -0.0052410187, 0.011525498, 0.033688877, 0.013998635, -0.028810358, -0.029379517, 0.0049598264, 0.018606124, 0.0075142724, 0.0038519963, -0.012731576, 0.006243825, 0.01809117, 0.012663819, -0.015773874, -0.019080425, 0.0029830104, 0.0059423055, 0.009032033, -0.04474685, -0.0051495465, -0.023145856, -0.028430916, 0.022400526, 0.029731855, 
0.014649104, 0.040003844, 0.035830002, -0.018633228, 0.008273153, -0.018321544, -0.037835617, -0.0064979144, 0.0071483837, -0.011003768, 0.02383698, -0.022115948, 0.004143352, 0.009675727, 0.047348723, -0.030463632, -0.016383689, -0.008550958, -0.014743964, -0.0050716256, -0.023593053, 0.050086115, -0.011789751, -0.041304782, -0.016519204, 0.008991379, -0.001061247, 0.0072229165, -0.020381363, 0.02714353, -0.0066673076, 0.00031316528, 0.020123886, 0.020489775, -0.028024374, 0.003641949, 0.0032709783, 0.06862448, 0.028674843, -0.005332491, -0.014161252, -0.029542135, 0.0026645516, 0.01885005, 0.0064504845, -0.011674564, 0.034474857, 0.014025738, 0.017779486, 0.008788108, 0.021736506, 0.057403892, -0.017887898, 0.013985084, 0.0053087757, 0.019798651, 0.03119541, 0.053284254, 0.037727203, -0.0097976895, -0.024663618, -0.024921095, 0.0057796882, -0.019730894, -0.048703868, 0.021343514, -0.015950043, 0.02214305, -0.022468284, 0.006809598, -0.021844918, -0.028214093, 0.012040453, -0.00937082, -0.015909389, -0.011044422, -0.029487928, 0.011633909, 0.012338584, 0.018443508, 0.03366177, -0.011464517, 0.011566153, -0.070846915, 0.005193589, 0.023999596, 0.020205194, -0.030382324, 0.016844438, -0.008259602, -0.022359872, 0.0068807425, 0.034122523, 0.01215564, 0.005857609, -0.005376533, -0.03306551, -0.016031351, -0.013510784, -0.009546988, -0.0129890535, 0.0061116987, -0.05051976, 0.02845802, 0.026357546, -0.008971052, -0.01970379, -0.035206635, -0.09242081, 0.025300534, -0.03526084, -0.015150508, -0.044150583, -0.024568757, 0.019446313, -0.02864774, -0.0041162493, -0.044828158, -0.03965151, 0.024284177, -0.015502845, 0.069491774, -0.014540693, 0.04165712, 0.005474781, -0.0027695752, -0.005356206, -0.022319218, 0.009858672, -0.01582808, -0.03233373, -0.016925747, -0.055344075, 0.012372463, 0.0164921, -0.039814126, -0.060276795, -0.028403815, -0.04515339, -0.011464517, 0.037456173, -0.039841227, 0.013944429, 0.028539328, 0.014540693, 0.004119637, 0.011105403, 0.042144973, 
-0.022292115, -0.037862718, -0.038106643, 0.028972974, 0.00079487654, -0.0053866967, -0.025165021, 0.0071890377, -0… │\n", - "└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "duckdb.sql(\"SELECT embed_openai('Which papers are related to quantum computing?') AS query_embedding;\")" ] @@ -450,7 +227,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 27, "id": "2d1d68ba", "metadata": {}, "outputs": [], @@ -477,109 +254,7 @@ "execution_count": null, "id": "5642bcf9", "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " titles \\\n", - "0 Survey on Semantic Stereo Matching / Semantic ... \n", - "1 FUTURE-AI: Guiding Principles and Consensus Re... \n", - "2 Enforcing Mutual Consistency of Hard Regions f... \n", - "3 Parameter Decoupling Strategy for Semi-supervi... \n", - "4 Background-Foreground Segmentation for Interio... \n", - "\n", - " summaries \\\n", - "0 Stereo matching is one of the widely used tech... \n", - "1 The recent advancements in artificial intellig... \n", - "2 In this paper, we proposed a novel mutual cons... \n", - "3 Consistency training has proven to be an advan... \n", - "4 To ensure safety in automated driving, the cor... \n", - "\n", - " terms \\\n", - "0 ['cs.CV', 'cs.LG'] \n", - "1 ['cs.CV', 'cs.AI', 'cs.LG'] \n", - "2 ['cs.CV', 'cs.AI'] \n", - "3 ['cs.CV'] \n", - "4 ['cs.CV', 'cs.LG'] \n", - "\n", - " embeddings \n", - "0 [-0.018463377, -0.03012074, 0.010921418, -0.04... \n", - "1 [-0.015125522, -0.020882344, 0.042208467, 0.04... \n", - "2 [0.00833142, -0.021476267, 0.037161183, 0.0197... \n", - "3 [0.014294317, -0.020803811, 0.03544353, 0.0138... \n", - "4 [-0.009169946, 0.0074990084, 0.011346209, -0.0... " - ] - }, - "execution_count": 126, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "result = duckdb.sql(\"SELECT * FROM papers LIMIT 5\").df()\n", "result.head()" @@ -608,7 +283,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 29, "id": "357d8fad", "metadata": {}, "outputs": [ @@ -661,7 +336,7 @@ " 3\n", " Superpixel-Guided Label Softening for Medical ...\n", " Segmentation of objects of interest is one of ...\n", - " 0.561668\n", + " 0.561727\n", " \n", " \n", " 4\n", @@ -685,11 +360,11 @@ "0 In medical image segmentation, it is difficult... 0.579598 \n", "1 Few-shot semantic segmentation (FSS) has great... 0.570959 \n", "2 The segmentation of medical images is a fundam... 0.562010 \n", - "3 Segmentation of objects of interest is one of ... 
0.561668 \n", + "3 Segmentation of objects of interest is one of ... 0.561727 \n", "4 Semantic segmentation of medical images is an ... 0.560177 " ] }, - "execution_count": 9, + "execution_count": 29, "metadata": {}, "output_type": "execute_result" } @@ -736,7 +411,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 32, "id": "887d3660", "metadata": {}, "outputs": [], @@ -746,7 +421,8 @@ "duckdb.sql(\"LOAD vss;\")\n", "duckdb.sql(\"SET GLOBAL hnsw_enable_experimental_persistence = true;\")\n", "\n", - "# Create an index on the embeddings column" + "# Create an index on the embeddings column\n", + "duckdb.sql(\"CREATE INDEX IF NOT EXISTS idx_embeddings ON papers USING HNSW (embeddings);\")" ] }, { @@ -759,7 +435,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 33, "id": "9d016af2", "metadata": {}, "outputs": [ @@ -770,11 +446,11 @@ "│ database_name │ database_oid │ schema_name │ schema_oid │ index_name │ index_oid │ table_name │ table_oid │ comment │ tags │ is_unique │ is_primary │ expressions │ sql │\n", "│ varchar │ int64 │ varchar │ int64 │ varchar │ int64 │ varchar │ int64 │ varchar │ map(varchar, varchar) │ boolean │ boolean │ varchar │ varchar │\n", "├───────────────┼──────────────┼─────────────┼────────────┼────────────────┼───────────┼────────────┼───────────┼─────────┼───────────────────────┼───────────┼────────────┼──────────────┼────────────────────────────────────────────────────────────────┤\n", - "│ memory │ 570 │ main │ 572 │ idx_embeddings │ 1994 │ papers │ 1977 │ NULL │ {} │ false │ false │ [embeddings] │ CREATE INDEX idx_embeddings ON papers USING HNSW (embeddings); │\n", + "│ memory │ 570 │ main │ 572 │ idx_embeddings │ 2070 │ papers │ 2065 │ NULL │ {} │ false │ false │ [embeddings] │ CREATE INDEX idx_embeddings ON papers USING HNSW (embeddings); │\n", 
"└───────────────┴──────────────┴─────────────┴────────────┴────────────────┴───────────┴────────────┴───────────┴─────────┴───────────────────────┴───────────┴────────────┴──────────────┴────────────────────────────────────────────────────────────────┘" ] }, - "execution_count": 18, + "execution_count": 33, "metadata": {}, "output_type": "execute_result" } @@ -786,7 +462,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 34, "id": "682ce99c", "metadata": {}, "outputs": [ @@ -821,31 +497,31 @@ " 0\n", " Medical Matting: A New Perspective on Medical ...\n", " In medical image segmentation, it is difficult...\n", - " 0.579598\n", + " 0.579481\n", " \n", " \n", " 1\n", " Self-Supervision with Superpixels: Training Fe...\n", " Few-shot semantic segmentation (FSS) has great...\n", - " 0.570959\n", + " 0.570870\n", " \n", " \n", " 2\n", " A Spatial Guided Self-supervised Clustering Ne...\n", " The segmentation of medical images is a fundam...\n", - " 0.562010\n", + " 0.562006\n", " \n", " \n", " 3\n", " Superpixel-Guided Label Softening for Medical ...\n", " Segmentation of objects of interest is one of ...\n", - " 0.561668\n", + " 0.561522\n", " \n", " \n", " 4\n", " Efficient and Generic Interactive Segmentation...\n", " Semantic segmentation of medical images is an ...\n", - " 0.560177\n", + " 0.560087\n", " \n", " \n", "\n", @@ -860,14 +536,14 @@ "4 Efficient and Generic Interactive Segmentation... \n", "\n", " summaries score \n", - "0 In medical image segmentation, it is difficult... 0.579598 \n", - "1 Few-shot semantic segmentation (FSS) has great... 0.570959 \n", - "2 The segmentation of medical images is a fundam... 0.562010 \n", - "3 Segmentation of objects of interest is one of ... 0.561668 \n", - "4 Semantic segmentation of medical images is an ... 0.560177 " + "0 In medical image segmentation, it is difficult... 0.579481 \n", + "1 Few-shot semantic segmentation (FSS) has great... 
0.570870 \n", + "2 The segmentation of medical images is a fundam... 0.562006 \n", + "3 Segmentation of objects of interest is one of ... 0.561522 \n", + "4 Semantic segmentation of medical images is an ... 0.560087 " ] }, - "execution_count": 20, + "execution_count": 34, "metadata": {}, "output_type": "execute_result" } From 2d19b669e837a1d70bb77ecba62a6031539cf2c4 Mon Sep 17 00:00:00 2001 From: Ayman Farhat Date: Sat, 6 Sep 2025 17:59:14 +0200 Subject: [PATCH 3/5] Update notebook file name and add README.md --- examples/vector_databases/duckdb/README.md | 8 ++++++++ ...ings.ipynb => duckdb-sql-with-openai-embeddings.ipynb} | 0 2 files changed, 8 insertions(+) rename examples/vector_databases/duckdb/{using-duckdb-with-openai-embeddings.ipynb => duckdb-sql-with-openai-embeddings.ipynb} (100%) diff --git a/examples/vector_databases/duckdb/README.md b/examples/vector_databases/duckdb/README.md index e69de29bb2..fbffef3061 100644 --- a/examples/vector_databases/duckdb/README.md +++ b/examples/vector_databases/duckdb/README.md @@ -0,0 +1,8 @@ +# DuckDB + +[DuckDB](https://duckdb.org/) is an in-process SQL OLAP database management system designed for analytics. +DuckDB provides a lightweight, efficient way to query and store embeddings directly alongside your structured data, making it easy to combine vector operations with rich SQL analytics. + +For technical details, refer to the [DuckDB documentation](https://duckdb.org/docs/). + +The [`duckdb`](https://github.com/duckdb/duckdb) GitHub repository contains the source code, examples, and community resources for experimenting with DuckDB. 
diff --git a/examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb b/examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb similarity index 100% rename from examples/vector_databases/duckdb/using-duckdb-with-openai-embeddings.ipynb rename to examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb From dfa948ea35c1b9a6437995d1cddc5c9abee908e0 Mon Sep 17 00:00:00 2001 From: Ayman Farhat Date: Sat, 6 Sep 2025 18:09:36 +0200 Subject: [PATCH 4/5] Update title and related metadata --- authors.yaml | 5 +++++ .../duckdb/duckdb-sql-with-openai-embeddings.ipynb | 2 +- registry.yaml | 11 +++++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/authors.yaml b/authors.yaml index b164a40e2a..0348aa559b 100644 --- a/authors.yaml +++ b/authors.yaml @@ -472,3 +472,8 @@ heejingithub: name: "Heejin Cho" website: "https://www.linkedin.com/in/heejc/" avatar: "https://avatars.githubusercontent.com/u/169293861" + +ayman-openai: + name: "Ayman Farhat" + website: "https://www.linkedin.com/in/ayman-farhat-7baa9a11/" + avatar: "https://avatars.githubusercontent.com/u/229349247" \ No newline at end of file diff --git a/examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb b/examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb index 75d87b6f21..6e40c92268 100644 --- a/examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb +++ b/examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb @@ -5,7 +5,7 @@ "id": "0434d61f", "metadata": {}, "source": [ - "# Semantic Search using DuckDB SQL and OpenAI Embeddings\n" + "# Semantic Search with DuckDB and OpenAI Embeddings\n" ] }, { diff --git a/registry.yaml b/registry.yaml index 7e2c5a2606..57e426c1be 100644 --- a/registry.yaml +++ b/registry.yaml @@ -2501,3 +2501,14 @@ - katiagg tags: - images + +- title: Semantic Search with DuckDB and OpenAI Embeddings + path: 
examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb + date: 2025-09-06 + authors: + - ayman-openai + tags: + - embeddings + - duckdb + - sql + - semantic-search \ No newline at end of file From 8a14ecdac94c15c530f76f74a47b80ae025ca06e Mon Sep 17 00:00:00 2001 From: Ayman Farhat Date: Sat, 6 Sep 2025 18:24:42 +0200 Subject: [PATCH 5/5] Improve language and grammar --- .../duckdb-sql-with-openai-embeddings.ipynb | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb b/examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb index 6e40c92268..e963466767 100644 --- a/examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb +++ b/examples/vector_databases/duckdb/duckdb-sql-with-openai-embeddings.ipynb @@ -194,7 +194,7 @@ "id": "222d1038", "metadata": {}, "source": [ - "*Note on performance:* The above function, will run a call to OpenAI's embeddings API for every single row. Depending on your dataset size, this might be quite slow. For larger datasets, consider [upgrading this function](https://lukaszrogalski.substack.com/p/python-udfs-in-duckdb) to work with aggregated data and pass in multiple sentences (batches) to the OpenAI embeddings call." + "*Note on performance:* The above function will run a call to OpenAI's embeddings API for every single row. Depending on your dataset size, this might be slow. For larger datasets, consider [upgrading this function](https://lukaszrogalski.substack.com/p/python-udfs-in-duckdb) to work with aggregated data and pass in multiple sentences (batches) to the OpenAI embeddings call." ] }, { @@ -222,7 +222,7 @@ "source": [ "### Generating Embeddings\n", "\n", - "With the embedding function in place, we can now use it to generate and write embeddings into our table via SQL. 
The below query should run on every row in the table, calling the openai embedding UDF we previously defined. On 400 rows, it should take around 2 minutes to complete." + "With the embedding function in place, we can now use it to generate and store embeddings in our table via SQL. The query below runs on every row in the table, calling the OpenAI embedding UDF we defined earlier. On a dataset of about 400 rows, it typically completes in around 2 minutes." ] }, { @@ -273,12 +273,11 @@ "id": "35ee611a", "metadata": {}, "source": [ - "Now that we have embeddings for each paper, we can use them to perform a semantic similarity search. \n", + "Now that we have embeddings for each paper, we can use them to perform a semantic similarity search.\n", "\n", - "To do this, we can use an array distance function native to DuckDB such as array_cosine_similarity that computes the cosine similarity between two vectors.\n", + "To achieve this, we can use a native DuckDB array distance function such as `array_cosine_similarity`, which computes the cosine similarity between two vectors.\n", "\n", - "Below we define a query that uses our embed_openai function to generate an embedding for a query, and then uses the array_cosine_similarity function to compute the similarity between the query embedding and each of the paper embeddings.\n", - "\n" + "The query below demonstrates how to generate an embedding for a search term using our `embed_openai` function, and then apply `array_cosine_similarity` to compare the query embedding with each of the paper embeddings." ] }, { @@ -402,11 +401,11 @@ "id": "8fb6dbd9", "metadata": {}, "source": [ - "While the above search query works fine on 400 rows, it can eventually get much slower as the dataset grows into hundreds of thousands. 
Without an index, DuckDB will compare a query embedding with all document embeddings to find the most similar one.\n", + "While the search query above works well on a dataset of 400 rows, it will become much slower as the data grows into hundreds of thousands of rows. Without an index, DuckDB must compare the query embedding against all document embeddings to find the most similar results.\n", "\n", - "In order to speed up the vector search, we can use ANN (Approximate Nearest Neighbor) with [HNSW (Hierarchical Navigable Small World)](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world), supported via DuckDB's vector [similarity search extension](https://duckdb.org/2024/05/03/vector-similarity-search-vss.html).\n", + "To speed up vector search, we can use ANN (Approximate Nearest Neighbor) with [HNSW (Hierarchical Navigable Small World)](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world), available through DuckDB’s [vector similarity search extension](https://duckdb.org/2024/05/03/vector-similarity-search-vss.html).\n", "\n", - "Let's try that out." + "Let’s give it a try." ] }, { @@ -560,7 +559,7 @@ "source": [ "## Conclusion\n", "\n", - "In this cookbook, we explored how to integrate OpenAI’s embedding calls as a reusable UDF in DuckDB. This approach proves especially powerful when you want to store and query embeddings directly alongside your data. By doing so, you unlock new opportunities for combining advanced data analysis with retrieval tasks, all through DuckDB’s simple and familiar SQL interface." + "In this cookbook, we explored how to integrate OpenAI’s embedding calls as a reusable UDF in DuckDB. This approach is especially powerful when storing and querying embeddings directly alongside your data. By combining embeddings with DuckDB’s familiar SQL interface, you unlock new possibilities for advanced data analysis and retrieval—all within a simple, efficient workflow." ] } ],