Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* docs: add ai function
* add AI functions to readme
Showing 6 changed files with 248 additions and 3 deletions.
53 changes: 53 additions & 0 deletions
docs/doc/15-sql-functions/61-ai-functions/02-ai-embedding-vector.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
--- | ||
title: 'AI_EMBEDDING_VECTOR' | ||
description: 'Creating embeddings using the ai_embedding_vector function in Databend' | ||
--- | ||
|
||
This document provides an overview of the `ai_embedding_vector` function in Databend and demonstrates how to create document embeddings using this function. | ||
|
||
## Overview of ai_embedding_vector | ||
|
||
|
||
The `ai_embedding_vector` function in Databend is a built-in function that generates vector embeddings for text data. It is useful for natural language processing tasks, such as document similarity, clustering, and recommendation systems. | ||
|
||
The function takes a text input and returns a high-dimensional vector that represents the input text's semantic meaning and context. The embeddings are produced by models pre-trained on large text corpora, which capture the relationships between words and phrases in a continuous space. | ||
|
||
## Creating embeddings using ai_embedding_vector | ||
|
||
To create embeddings for a text document using the `ai_embedding_vector` function, follow the example below. | ||
1. Create a table to store the documents: | ||
```sql | ||
CREATE TABLE documents ( | ||
doc_id INT, | ||
text_content TEXT | ||
); | ||
``` | ||
|
||
2. Insert example documents into the table: | ||
```sql | ||
INSERT INTO documents (doc_id, text_content) | ||
VALUES | ||
(1, 'Artificial intelligence is a fascinating field.'), | ||
(2, 'Machine learning is a subset of AI.'), | ||
(3, 'I love going to the beach on weekends.'); | ||
``` | ||
|
||
3. Create a table to store the embeddings: | ||
```sql | ||
CREATE TABLE embeddings ( | ||
doc_id INT, | ||
text_content TEXT, | ||
embedding ARRAY(FLOAT32) | ||
); | ||
``` | ||
|
||
4. Generate embeddings for the text content and store them in the embeddings table: | ||
```sql | ||
INSERT INTO embeddings (doc_id, text_content, embedding) | ||
SELECT doc_id, text_content, ai_embedding_vector(text_content) | ||
FROM documents; | ||
``` | ||
After running these SQL queries, the embeddings table will contain the generated embeddings for each document in the documents table. The embeddings are stored as an array of `FLOAT32` values in the embedding column, which has the `ARRAY(FLOAT32)` column type. | ||
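
As a rough illustration of what an array of 32-bit floats means for storage, the following Python sketch packs a small vector into single-precision values. The 4-dimensional `embedding` and the helper `storage_bytes` are hypothetical names used only for illustration; real embeddings have many more dimensions, and the sketch says nothing about how Databend lays out or compresses the column internally.

```python
from array import array

# Hypothetical 4-dimensional embedding, for illustration only.
embedding = [0.12, -0.03, 0.88, 0.41]

# Typecode "f" stores each element as a C float (single precision).
packed = array("f", embedding)


def storage_bytes(vector):
    """Approximate payload size of a vector held as 32-bit floats."""
    return len(vector) * array("f").itemsize


print(storage_bytes(embedding))
```

Single precision keeps roughly seven significant decimal digits per component, which is usually sufficient for distance comparisons between embeddings.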
|
||
You can now use these embeddings for various natural language processing tasks, such as finding similar documents or clustering documents based on their content. |
55 changes: 55 additions & 0 deletions
docs/doc/15-sql-functions/61-ai-functions/03-ai-cosine-distance.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
--- | ||
title: 'COSINE_DISTANCE' | ||
description: 'Measuring document similarity using the cosine_distance function in Databend' | ||
--- | ||
|
||
This document provides an overview of the `cosine_distance` function in Databend and demonstrates how to measure document similarity using this function. | ||
|
||
## Overview of cosine_distance | ||
|
||
The `cosine_distance` function in Databend is a built-in function that calculates the cosine distance between two vectors. It is commonly used in natural language processing tasks, such as document similarity and recommendation systems. | ||
|
||
Cosine distance measures how dissimilar two vectors are, based on the cosine of the angle between them: it equals 1 minus the cosine similarity. The function takes two input vectors and returns a value between 0 and 2, with 0 indicating vectors that point in the same direction, 1 indicating orthogonal (unrelated) vectors, and 2 indicating vectors that point in opposite directions. | ||
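
The definition can be sketched in a few lines of Python. This is a minimal reference implementation of the standard formula (1 minus the cosine similarity), not Databend's own code; the Python function name simply mirrors the SQL function, and the error handling for mismatched or zero vectors is an assumption of this sketch.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
```

Because the result depends only on the angle between the vectors, scaling either vector by a positive constant does not change the distance.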
|
||
## Measuring similarity using cosine_distance | ||
|
||
To measure document similarity using the `cosine_distance` function, follow the example below. This example assumes that you have already created document embeddings using the `ai_embedding_vector` function and stored them in a table with the `ARRAY(FLOAT32)` column type. | ||
|
||
1. Create a table to store the documents and their embeddings: | ||
```sql | ||
CREATE TABLE documents ( | ||
doc_id INT, | ||
text_content TEXT, | ||
embedding ARRAY(FLOAT32) | ||
); | ||
``` | ||
|
||
2. Insert example documents and their embeddings into the table: | ||
```sql | ||
INSERT INTO documents (doc_id, text_content, embedding) | ||
VALUES | ||
(1, 'Artificial intelligence is a fascinating field.', ai_embedding_vector('Artificial intelligence is a fascinating field.')), | ||
(2, 'Machine learning is a subset of AI.', ai_embedding_vector('Machine learning is a subset of AI.')), | ||
(3, 'I love going to the beach on weekends.', ai_embedding_vector('I love going to the beach on weekends.')); | ||
``` | ||
|
||
3. Measure the similarity between a query document and the stored documents using the `cosine_distance` function: | ||
```sql | ||
SELECT doc_id, text_content, cosine_distance(embedding, ai_embedding_vector('What is a subfield of artificial intelligence?')) AS distance | ||
FROM documents | ||
ORDER BY distance ASC | ||
LIMIT 5; | ||
``` | ||
This SQL query calculates the cosine distance between the query document's embedding and the embeddings of the stored documents. The results are ordered by ascending distance, with the smallest distance indicating the highest similarity. | ||
|
||
Result: | ||
```sql | ||
+--------+-------------------------------------------------+------------+ | ||
| doc_id | text_content | distance | | ||
+--------+-------------------------------------------------+------------+ | ||
| 1 | Artificial intelligence is a fascinating field. | 0.10928339 | | ||
| 2 | Machine learning is a subset of AI. | 0.13584924 | | ||
| 3 | I love going to the beach on weekends. | 0.30774158 | | ||
+--------+-------------------------------------------------+------------+ | ||
``` |
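
The `ORDER BY distance ASC LIMIT 5` pattern above amounts to a nearest-neighbour search. The Python sketch below mirrors that logic on toy data; the 3-dimensional vectors, the `rows` list, and the helper `nearest` are hypothetical and chosen only so that the ordering is easy to follow. They are not actual embeddings returned by `ai_embedding_vector`.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def nearest(query, rows, limit=5):
    """Return up to `limit` (doc_id, distance) pairs, smallest distance first."""
    scored = [(doc_id, cosine_distance(emb, query)) for doc_id, emb in rows]
    scored.sort(key=lambda pair: pair[1])  # ORDER BY distance ASC
    return scored[:limit]                  # LIMIT


# Hypothetical stand-ins for stored document embeddings.
rows = [
    (1, [0.9, 0.1, 0.0]),
    (2, [0.8, 0.3, 0.1]),
    (3, [0.0, 0.2, 0.9]),
]
query = [1.0, 0.2, 0.0]

result = nearest(query, rows)
```

With only three rows, a limit of 5 returns all three, just as the SQL query returns three rows in the result above.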
30 changes: 30 additions & 0 deletions
docs/doc/15-sql-functions/61-ai-functions/04-ai-text-completion.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
--- | ||
title: 'AI_TEXT_COMPLETION' | ||
description: 'Generating text completions using the ai_text_completion function in Databend' | ||
--- | ||
|
||
This document provides an overview of the `ai_text_completion` function in Databend and demonstrates how to generate text completions using this function. | ||
|
||
## Overview of ai_text_completion | ||
|
||
The `ai_text_completion` function in Databend is a built-in function that generates text completions based on a given prompt. It is useful for natural language processing tasks, such as question answering, text generation, and autocompletion systems. | ||
|
||
The function takes a text prompt as input and returns a generated completion for the prompt. The completions are produced by a language model pre-trained on large text corpora, which predicts a plausible continuation of the prompt. | ||
|
||
## Generating text completions using ai_text_completion | ||
|
||
Here is a simple example using the `ai_text_completion` function in Databend to generate a text completion: | ||
```sql | ||
SELECT ai_text_completion('What is artificial intelligence?') AS completion; | ||
``` | ||
|
||
Result: | ||
```sql | ||
+--------------------------------------------------------------------------------------------------------------------+ | ||
| completion | | ||
+--------------------------------------------------------------------------------------------------------------------+ | ||
| Artificial intelligence (AI) is the field of study focused on creating machines and software capable of thinking, learning, and solving problems in a way that mimics human intelligence. This includes areas such as machine learning, natural language processing, computer vision, and robotics. | | ||
+--------------------------------------------------------------------------------------------------------------------+ | ||
``` | ||
|
||
In this example, we provide the prompt "What is artificial intelligence?" to the `ai_text_completion` function, and it returns a generated completion that briefly describes artificial intelligence. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,98 @@ | ||
--- | ||
title: 'AI Functions' | ||
description: 'Learn how to use AI functions in Databend with the help of the OpenAI engine.' | ||
description: 'SQL-based Knowledge Base Search and Completion using Databend' | ||
--- | ||
|
||
AI functions refer to the various capabilities within Databend that are powered by the [OpenAI](https://openai.com/) engine, and are designed to make it easier for users to interact with databases using natural language. | ||
This document demonstrates how to leverage Databend's built-in AI functions for creating document embeddings, searching for similar documents, and generating text completions based on context. | ||
|
||
- [AI_TO_SQL](01-ai-to-sql.md): Converts natural language instructions into SQL queries with the latest [Codex](https://openai.com/blog/openai-codex) model `code-davinci-002`. | ||
We will guide you through a simple example that shows how to create and store embeddings using the `ai_embedding_vector` function, find related documents with the `cosine_distance` function, and generate completions using the `ai_text_completion` function. | ||
|
||
## Introduction to embeddings | ||
|
||
Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They can be used to compare and analyze text in various natural language processing tasks, such as document similarity, clustering, and recommendation. | ||
|
||
## How do embeddings work? | ||
|
||
Embeddings work by converting text into high-dimensional vectors in such a way that similar texts are closer together in the vector space. This is achieved by training a model on a large corpus of text, which learns to represent the words and phrases in a continuous space that captures their semantic relationships. | ||
Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They are widely used in various natural language processing tasks, such as document similarity, clustering, and recommendation systems. | ||
|
||
## Databend AI Functions | ||
|
||
Databend provides built-in AI functions for various natural language processing tasks. The main functions covered in this document are: | ||
|
||
- [ai_embedding_vector](./02-ai-embedding-vector.md): Generates embeddings for text documents. | ||
- [cosine_distance](./03-ai-cosine-distance.md): Calculates the cosine distance between two embeddings. | ||
- [ai_text_completion](./04-ai-text-completion.md): Generates text completions based on a given prompt. | ||
These functions are powered by open-source natural language processing models and can be used directly within SQL queries. | ||
|
||
## Creating and storing embeddings using Databend | ||
|
||
To create embeddings for a text document using Databend, you can use the built-in `ai_embedding_vector` function directly in your SQL query. Here's an example: | ||
|
||
```sql | ||
CREATE TABLE documents ( | ||
doc_id INT, | ||
text_content TEXT | ||
); | ||
|
||
INSERT INTO documents (doc_id, text_content) | ||
VALUES | ||
(1, 'Artificial intelligence is a fascinating field.'), | ||
(2, 'Machine learning is a subset of AI.'), | ||
(3, 'I love going to the beach on weekends.'); | ||
|
||
CREATE TABLE embeddings ( | ||
doc_id INT, | ||
text_content TEXT, | ||
embedding ARRAY(FLOAT32) | ||
); | ||
|
||
INSERT INTO embeddings (doc_id, text_content, embedding) | ||
SELECT doc_id, text_content, ai_embedding_vector(text_content) | ||
FROM documents; | ||
``` | ||
|
||
This SQL script creates a `documents` table, inserts the example documents, and then generates embeddings using the `ai_embedding_vector` function. The embeddings are stored in the `embeddings` table with the `ARRAY(FLOAT32)` column type. | ||
|
||
## Searching for related documents using cosine distance | ||
|
||
Suppose you have a question, "What is a subfield of artificial intelligence?", and you want to find the most related document from the stored embeddings. Generate an embedding for the question inline with the `ai_embedding_vector` function and rank the stored documents by their cosine distance to it: | ||
```sql | ||
SELECT doc_id, text_content, cosine_distance(embedding, ai_embedding_vector('What is a subfield of artificial intelligence?')) AS distance | ||
FROM embeddings | ||
ORDER BY distance ASC | ||
LIMIT 5; | ||
``` | ||
This query returns up to 5 documents that are most similar to the input question, ordered by their cosine distance, with the smallest distance indicating the highest similarity. | ||
|
||
Result: | ||
```sql | ||
+--------+-------------------------------------------------+------------+ | ||
| doc_id | text_content | distance | | ||
+--------+-------------------------------------------------+------------+ | ||
| 1 | Artificial intelligence is a fascinating field. | 0.10928339 | | ||
| 2 | Machine learning is a subset of AI. | 0.13584924 | | ||
| 3 | I love going to the beach on weekends. | 0.30774158 | | ||
+--------+-------------------------------------------------+------------+ | ||
``` | ||
|
||
## Generating text completions with Databend | ||
|
||
Databend also supports a text completion function, `ai_text_completion`. For example, from the above output, we choose the document with the smallest cosine distance: "Artificial intelligence is a fascinating field." We can use this as context and provide the original question to the `ai_text_completion` function to generate a completion: | ||
|
||
```sql | ||
SELECT ai_text_completion('Artificial intelligence is a fascinating field. What is a subfield of artificial intelligence?') AS completion; | ||
``` | ||
|
||
Result: | ||
```sql | ||
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ||
| completion | | ||
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ||
| | ||
| A subfield of artificial intelligence is machine learning, which is the study of algorithms that allow computers to learn from data and improve their performance over time. Other subfields include natural language processing, computer vision, robotics, and deep learning. | | ||
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ||
``` | ||
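
The prompt passed to `ai_text_completion` above is simply the retrieved document followed by the question. A minimal Python sketch of that assembly step is shown here; `build_prompt` is a hypothetical helper, not part of Databend, and joining the two parts with a single space is an assumption taken from the example query.

```python
def build_prompt(context: str, question: str) -> str:
    """Prepend the retrieved document to the question, separated by a space."""
    return f"{context} {question}"


# Values taken from the example above.
context = "Artificial intelligence is a fascinating field."
question = "What is a subfield of artificial intelligence?"

prompt = build_prompt(context, question)
print(prompt)
```

Placing the most relevant document first gives the model context to draw on when it answers the question.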
|
||
|
||
You can experience these functions on our [Databend Cloud](https://databend.com), where you can sign up for a free trial and start using these AI functions right away. Databend's AI functions are designed to be easy to use, even for users who are not familiar with machine learning or natural language processing. With Databend, you can quickly and easily add powerful AI capabilities to your SQL queries and take your data analysis to the next level. |
c7329bf
Successfully deployed to the following URLs:
databend – ./
databend-databend.vercel.app
databend.vercel.app
databend.rs
databend-git-main-databend.vercel.app