title | keywords | description | ||||
---|---|---|---|---|---|---|
ai-rag |
|
This document contains information about the Apache APISIX ai-rag Plugin. |
The ai-rag
plugin integrates Retrieval-Augmented Generation (RAG) capabilities with AI models.
It allows efficient retrieval of relevant documents or information from external data sources and
augments the LLM responses with that data, improving the accuracy and context of generated outputs.
As of now only Azure OpenAI and Azure AI Search services are supported for generating embeddings and performing vector search respectively. PRs for introducing support for other service providers are welcomed.
Field | Required | Type | Description |
---|---|---|---|
embeddings_provider | Yes | object | Configurations of the embedding models provider |
embeddings_provider.azure_openai | Yes | object | Configurations of Azure OpenAI as the embedding models provider. |
embeddings_provider.azure_openai.endpoint | Yes | string | Azure OpenAI endpoint |
embeddings_provider.azure_openai.api_key | Yes | string | Azure OpenAI API key |
vector_search_provider | Yes | object | Configuration for the vector search provider |
vector_search_provider.azure_ai_search | Yes | object | Configuration for Azure AI Search |
vector_search_provider.azure_ai_search.endpoint | Yes | string | Azure AI Search endpoint |
vector_search_provider.azure_ai_search.api_key | Yes | string | Azure AI Search API key |
The following fields must be present in the request body.
Field | Type | Description |
---|---|---|
ai_rag | object | Configuration for AI-RAG (Retrieval Augmented Generation) |
ai_rag.embeddings | object | Request parameters required to generate embeddings. Contents will depend on the API specification of the configured provider. |
ai_rag.vector_search | object | Request parameters required to perform vector search. Contents will depend on the API specification of the configured provider. |
-
Parameters of
ai_rag.embeddings
- Azure OpenAI
Name Required Type Description input Yes string Input text used to compute embeddings, encoded as a string. user No string A unique identifier representing your end-user, which can help in monitoring and detecting abuse. encoding_format No string The format to return the embeddings in. Can be either float
orbase64
. Defaults tofloat
.dimensions No integer The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
For other parameters please refer to the Azure OpenAI embeddings documentation.
-
Parameters of
ai_rag.vector_search
- Azure AI Search
Field Required Type Description fields Yes String Fields for the vector search For other parameters please refer the Azure AI Search documentation.
Example request body:
{
"ai_rag": {
"vector_search": { "fields": "contentVector" },
"embeddings": {
"input": "which service is good for devops",
"dimensions": 1024
}
}
}
First initialise these shell variables:
ADMIN_API_KEY=edd1c9f034335f136f87ad84b625c8f1
AZURE_OPENAI_ENDPOINT=https://name.openai.azure.com/openai/deployments/gpt-4o/chat/completions
VECTOR_SEARCH_ENDPOINT=https://name.search.windows.net/indexes/indexname/docs/search?api-version=2024-07-01
EMBEDDINGS_ENDPOINT=https://name.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15
EMBEDDINGS_KEY=secret-azure-openai-embeddings-key
SEARCH_KEY=secret-azureai-search-key
AZURE_OPENAI_KEY=secret-azure-openai-key
Create a route with the ai-rag
and ai-proxy
plugin like so:
curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"uri": "/rag",
"plugins": {
"ai-rag": {
"embeddings_provider": {
"azure_openai": {
"endpoint": "'"$EMBEDDINGS_ENDPOINT"'",
"api_key": "'"$EMBEDDINGS_KEY"'"
}
},
"vector_search_provider": {
"azure_ai_search": {
"endpoint": "'"$VECTOR_SEARCH_ENDPOINT"'",
"api_key": "'"$SEARCH_KEY"'"
}
}
},
"ai-proxy": {
"auth": {
"header": {
"api-key": "'"$AZURE_OPENAI_KEY"'"
},
"query": {
"api-version": "2023-03-15-preview"
}
},
"model": {
"provider": "openai",
"name": "gpt-4",
"options": {
"max_tokens": 512,
"temperature": 1.0
}
},
"override": {
"endpoint": "'"$AZURE_OPENAI_ENDPOINT"'"
}
}
},
"upstream": {
"type": "roundrobin",
"nodes": {
"someupstream.com:443": 1
},
"scheme": "https",
"pass_host": "node"
}
}'
The ai-proxy
plugin is used here as it simplifies access to LLMs. Alternatively, you may configure the LLM service address in the upstream configuration and update the route URI as well.
Now send a request:
curl http://127.0.0.1:9080/rag -XPOST -H 'Content-Type: application/json' -d '{"ai_rag":{"vector_search":{"fields":"contentVector"},"embeddings":{"input":"which service is good for devops","dimensions":1024}}}'
You will receive a response like this:
{
"choices": [
{
"finish_reason": "length",
"index": 0,
"message": {
"content": "Here are the details for some of the services you inquired about from your Azure search context:\n\n ... <rest of the response>",
"role": "assistant"
}
}
],
"created": 1727079764,
"id": "chatcmpl-AAYdA40YjOaeIHfgFBkaHkUFCWxfc",
"model": "gpt-4o-2024-05-13",
"object": "chat.completion",
"system_fingerprint": "fp_67802d9a6d",
"usage": {
"completion_tokens": 512,
"prompt_tokens": 6560,
"total_tokens": 7072
}
}