---
title: ai-rag
keywords:
  - Apache APISIX
  - API Gateway
  - Plugin
  - ai-rag
description: This document contains information about the Apache APISIX ai-rag Plugin.
---

## Description

The ai-rag plugin integrates Retrieval-Augmented Generation (RAG) capabilities with AI models. It allows efficient retrieval of relevant documents or information from external data sources and augments the LLM responses with that data, improving the accuracy and context of generated outputs.

Currently, only Azure OpenAI and Azure AI Search are supported, for generating embeddings and performing vector search respectively. PRs adding support for other service providers are welcome.
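Conceptually, the plugin performs a retrieve-then-augment step on each request before the prompt reaches the LLM. The sketch below is a minimal illustration of that flow; the function bodies are stubs (the real plugin calls the configured Azure OpenAI and Azure AI Search endpoints), and the exact way retrieved documents are merged into the prompt is an assumption for illustration:

```python
# Illustrative sketch of the RAG flow: embed the query, run a vector
# search, and augment the LLM prompt with the retrieved documents.
# The stubs below stand in for the configured Azure endpoints.

def embed(text: str) -> list[float]:
    # Stand-in for a call to embeddings_provider.azure_openai.endpoint.
    return [float(len(text))]  # trivial deterministic "embedding"

def vector_search(vector: list[float], fields: str = "contentVector") -> list[str]:
    # Stand-in for a call to vector_search_provider.azure_ai_search.endpoint.
    return ["Azure DevOps provides CI/CD pipelines and project tracking."]

def augment(question: str) -> list[dict]:
    """Retrieve documents for the question and prepend them to the prompt."""
    docs = vector_search(embed(question))
    context = "\n".join(docs)
    return [
        {"role": "system", "content": f"Answer using this retrieved context:\n{context}"},
        {"role": "user", "content": question},
    ]

messages = augment("which service is good for devops")
```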

## Plugin Attributes

| Field | Required | Type | Description |
| ----- | -------- | ---- | ----------- |
| embeddings_provider | Yes | object | Configuration of the embedding model provider. |
| embeddings_provider.azure_openai | Yes | object | Configuration of Azure OpenAI as the embedding model provider. |
| embeddings_provider.azure_openai.endpoint | Yes | string | Azure OpenAI endpoint. |
| embeddings_provider.azure_openai.api_key | Yes | string | Azure OpenAI API key. |
| vector_search_provider | Yes | object | Configuration of the vector search provider. |
| vector_search_provider.azure_ai_search | Yes | object | Configuration of Azure AI Search. |
| vector_search_provider.azure_ai_search.endpoint | Yes | string | Azure AI Search endpoint. |
| vector_search_provider.azure_ai_search.api_key | Yes | string | Azure AI Search API key. |

## Request Body Format

The following fields must be present in the request body.

| Field | Type | Description |
| ----- | ---- | ----------- |
| ai_rag | object | Configuration for AI-RAG (Retrieval-Augmented Generation). |
| ai_rag.embeddings | object | Request parameters required to generate embeddings. Contents will depend on the API specification of the configured provider. |
| ai_rag.vector_search | object | Request parameters required to perform vector search. Contents will depend on the API specification of the configured provider. |
### Parameters of `ai_rag.embeddings`

#### Azure OpenAI

| Name | Required | Type | Description |
| ---- | -------- | ---- | ----------- |
| input | Yes | string | Input text used to compute embeddings, encoded as a string. |
| user | No | string | A unique identifier representing your end-user, which can help in monitoring and detecting abuse. |
| encoding_format | No | string | The format to return the embeddings in. Can be either `float` or `base64`. Defaults to `float`. |
| dimensions | No | integer | The number of dimensions the resulting output embeddings should have. Only supported in `text-embedding-3` and later models. |

For other parameters, please refer to the Azure OpenAI embeddings documentation.

### Parameters of `ai_rag.vector_search`

#### Azure AI Search

| Field | Required | Type | Description |
| ----- | -------- | ---- | ----------- |
| fields | Yes | string | Fields to be used for the vector search. |

For other parameters, please refer to the Azure AI Search documentation.

Example request body:

```json
{
  "ai_rag": {
    "vector_search": { "fields": "contentVector" },
    "embeddings": {
      "input": "which service is good for devops",
      "dimensions": 1024
    }
  }
}
```

## Example usage

First initialise these shell variables:

```shell
ADMIN_API_KEY=edd1c9f034335f136f87ad84b625c8f1
AZURE_OPENAI_ENDPOINT=https://name.openai.azure.com/openai/deployments/gpt-4o/chat/completions
VECTOR_SEARCH_ENDPOINT=https://name.search.windows.net/indexes/indexname/docs/search?api-version=2024-07-01
EMBEDDINGS_ENDPOINT=https://name.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15
EMBEDDINGS_KEY=secret-azure-openai-embeddings-key
SEARCH_KEY=secret-azureai-search-key
AZURE_OPENAI_KEY=secret-azure-openai-key
```

Create a route with the `ai-rag` and `ai-proxy` plugins like so:

```shell
curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
  "uri": "/rag",
  "plugins": {
    "ai-rag": {
      "embeddings_provider": {
        "azure_openai": {
          "endpoint": "'"$EMBEDDINGS_ENDPOINT"'",
          "api_key": "'"$EMBEDDINGS_KEY"'"
        }
      },
      "vector_search_provider": {
        "azure_ai_search": {
          "endpoint": "'"$VECTOR_SEARCH_ENDPOINT"'",
          "api_key": "'"$SEARCH_KEY"'"
        }
      }
    },
    "ai-proxy": {
      "auth": {
        "header": {
          "api-key": "'"$AZURE_OPENAI_KEY"'"
        },
        "query": {
          "api-version": "2023-03-15-preview"
        }
      },
      "model": {
        "provider": "openai",
        "name": "gpt-4",
        "options": {
          "max_tokens": 512,
          "temperature": 1.0
        }
      },
      "override": {
        "endpoint": "'"$AZURE_OPENAI_ENDPOINT"'"
      }
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "someupstream.com:443": 1
    },
    "scheme": "https",
    "pass_host": "node"
  }
}'
```

The `ai-proxy` plugin is used here because it simplifies access to LLMs. Alternatively, you can configure the LLM service address in the upstream configuration and update the route URI accordingly.

Now send a request:

```shell
curl http://127.0.0.1:9080/rag -X POST -H 'Content-Type: application/json' -d '{"ai_rag":{"vector_search":{"fields":"contentVector"},"embeddings":{"input":"which service is good for devops","dimensions":1024}}}'
```
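The same request can also be sent from Python. This is a hypothetical client-side sketch (it assumes the gateway is reachable at `127.0.0.1:9080` with the route configured above) using only the standard library:

```python
import json
import urllib.request

# Build the same ai_rag request body used in the curl example above.
payload = {
    "ai_rag": {
        "vector_search": {"fields": "contentVector"},
        "embeddings": {
            "input": "which service is good for devops",
            "dimensions": 1024,
        },
    }
}
body = json.dumps(payload).encode()

req = urllib.request.Request(
    "http://127.0.0.1:9080/rag",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request once the route is in place:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```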

You will receive a response like this:

```json
{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": "Here are the details for some of the services you inquired about from your Azure search context:\n\n ... <rest of the response>",
        "role": "assistant"
      }
    }
  ],
  "created": 1727079764,
  "id": "chatcmpl-AAYdA40YjOaeIHfgFBkaHkUFCWxfc",
  "model": "gpt-4o-2024-05-13",
  "object": "chat.completion",
  "system_fingerprint": "fp_67802d9a6d",
  "usage": {
    "completion_tokens": 512,
    "prompt_tokens": 6560,
    "total_tokens": 7072
  }
}
```