
Commit 7f71325 (merge of pull request #365 from are-ces/rag_documentation, parents 2cc494c + 1dba836)

LCORE-314: Documentation for configuring RAG

File tree: 1 file changed (+236, −0)


docs/rag_guide.md

# RAG Configuration Guide

This document explains how to configure and customize your RAG pipeline through the `llama-stack` configuration YAML file. You will:

* Initialize a vector store
* Download and point to a local embedding model
* Configure an inference provider (LLM)
* Enable agent-based RAG querying
---
## Table of Contents

* [Introduction](#introduction)
* [Prerequisites](#prerequisites)
* [Set Up the Vector Database](#set-up-the-vector-database)
* [Download an Embedding Model](#download-an-embedding-model)
* [Configure Vector Store and Embedding Model](#configure-vector-store-and-embedding-model)
* [Add an Inference Model (LLM)](#add-an-inference-model-llm)
* [Complete Configuration Reference](#complete-configuration-reference)
* [References](#references)
---
# Introduction

RAG in Lightspeed Core Stack (LCS) is currently supported only via the Agents API. The agent is responsible for planning and deciding when to query the vector index.

The system operates as a chain of command. The **Agent** is the orchestrator, using the LLM as its reasoning engine. When a plan requires external information, the Agent queries the **Vector Store**: the database of indexed knowledge that you are responsible for creating before running the stack. The **Embedding Model** converts queries into vectors so they can be matched against the index.

> [!NOTE]
> The same embedding model must be used both to create the store and to query it; otherwise, query vectors and indexed vectors will not be comparable.
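In code, the agent-based flow looks roughly like the sketch below. This is a plain-Python illustration of the wiring, not the client library itself: in practice you would use the `llama-stack-client` package (whose exact signatures vary across Llama Stack releases), and the model and vector DB IDs shown here are taken from the example configuration later in this guide.

```python
# Sketch of the agent-based RAG flow, using plain dicts to show the shape
# of the requests. In practice you would use the llama-stack-client package;
# exact client signatures vary between Llama Stack versions.

# 1. The agent is configured with the builtin RAG toolgroup, pointed at one
#    or more registered vector databases.
agent_config = {
    "model": "gpt-test",  # an LLM declared under 'models' in run.yaml
    "instructions": "Answer using the retrieved documentation when relevant.",
    "toolgroups": [
        {
            "name": "builtin::rag",  # from 'tool_groups' in run.yaml
            "args": {"vector_db_ids": ["openshift-index"]},
        }
    ],
}

# 2. Each user turn is sent to the agent; the agent decides whether to call
#    the RAG tool, which embeds the query and searches the vector store.
turn_request = {
    "messages": [
        {"role": "user", "content": "How do I configure the vector index?"}
    ],
}

# The agent, not the caller, decides when retrieval happens: the caller only
# wires the toolgroup and the vector DB IDs together.
rag_toolgroup = agent_config["toolgroups"][0]
assert rag_toolgroup["name"] == "builtin::rag"
assert "openshift-index" in rag_toolgroup["args"]["vector_db_ids"]
```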
---
# Prerequisites

## Set Up the Vector Database

Use the [`rag-content`](https://github.com/lightspeed-core/rag-content) repository to build a compatible vector database.

> [!IMPORTANT]
> The resulting database must be in a format Llama Stack can read (e.g., FAISS with SQLite metadata, or SQLite-vec). The format can be selected when using the tool to generate the index.
---
## Download an Embedding Model

Download a local embedding model such as `sentence-transformers/all-mpnet-base-v2`, either with the script in [`rag-content`](https://github.com/lightspeed-core/rag-content) or by downloading it manually and placing it at your desired path.

> [!NOTE]
> Llama Stack can also download the model for you, at the cost of a slower first start-up. In the `run.yaml` configuration file, specify a supported model name as `provider_model_id` instead of a path; Llama Stack will then download the model to the `~/.cache/huggingface/hub` folder.
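For the manual route, one option is cloning the model repository from Hugging Face. This is a sketch under stated assumptions: it assumes `git` with Git LFS installed (Hugging Face model repos store weights via LFS), and the target path is only an example.

```shell
# Manual download of the embedding model from Hugging Face.
# Requires git-lfs, since the model weights are stored with Git LFS.
git lfs install
git clone https://huggingface.co/sentence-transformers/all-mpnet-base-v2 \
    /home/USER/embedding_model/all-mpnet-base-v2
```

Whatever path you choose here is the path you later reference as `provider_model_id` in `run.yaml`.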
---
## Configure Vector Store and Embedding Model

Update the `run.yaml` file used by Llama Stack to point to:

* Your downloaded **embedding model**
* Your generated **vector database**

Example:
```yaml
models:
- model_id: <embedding-model-name> # e.g. sentence-transformers/all-mpnet-base-v2
  metadata:
    embedding_dimension: <embedding-dimension> # e.g. 768
  model_type: embedding
  provider_id: sentence-transformers
  provider_model_id: <path-to-embedding-model> # e.g. /home/USER/embedding_model

providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}

  # FAISS vector store
  vector_io:
  - provider_id: custom-index
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: <path-to-vector-index> # e.g. /home/USER/vector_db/faiss_store.db
        namespace: null

vector_dbs:
- embedding_dimension: <embedding-dimension> # e.g. 768
  embedding_model: <embedding-model-name> # e.g. sentence-transformers/all-mpnet-base-v2
  provider_id: custom-index
  vector_db_id: <index-id>
```
Where:

- `provider_model_id` is the path to the embedding model's folder (or, alternatively, the name of a supported embedding model to download)
- `db_path` is the path to the vector index (a `.db` file in this case)
- `vector_db_id` is the index ID that was used when generating the database
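These cross-references are easy to get wrong, and a mismatch only surfaces at query time. As a sanity check, here is a stdlib-only sketch that operates on the parsed form of the YAML above (the dict shape `yaml.safe_load` would produce), with the placeholders filled in using this guide's example values; the `my-index` ID is hypothetical.

```python
# Parsed form of the example run.yaml above, with placeholders filled in
# using this guide's example values ("my-index" is a hypothetical index ID).
config = {
    "models": [{
        "model_id": "sentence-transformers/all-mpnet-base-v2",
        "metadata": {"embedding_dimension": 768},
        "model_type": "embedding",
        "provider_id": "sentence-transformers",
        "provider_model_id": "/home/USER/embedding_model",
    }],
    "providers": {
        "inference": [{"provider_id": "sentence-transformers"}],
        "vector_io": [{"provider_id": "custom-index"}],
    },
    "vector_dbs": [{
        "embedding_dimension": 768,
        "embedding_model": "sentence-transformers/all-mpnet-base-v2",
        "provider_id": "custom-index",
        "vector_db_id": "my-index",
    }],
}

def check_rag_config(cfg: dict) -> list[str]:
    """Return a list of wiring problems (empty means the references line up)."""
    problems = []
    embeddings = {m["model_id"]: m for m in cfg["models"]
                  if m.get("model_type") == "embedding"}
    vector_io_ids = {p["provider_id"] for p in cfg["providers"]["vector_io"]}
    for db in cfg["vector_dbs"]:
        model = embeddings.get(db["embedding_model"])
        if model is None:
            problems.append(f"{db['vector_db_id']}: embedding model not declared")
        elif model["metadata"]["embedding_dimension"] != db["embedding_dimension"]:
            problems.append(f"{db['vector_db_id']}: embedding_dimension mismatch")
        if db["provider_id"] not in vector_io_ids:
            problems.append(f"{db['vector_db_id']}: unknown vector_io provider")
    return problems

assert check_rag_config(config) == []
```

The same checks apply to the complete configuration later in this guide: every `vector_dbs` entry must name a declared embedding model with a matching dimension and an existing `vector_io` provider.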
---
## Add an Inference Model (LLM)

Add a provider for your language model (e.g., OpenAI):
```yaml
models:
  [...]
- model_id: my-model
  provider_id: openai
  model_type: llm
  provider_model_id: <model-name> # e.g. gpt-4o-mini

providers:
  [...]
  inference:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}
```
Make sure to export your API key:

```bash
export OPENAI_API_KEY=<your-key-here>
```
> [!NOTE]
> When experimenting with different `models`, `providers`, and `vector_dbs`, you may need to manually unregister the old entries with the Llama Stack client CLI (you can inspect what is registered with, e.g., `llama-stack-client vector_dbs list`).
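For reference, a few inspection commands are sketched below. The exact subcommands depend on your `llama-stack-client` version, and the endpoint URL is an example; verify the unregister subcommand with `--help` before relying on it.

```shell
# Point the client at your running stack, then inspect what is registered.
llama-stack-client configure --endpoint http://localhost:8321
llama-stack-client models list
llama-stack-client vector_dbs list

# Remove a stale vector DB registration before registering a new one
# (verify the exact subcommand with `llama-stack-client vector_dbs --help`).
llama-stack-client vector_dbs unregister <vector-db-id>
```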
---
# Complete Configuration Reference

To enable RAG functionality, make sure the `agents`, `tool_runtime`, and `safety` APIs are included and properly configured in your YAML.

Below is a real example of a working configuration, with:

* A local `all-mpnet-base-v2` embedding model
* A `FAISS`-based vector store
* `OpenAI` as the inference provider
* An agent-based RAG setup

> [!TIP]
> We recommend starting with a minimal working configuration (one is automatically generated by the `rag-content` tool when generating the database) and extending it as needed by adding more APIs and providers.
```yaml
version: 2
image_name: rag-configuration

apis:
- agents
- inference
- vector_io
- tool_runtime
- safety

models:
- model_id: gpt-test
  provider_id: openai # This ID is a reference to 'providers.inference'
  model_type: llm
  provider_model_id: gpt-4o-mini

- model_id: sentence-transformers/all-mpnet-base-v2
  metadata:
    embedding_dimension: 768
  model_type: embedding
  provider_id: sentence-transformers # This ID is a reference to 'providers.inference'
  provider_model_id: /home/USER/lightspeed-stack/embedding_models/all-mpnet-base-v2

providers:
  inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}

  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY}

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: .llama/distributions/ollama/agents_store.db
      responses_store:
        type: sqlite
        db_path: .llama/distributions/ollama/responses_store.db

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  vector_io:
  - provider_id: ocp-docs
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: /home/USER/lightspeed-stack/vector_dbs/ocp_docs/faiss_store.db
        namespace: null

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}

# Enable the RAG tool
tool_groups:
- provider_id: rag-runtime
  toolgroup_id: builtin::rag
  args: null
  mcp_endpoint: null

vector_dbs:
- embedding_dimension: 768
  embedding_model: sentence-transformers/all-mpnet-base-v2
  provider_id: ocp-docs # This ID is a reference to 'providers.vector_io'
  vector_db_id: openshift-index # This ID was defined during index generation
```
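With the configuration saved (e.g., as `run.yaml`), the stack can be started. A minimal sketch, assuming the `llama stack` CLI is installed and the file name is yours to choose:

```shell
# The OpenAI provider reads the key from the server's environment,
# referenced as ${env.OPENAI_API_KEY} in the YAML above.
export OPENAI_API_KEY=<your-key-here>

# Start Llama Stack with this configuration.
llama stack run run.yaml
```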
# References

* [Llama Stack - RAG](https://llama-stack.readthedocs.io/en/latest/building_applications/rag.html)
* [Llama Stack - Configuring a "Stack"](https://llama-stack.readthedocs.io/en/latest/distributions/configuration.html)
* [Llama Stack - Sample configurations](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distributions)
