This repository is based on the code in the openshift/lightspeed-rag-content repository. It builds a RAG (Retrieval-Augmented Generation) vector database from the documentation sources stored in the ansible/aap-docs repository. The Ansible Automation Platform (AAP) chatbot uses this database.
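To set up the development environment, install the tools, the dependencies, and the test dependencies:
make install-tools
make install-deps
make install-deps-test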
Currently, aap-rag-content images are built by the aap-rag-content repository on GitLab.cee, which references this repository as a Git submodule. If you need to build an image manually from this repository, use the following steps.
- Obtain access to the Mimir repository and clone it.
- Create the ./mimir folder in the project root.
- Copy the mimir-extract-latest.tgz.enc file into the ./mimir folder.
- Run ./scripts/mimir-parser.py, which extracts the markdown files into the ./aap-product-docs-plaintext folder.
./scripts/mimir-parser.py
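The preparation steps above can be wrapped in a small shell function so they are repeatable. This is a minimal sketch, not a script shipped with the repository: the helper name prepare_mimir_docs and the archive path in the usage line are hypothetical, and it assumes ./scripts/mimir-parser.py is executable and takes no arguments, as in the command above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy the encrypted Mimir archive into ./mimir and
# run the parser. Must be called from the project root.
prepare_mimir_docs() {
  local archive="$1"
  if [[ ! -f "$archive" ]]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p ./mimir            # create ./mimir in the project root
  cp "$archive" ./mimir/      # copy mimir-extract-latest.tgz.enc into it
  ./scripts/mimir-parser.py   # extract markdown into ./aap-product-docs-plaintext
}

# Usage (the path to your Mimir clone is an assumption):
# prepare_mimir_docs ../mimir/mimir-extract-latest.tgz.enc
```

The function stops early if the archive path is wrong, so the parser is never run against an empty ./mimir folder.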
Build the image:
make build-image-aap
Log in to quay.io and push the image:
podman login quay.io
podman push aap-rag-content quay.io/ansible/aap-rag-content
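If you want to push under an explicit tag rather than the default latest, the build and push commands can be combined as follows. This is a sketch under assumptions: the helpers image_ref and build_and_push and the example tag are hypothetical and not part of the repository's Makefile; it assumes make build-image-aap produces a local image named aap-rag-content, as the push command above implies.

```shell
#!/usr/bin/env bash
set -euo pipefail

REGISTRY_REPO="quay.io/ansible/aap-rag-content"

# Hypothetical helper: compose the destination reference, defaulting to "latest".
image_ref() {
  local tag="${1:-latest}"
  printf '%s:%s\n' "$REGISTRY_REPO" "$tag"
}

# Hypothetical helper: build the local image, then push it under the given tag.
build_and_push() {
  local tag="${1:-latest}"
  make build-image-aap
  podman push aap-rag-content "$(image_ref "$tag")"
}

# Usage, after `podman login quay.io` (the tag is only an example):
# build_and_push 2.5
```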
By default, the embeddings are saved in a Faiss vector store, which is included in the container image. Alternatively, you can use a PostgreSQL database with the pgvector extension as the vector store.
To start a PostgreSQL server for debugging, run:
make start-postgres-debug
The Postgres data directory is created under ./postgresql/data.
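Because the container can take a few seconds to accept connections, it may help to wait for it before generating embeddings. This is a minimal sketch, assuming the container started by make start-postgres-debug is named pgvector (the name used with podman exec in this README) and that the image ships pg_isready, as the official PostgreSQL images do. The helper wait_for_postgres and its 30-second default are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: poll pg_isready inside the container until PostgreSQL
# accepts connections or the retry budget (one try per second) runs out.
wait_for_postgres() {
  local container="${1:-pgvector}"   # assumed container name
  local retries="${2:-30}"           # assumed timeout, in seconds
  local i
  for ((i = 1; i <= retries; i++)); do
    if podman exec "$container" pg_isready -U postgres >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  echo "PostgreSQL in '$container' not ready after ${retries}s" >&2
  return 1
}

# Usage:
# make start-postgres-debug
# wait_for_postgres pgvector && make generate-embeddings-postgres
```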
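Then generate the embeddings and store them in PostgreSQL:
make generate-embeddings-postgres
The result is saved in the data_aap_product_docs_2_5 table. You can confirm that the table exists by running psql inside the pgvector container: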
$ podman exec -it pgvector bash
root@7894ab5c94e2:/# psql -U postgres
psql (16.4 (Debian 16.4-1.pgdg120+2))
Type "help" for help.
postgres=# \dt
List of relations
Schema | Name | Type | Owner
--------+---------------------------+-------+----------
public | data_aap_product_docs_2_5 | table | postgres
(1 row)
postgres=#
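To check the result without opening an interactive shell, you can run a single query through podman exec. This is a sketch, assuming the container name pgvector and the table data_aap_product_docs_2_5 from the session above; the helper count_embeddings is hypothetical. The table name is interpolated into the SQL directly, so pass only trusted identifiers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the number of rows in the embeddings table.
count_embeddings() {
  local container="${1:-pgvector}"
  local table="${2:-data_aap_product_docs_2_5}"
  # -t prints tuples only and -A disables alignment, so only the count is output
  podman exec "$container" psql -U postgres -tA \
    -c "SELECT count(*) FROM ${table};"
}

# Usage:
# rows=$(count_embeddings)
# echo "rows in data_aap_product_docs_2_5: $rows"
```

A non-zero count indicates that make generate-embeddings-postgres wrote data to the table.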