This is a sample implementation of RAG (Retrieval-Augmented Generation) using Elasticsearch. It uses NewsAPI content, but you can easily modify it to use other data sources.
- Docker: 24.0.5
- Docker Compose: v2.20.2
- OpenAI API information (the .env examples use the azure API type)
- Elasticsearch cluster on Elastic Cloud
- This program assumes that the Elasticsearch cluster is created on Elastic Cloud
- NewsAPI's API Key (Optional)
Add the analysis-icu and analysis-kuromoji plugins to the cluster. Follow this link: https://www.elastic.co/guide/en/cloud/current/ec-adding-elastic-plugins.html
If you include esapi_key, API key access is used even if cloud_pass and cloud_id are also configured.
openai_api_key=<OpenAI API key>
openai_api_type=azure
openai_api_base=<OpenAI API base URL>
openai_api_version=<OpenAI API version>
openai_api_engine=<OpenAI API engine>
cloud_id=<cloud id of Elasticsearch Cluster>
cloud_pass=<Cloud pass of Elasticsearch Cluster>
cloud_user=<Cloud User. Normally it is elastic>
search_index=<your index name>
newsapi_key=<newsapi key>
If you want to use an Elasticsearch API key, use the following .env.
openai_api_key=<OpenAI API key>
openai_api_type=azure
openai_api_base=<OpenAI API base URL>
openai_api_version=<OpenAI API version>
openai_api_engine=<OpenAI API engine>
cloud_id=<cloud id of Elasticsearch Cluster>
esapi_key=<Elasticsearch API Key>
search_index=<your index name>
newsapi_key=<newsapi key>
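The precedence between the two .env variants can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `build_es_auth` is hypothetical, and it relies only on the keys shown in the .env examples above. The keyword arguments it returns are named after the `cloud_id`, `api_key`, and `basic_auth` parameters of the official Python `Elasticsearch` client (8.x).

```python
def build_es_auth(env):
    """Return client keyword arguments from .env-style settings.

    Hypothetical helper: it mirrors the rule that esapi_key wins over
    cloud_user/cloud_pass when both are present.
    """
    kwargs = {"cloud_id": env["cloud_id"]}
    if env.get("esapi_key"):
        # API key access takes precedence over user/password.
        kwargs["api_key"] = env["esapi_key"]
    else:
        # Fall back to basic auth; "elastic" is the usual Elastic Cloud user.
        kwargs["basic_auth"] = (env.get("cloud_user", "elastic"), env["cloud_pass"])
    return kwargs
```

With the `elasticsearch` package installed, the result could be passed on as `Elasticsearch(**build_es_auth(settings))`, where `settings` is a dict loaded from .env.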
Change the versions of Elasticsearch and the other components as needed.
docker compose up -d
This step does the following:
- Upload the cl-tohoku/bert-base-japanese-v3 model from Hugging Face to Elasticsearch
- Create the ingest pipeline to embed the vector
- Create the mapping
Enter the Docker container and execute initialize.sh
docker exec -it esre_flask /bin/bash
./initialize.sh
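As a rough idea of what the ingest pipeline and mapping created by this step might look like, here is a minimal sketch. It is not taken from create_index.py: the field name `content`, the `kuromoji` analyzer choice, and the model ID (which assumes Eland's usual conversion of `/` to `__`) are assumptions for illustration, and 768 is the hidden size of BERT-base models.

```python
# Assumed names, for illustration only.
MODEL_ID = "cl-tohoku__bert-base-japanese-v3"
EMBEDDING_DIMS = 768  # hidden size of BERT-base models


def build_ingest_pipeline(model_id=MODEL_ID, source_field="content"):
    """Pipeline body with one inference processor that embeds source_field."""
    return {
        "description": "Embed text with an uploaded model at ingest time",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    # The vector ends up in text_embedding.predicted_value.
                    "target_field": "text_embedding",
                    # Map the document field to the model's expected input name.
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


def build_mapping(source_field="content", dims=EMBEDDING_DIMS):
    """Index mapping with a text field and a dense_vector for kNN search."""
    return {
        "properties": {
            source_field: {"type": "text", "analyzer": "kuromoji"},
            "text_embedding": {
                "properties": {
                    "predicted_value": {
                        "type": "dense_vector",
                        "dims": dims,
                        "index": True,
                        "similarity": "cosine",
                    }
                }
            },
        }
    }
```

If you switch to a different embedding model, both `model_id` and `dims` need to change together, since the vector length is fixed by the model.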
docker exec -it esre_flask /bin/bash
./initialize_api.sh
docker exec -it esre_flask /bin/bash
cd data
./load_all.sh
- Change --hub-model-id in initialize.sh
- Change model_id of ingest pipeline and text_embedding mapping accordingly in create_index.py
- Change knn and rrf query in app.py
- Modify the URL in newsapi.py if you want to fetch different topics (it currently fetches everything). For example:
docker exec -it esre_flask /bin/bash
cd data
python newsapi.py コロナウイルス ./json/covid.json
./load1.sh ./json/covid.json
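For the knn and rrf query mentioned in the list above, the following sketch shows one possible shape of a hybrid search body. It is not the query used in app.py: the function name, the `content` field, and the model ID are assumptions, and the top-level `rank` syntax is the one from the Elasticsearch 8.x releases that first shipped RRF. Newer releases express RRF through retrievers, so check the documentation for the version you run.

```python
def build_hybrid_search(question,
                        model_id="cl-tohoku__bert-base-japanese-v3",
                        text_field="content",
                        k=10,
                        num_candidates=100):
    """Build a search body combining a match query and kNN, fused with RRF.

    Hypothetical helper; field and model names are illustrative assumptions.
    """
    return {
        # Lexical part: ordinary full-text match on the text field.
        "query": {"match": {text_field: question}},
        # Vector part: Elasticsearch embeds the question with the deployed model.
        "knn": {
            "field": "text_embedding.predicted_value",
            "k": k,
            "num_candidates": num_candidates,
            "query_vector_builder": {
                "text_embedding": {"model_id": model_id, "model_text": question}
            },
        },
        # Reciprocal rank fusion merges the two result lists.
        "rank": {"rrf": {}},
        "size": k,
    }
```

When you change the embedding model, the `model_id` here has to match the one used by the ingest pipeline so that query and document vectors come from the same model.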