Concurrent Brainstorming & Hypothesis Satisfying: An Iterative Framework for Enhanced Retrieval-Augmented Generation (R2CBR3H-SR)
This repository contains the implementation of "Concurrent Brainstorming & Hypothesis Satisfying: An Iterative Framework for Enhanced Retrieval-Augmented Generation (R2CBR3H-SR)". A link to the arXiv paper will be provided soon.
This study presents a novel method for enhancing question-answering systems with retrieval-augmented generation capabilities. Our approach incorporates a cyclical process that melds cutting-edge developments in natural language processing with sophisticated information retrieval techniques. The core elements of our architecture include:
- Concurrent Brainstorming: In this initial phase, we utilize the input question to simultaneously provoke the creation of semantically akin queries using a concurrent search across a document vector database.
- Iterative Reranking: After the brainstorming period, the leading vector database results are selectively reranked, setting the stage for the upcoming formulation phase.
- Hybrid Hypothesize-Satisfying: Employing a chain-of-thought prompting strategy, this phase combines forming hypotheses with the notion of satisfying, aiming to fulfill the user's quest for knowledge efficiently.
- Refinement: The concluding phase distills the brainstormed ideas and queries into a refined, compact form that focuses on information-rich content over excess elaboration.
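The four phases above can be sketched as a small loop. This is only an illustrative toy under stated assumptions, not the code in this repository: token overlap stands in for the vector database search and the reranker, a simple coverage check stands in for the chain-of-thought hypothesis step, and every name below (DOCS, search, brainstorm, rerank, is_satisfied, refine, answer) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for the ingested documents (assumption).
DOCS = [
    "Retrieval-augmented generation grounds answers in retrieved documents.",
    "Rerankers reorder retrieved passages by relevance to the query.",
    "Chain-of-thought prompting asks a model to reason step by step.",
    "Docker Compose starts multi-container applications.",
]


def tokens(text):
    """Lowercased word set; a crude stand-in for an embedding."""
    cleaned = text.lower().replace(".", " ").replace("?", " ")
    return set(cleaned.split())


def search(query, k):
    """Stand-in for a vector search: rank documents by token overlap."""
    q = tokens(query)
    ranked = sorted(DOCS, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def brainstorm(question, related_queries, k):
    """Phase 1: search with the question and related queries concurrently."""
    queries = [question, *related_queries]
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda q: search(q, k), queries))
    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(doc for batch in batches for doc in batch))


def rerank(question, candidates, top_n=2):
    """Phase 2: reorder the candidates against the original question."""
    q = tokens(question)
    ranked = sorted(candidates, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:top_n]


def is_satisfied(question, context, threshold=2):
    """Phase 3 stand-in: accept once the context covers enough question tokens."""
    covered = set().union(*map(tokens, context))
    return len(tokens(question) & covered) >= threshold


def refine(context, max_chars=200):
    """Phase 4: compress the kept context into a compact form."""
    return " ".join(context)[:max_chars]


def answer(question, related_queries, max_rounds=3):
    """Iterate the phases, widening the search until the check passes."""
    context = []
    for round_no in range(1, max_rounds + 1):
        candidates = brainstorm(question, related_queries, k=round_no + 1)
        context = rerank(question, candidates)
        if is_satisfied(question, context):
            break
    return refine(context)


result = answer(
    "How do rerankers reorder retrieved passages?",
    ["reranking retrieved passages", "relevance of retrieved documents"],
)
print(result)
```

In a real pipeline the related queries would presumably be generated by a language model and the satisfaction check would be a prompted judgment; the sketch only shows how the stages hand results to one another.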
Install all the necessary requirements.
pip install -r requirements.txt
Create a .env file and set your openai and cohere API keys.
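Before running anything, it can help to confirm that both keys are present. The sketch below parses the text of a .env file with the standard library only. The variable names OPENAI_API_KEY and COHERE_API_KEY are assumptions (this README does not state the names the code expects), and the sample values are placeholders.

```python
# Assumed variable names; check the repository code for the ones it reads.
REQUIRED = ("OPENAI_API_KEY", "COHERE_API_KEY")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED if not values.get(key)]


sample = """
# placeholders, not real keys
OPENAI_API_KEY="sk-placeholder"
COHERE_API_KEY=
"""
print(missing_keys(sample))  # ['COHERE_API_KEY']
```

To check an actual file, pass its contents instead of the sample, for example `missing_keys(open(".env").read())`.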
You can get the .txt documents by running the following command.
wget -q https://www.dropbox.com/s/vs6ocyvpzzncvwh/new_articles.zip
Then, unzip the downloaded documents and put them in the doc directory.
unzip -q new_articles.zip -d doc
We have already provided the doc directory, so you do not need to download it again.
Set the appropriate parameters according to your needs in settings.py. Make sure that persist_dir in settings.py is set to "db" if you are not going to use Docker later on; otherwise, set it to "/data/".
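One way to avoid editing the value by hand is to derive it from an environment variable. This is a hedged sketch, not the contents of settings.py: the RUN_IN_DOCKER variable and the resolve_persist_dir helper are hypothetical, and only the two values "db" and "/data/" come from this README.

```python
import os


def resolve_persist_dir(environ=os.environ):
    """Return "/data/" when the (hypothetical) RUN_IN_DOCKER flag is set, else "db"."""
    flag = environ.get("RUN_IN_DOCKER", "").lower()
    return "/data/" if flag in {"1", "true", "yes"} else "db"


# Explicit mappings make the two cases easy to see.
print(resolve_persist_dir({}))                      # db
print(resolve_persist_dir({"RUN_IN_DOCKER": "1"}))  # /data/
```

With this approach, the flag would have to be set in the container environment for the Docker case to be selected.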
To ingest the documents, run the following command.
python data_ingestion.py
We have already provided sets of questions and answers in qa_db for evaluating the performance of the proposed method and comparing it with the baseline.
To interact with the ingested documents using the proposed method, run the following command.
python main_proposed.py
To interact with the ingested documents using the baseline method, run the following command.
python main_baseline.py
To use Docker Compose for interacting with your documents, use the following command to build, (re)create, and start the containers for the service in detached mode.
docker compose up -d
To follow the log output from the containers, use the following command.
docker compose logs -f
To stop and remove the container(s), network(s), and volume(s) created by up, use the following command.
docker compose down -v
To analyze the performance and compare it with the baseline, using the simulation results saved in the results and results_baseline directories, run the following.
python results_post_processing.py
Arash Shahmansoori (arash.mansoori65@gmail.com)
This project is licensed under the terms of the MIT License.