RAGbot is a simple Python app that provides automated responses to frequently asked questions. It is easy to set up as an endpoint for a Slackbot or any other system that can send a simple POST request. It uses Retrieval Augmented Generation (RAG): text extracted from a .docx document serves as a knowledge base, which OpenAI's language models use to generate relevant answers.
- Extracts information from .docx files to create an embeddings.csv that serves as a knowledge base.
- Uses OpenAI's APIs to understand and respond to queries, drawing on the extracted knowledge base for your specific use case.
- Provides an API endpoint for Slack or other apps to POST questions and receive answers.
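As an illustration of that last point, any client that can send a JSON POST can talk to the bot. The following is a minimal sketch, assuming the server listens at http://localhost:4433/ask and replies with a JSON object containing a "response" field, as in the curl example in this README. The function names `build_ask_request` and `ask` are hypothetical and not part of RAGbot.

```python
import json
import urllib.request

# Assumed default, taken from the curl example in this README.
ASK_URL = "http://localhost:4433/ask"


def build_ask_request(question: str, url: str = ASK_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /ask endpoint."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, url: str = ASK_URL, timeout: float = 30.0) -> str:
    """Send the question and return the "response" field of the JSON reply."""
    request = build_ask_request(question, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Example (requires a running RAGbot server):
#   print(ask("What encryption is used on the company's platform?"))
```

Building the body with `json.dumps` sidesteps shell quoting problems, such as apostrophes in the question text.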
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Python 3.8 or higher
- OpenAI API key
- Slack workspace for integration
Clone the repository to your local machine:
git clone https://github.com/yourusername/RAGbot.git
cd RAGbot
Set up a virtual environment and install the dependencies:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Export your OpenAI API key as an environment variable:
export OPENAI_API_KEY='your_api_key_here'
- Prepare your .docx document with the source material you want the bot to reference, using Heading 1 formatting to separate sections.
- Generate the embeddings .csv file from your .docx document using docparser.py. The bot uses this file to decide which sections are relevant to a query.
- Start the server:
- Test the API with a curl command:
curl -X POST 'http://localhost:4433/ask' -H "Content-Type: application/json" --data "{\"query\": \"What encryption is used on the company's platform?\"}"
Example response:
{"response":"The platform uses TLS 1.3 or 1.2 for encryption in transit."}
- Configure Slack to POST queries to the /ask endpoint.
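The document preparation and relevance steps above can be sketched roughly as follows. This is not the code in docparser.py; it is an illustration under stated assumptions. It assumes paragraphs arrive as (style name, text) pairs, which python-docx could supply via `[(p.style.name, p.text) for p in docx.Document(path).paragraphs]`, and that each section and query already has an embedding vector. The function names are hypothetical, and cosine similarity is an assumed ranking measure.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def split_by_heading1(paragraphs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (style_name, text) pairs into (heading, body) sections.

    Each "Heading 1" paragraph starts a new section. Text that appears
    before the first heading is ignored in this sketch.
    """
    sections: List[Tuple[str, List[str]]] = []
    for style, text in paragraphs:
        if style == "Heading 1":
            sections.append((text.strip(), []))
        elif sections and text.strip():
            sections[-1][1].append(text.strip())
    return [(heading, "\n".join(body)) for heading, body in sections]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_sections(
    query_vec: Sequence[float],
    section_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[int]:
    """Return the indices of the k sections most similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in section_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Keeping one topic per Heading 1 section matters here: each section is ranked as a whole, so a section that mixes unrelated topics is harder to match to a specific question.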
It helps if the source .docx names the company or platform when describing its features, so that the context makes clear what is being referred to. For example, "Company X uses SuperAntiVirus to prevent untreatable infections" is better than "We use SuperAntiVirus...."
You can make this even more explicit by updating the context prompt that's sent to ChatGPT in the query_message function to spell out the company/platform name to help it make the association.
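A prompt that names the company explicitly might look like the sketch below. The function `build_context_prompt`, its parameters, and the wording of the instructions are all hypothetical; the actual `query_message` function in this project may be structured differently.

```python
from typing import Sequence


def build_context_prompt(question: str, sections: Sequence[str], company: str) -> str:
    """Assemble a prompt that spells out which company the sections describe."""
    intro = (
        f"Use the sections below about {company} and its platform to answer "
        f"the question. Treat 'we', 'our', and 'the platform' as referring "
        f"to {company}. If the answer is not in the sections, say you don't know."
    )
    # Each retrieved section is included verbatim under its own label.
    context = "\n\n".join(f"Section:\n{section}" for section in sections)
    return f"{intro}\n\n{context}\n\nQuestion: {question}"
```

With this approach, a source sentence such as "We use SuperAntiVirus" can still be attributed to the right company, because the instruction text tells the model who "we" is.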
This could all be made dynamic by regenerating the embeddings from a Google Doc on a regular schedule, so that embeddings.csv always contains the latest updates and the source document can be edited collaboratively.
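One way to approach such a scheduled refresh is to rebuild the embeddings only when the document text has actually changed. The sketch below is an assumption-laden outline, not an existing feature: `fetch_text` stands in for whatever code exports the Google Doc as text, and `rebuild` stands in for whatever regenerates embeddings.csv. Both are hypothetical callables supplied by the caller.

```python
import hashlib
from typing import Callable, Optional


def refresh_if_changed(
    fetch_text: Callable[[], str],
    rebuild: Callable[[str], None],
    last_digest: Optional[str],
) -> str:
    """Rebuild the embeddings only when the document text has changed.

    Returns the SHA-256 digest of the current text so the caller can pass
    it back in as last_digest on the next run.
    """
    text = fetch_text()
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest != last_digest:
        rebuild(text)
    return digest
```

A cron job or a simple loop with a sleep interval could call this function periodically, storing the returned digest between runs. Skipping unchanged documents avoids paying for embedding calls when nothing was edited.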