RAG-ready is a pipeline for PDF preprocessing, chunking, vectorisation, and storage in Pinecone. Go from documents to a RAG-ready Pinecone index in one go. Whether you are building a search application, a chatbot, or a recommendation system, RAG-ready handles the document-to-index path for you.
- Python 3.6+
- .env file with the following keys: JINA_API_KEY, PINECONE_API_KEY, PINECONE_INDEX_NAME
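Before a run, it can help to confirm that all three keys are actually set. The following is a minimal sketch using only the standard library; the helper names `parse_env_text`, `missing_keys`, and `check_env` are hypothetical and not part of RAG-ready, and the project itself may load the file differently (for example with a dotenv library).

```python
import os
from pathlib import Path

# The three keys this README lists as required.
REQUIRED_KEYS = ("JINA_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX_NAME")


def parse_env_text(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def check_env(path=".env"):
    """Hypothetical helper: merge the file with os.environ and list gaps."""
    env_path = Path(path)
    values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    # Variables already exported in the shell take precedence over the file.
    values.update({k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ})
    return missing_keys(values)
```

Calling `check_env()` from the repository root returns an empty list when everything is configured, or the names of the keys that still need values.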
-
Clone the repository:
git clone https://github.com/FazlOmar9/RAG-ready.git
cd RAG-ready
-
Create a virtual environment:
python -m venv .venv
-
Activate the virtual environment:
-
On Windows:
.venv\Scripts\activate
-
On macOS/Linux:
source .venv/bin/activate
-
Install the required packages:
pip install -r requirements.txt
-
Create a .env file in the root directory with the following content:
JINA_API_KEY=your_jina_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=your_pinecone_index_name
-
Place your PDF file in the documents directory.
-
Run the main script with the path to your PDF file as an argument:
python main.py documents/your_pdf_file.pdf
This will extract text from the PDF, segment it, generate embeddings, and upload the vectors to Pinecone.
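To make the segment, embed, and upload stages more concrete, here is a rough sketch of the data flow. It is not the code in main.py: `chunk_text` is a simple overlapping character window standing in for whatever segmentation the project uses, `fake_embed` is a placeholder for the real embedding call, and every function name and parameter here is an assumption made for illustration.

```python
import hashlib


def chunk_text(text, size=500, overlap=50):
    """Split text into windows of `size` characters overlapping by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size].strip()
        if piece:
            chunks.append(piece)
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks


def build_records(chunks, embed, source):
    """Pair each chunk with its vector as an id/values/metadata dict."""
    vectors = embed(chunks)
    if len(vectors) != len(chunks):
        raise ValueError("embed() must return one vector per chunk")
    records = []
    for i, (chunk, vector) in enumerate(zip(chunks, vectors)):
        # Deterministic id so re-running on the same file overwrites old rows.
        digest = hashlib.sha1(f"{source}:{i}".encode("utf-8")).hexdigest()[:16]
        records.append({
            "id": digest,
            "values": list(vector),
            "metadata": {"source": source, "text": chunk},
        })
    return records


def fake_embed(chunks):
    """Placeholder embedder (assumption): a toy 3-dimensional vector per chunk."""
    return [[float(len(c)), float(c.count(" ")), 1.0] for c in chunks]
```

The id/values/metadata shape mirrors the record format that vector stores such as Pinecone generally accept for upserts. In a real run, `fake_embed` would be replaced by a call to the embedding service, and the vector dimension would have to match the index named in PINECONE_INDEX_NAME.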