PDF Document Processing and QA Bot

This project provides a Flask-based web service for uploading PDF documents, extracting text using OCR, and setting up a question-answering (QA) system using language models and embeddings. The service processes uploaded PDFs, stores embeddings in a Chroma database, and allows users to query the processed documents.

Features

PDF Upload: Upload PDF documents through a REST API endpoint.
Text Extraction: Extract text from PDF files using PaddleOCR.
Text Splitting: Split extracted text into manageable chunks.
Embedding: Convert text chunks into vector embeddings using a HuggingFace embedding model.
Database Storage: Store embeddings in a Chroma vector database.
Question Answering: Query the processed documents using a custom QA chain.
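
To make the splitting, embedding and retrieval steps concrete, here is a minimal, dependency-free sketch. It is not the project's code. The real pipeline uses a HuggingFace embedding model and a Chroma database. This sketch swaps in three stand-ins: fixed-size character windows with overlap for the splitter, a bag-of-words count vector for the embedding, and an in-memory list ranked by cosine similarity for the vector store. The function names and the chunk_size/chunk_overlap defaults are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, chunk_overlap=40):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


def embed(text):
    """Toy stand-in for a HuggingFace embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query, as a vector store lookup would."""
    query_vec = embed(query)
    index = [(chunk, embed(chunk)) for chunk in chunks]  # stand-in for the Chroma collection
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks. In a retrieval-based QA chain, chunks retrieved this way are typically passed to the language model as context for the answer.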

Prerequisites

Python 3.7 or higher
Required Python packages (see requirements.txt)

Installation

Clone the repository:

```
git clone https://github.com/gautamraj8044/PDF-Document-Processing-and-QA-Bot
```

Navigate to the project directory:
```
cd PDF-Document-Processing-and-QA-Bot/Chat-Bot
```
Install the required Python packages:
```
pip install -r requirements.txt
```

Configuration

Set up Poppler: Download and install Poppler, and update the poppler_path variable in the code to point to your Poppler installation directory.
Configure Model Paths: Update the local_llm variable with the path to your local language model file.
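
Before starting the server, it can help to confirm that both paths point at something real. The sketch below is a hypothetical pre-flight check, not part of the project. POPPLER_PATH, LOCAL_LLM and check_config are illustrative names. It also assumes that a working Poppler installation exposes the pdftoppm command-line tool, one of Poppler's standard utilities, either in the configured directory or on PATH.

```python
import shutil
from pathlib import Path

# Hypothetical placeholders: copy in the poppler_path and local_llm values from your code.
POPPLER_PATH = r"C:\path\to\poppler\bin"
LOCAL_LLM = "path/to/your/model/file"


def check_config(poppler_path, local_llm):
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    # Look for Poppler's pdftoppm in the configured directory first, then on PATH.
    found = shutil.which("pdftoppm", path=poppler_path) if poppler_path else None
    if found is None and shutil.which("pdftoppm") is None:
        problems.append(f"pdftoppm not found in {poppler_path!r} or on PATH")
    # The language model is loaded from a local file, so it must exist.
    if not Path(local_llm).is_file():
        problems.append(f"model file not found: {local_llm}")
    return problems


if __name__ == "__main__":
    for problem in check_config(POPPLER_PATH, LOCAL_LLM):
        print("config problem:", problem)
```

Running it prints nothing when both paths check out. Fix any reported problem before running python app.py.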

Running the Application

Start the Flask server:
```
python app.py
```
Access the API:
- Upload a PDF: POST request to /upload with a file attachment.
- Ask a Question: POST request to /ask with the question in the form data.

API Endpoints

Upload PDF

Endpoint: /upload
Method: POST
Request: Form-data with the PDF attached under the key "file".
Response: JSON message indicating the status of the upload and processing.

Ask Question

Endpoint: /ask
Method: POST
Request: Form-data with the key "query" containing the question.
Response: JSON with the answer to the question.

Example Usage

Upload a PDF

```
curl -X POST http://localhost:5000/upload -F "file=@path_to_your_pdf.pdf"
```

Ask a Question

```
curl -X POST http://localhost:5000/ask -F "query=What is the main topic of the document?"
```
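
The same two calls can be made from Python. This is a minimal client sketch using only the standard library. curl's -F option sends multipart/form-data, so the sketch encodes requests the same way, using the form keys "file" and "query" from the curl examples. The helper names and BASE_URL are hypothetical. The response JSON is returned as a plain dict because its exact keys are not documented here.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumption: the server runs locally on Flask's default port, as in the curl examples.
BASE_URL = "http://localhost:5000"


def encode_multipart(fields, files):
    """Build a multipart/form-data body, the encoding curl uses for -F.

    fields maps form keys to strings; files maps form keys to (filename, bytes).
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    for name, (filename, data) in files.items():
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_form(path, fields=None, files=None, base_url=BASE_URL):
    """POST multipart form data to the service and decode its JSON response."""
    body, content_type = encode_multipart(fields or {}, files or {})
    request = urllib.request.Request(
        base_url + path, data=body, method="POST",
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def upload_pdf(pdf_path, base_url=BASE_URL):
    """Send a PDF to /upload under the form key "file"."""
    pdf = Path(pdf_path)
    return post_form("/upload", files={"file": (pdf.name, pdf.read_bytes())}, base_url=base_url)


def ask(question, base_url=BASE_URL):
    """Send a question to /ask under the form key "query"."""
    return post_form("/ask", fields={"query": question}, base_url=base_url)
```

With the server running, call upload_pdf("path_to_your_pdf.pdf") first and then ask("What is the main topic of the document?"). Sticking to the standard library avoids an extra dependency. Note that urlopen raises urllib.error.HTTPError if the server answers with a 4xx or 5xx status.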

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Feel free to submit issues or pull requests. Please follow the project's coding style and guidelines.

Contact

For any questions or issues, please contact gautamraj8044@gmail.com.
