
Retrieval-Augmented Generation (RAG) System

A Python-based RAG system that processes PDF and DOCX files, creates embeddings for document chunks, uses FAISS for efficient similarity search, and generates responses based on retrieved contexts.

Features

  • Document Processing

    • Support for PDF and DOCX files
    • Configurable text chunking with overlap
    • Automatic handling of document boundaries
  • Embedding Creation

    • Uses SentenceTransformer models
    • Efficient caching system for embeddings
    • Support for different embedding models
  • Similarity Search

    • FAISS-based vector search
    • Multiple index types (Flat, IVF, HNSW)
    • Optimizable search parameters
  • Response Generation

    • Context-based response generation
    • Multiple context combination strategies
    • Configurable context length
  • Optimization Features

    • Hyperparameter tuning
    • Index parameter optimization
    • Performance metrics tracking
    • Caching system
    • Batch processing support

Installation

  1. Clone the repository:
git clone <repository-url>
cd rag-system
  2. Install dependencies:
pip install -r requirements.txt

Quick Start

from rag_system import RAGSystem

# Initialize the system
rag = RAGSystem(
    embedding_model='all-mpnet-base-v2',
    chunk_size=256,
    chunk_overlap=64,
    cache_dir='cache'
)

# Add documents
rag.add_documents('path/to/documents')

# Query the system
response = rag.query("What is the main topic of the documents?")
print(response)

Usage Guide

Document Processing

The system can process both PDF and DOCX files:

# Process all documents in a directory
rag.add_documents('path/to/documents')

# Process a single document
rag.add_document('path/to/document.pdf')
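
Under the hood, each file's text is extracted and split into overlapping chunks before embedding. The sketch below shows roughly what that could look like, assuming pypdf and python-docx are used for extraction; extract_text and chunk_text are illustrative helpers, not the actual DocumentProcessor API.

# Illustrative only: extraction and overlapping chunking, assuming pypdf/python-docx
from pypdf import PdfReader
from docx import Document

def extract_text(path):
    # Pull raw text out of a PDF or DOCX file
    if path.lower().endswith('.pdf'):
        return '\n'.join(page.extract_text() or '' for page in PdfReader(path).pages)
    if path.lower().endswith('.docx'):
        return '\n'.join(p.text for p in Document(path).paragraphs)
    raise ValueError(f'Unsupported file type: {path}')

def chunk_text(text, chunk_size=256, chunk_overlap=64):
    # Slide a word window across the text so neighbouring chunks share context
    words = text.split()
    step = chunk_size - chunk_overlap
    return [' '.join(words[i:i + chunk_size]) for i in range(0, len(words), step)]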

Querying

# Simple query
response = rag.query("What are the key findings?")

# Query with metadata
response = rag.query(
    "What methodology was used?",
    k=5,  # Number of contexts to retrieve
    include_metadata=True
)

Index Types

The system supports different FAISS index types:

# Flat index (exact search)
rag = RAGSystem(index_type='flat')

# IVF index (approximate search with clustering)
rag = RAGSystem(index_type='ivf')

# HNSW index (graph-based approximate search)
rag = RAGSystem(index_type='hnsw')
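
For reference, these options correspond to standard FAISS index classes, roughly as sketched below; the dimension and parameters are illustrative, not the system's defaults.

import faiss

d = 768              # embedding dimension (e.g. all-mpnet-base-v2)
nlist, M = 100, 32   # illustrative IVF cluster count and HNSW connectivity

# Flat: brute-force exact search over all vectors
flat_index = faiss.IndexFlatL2(d)

# IVF: vectors are clustered; only a few clusters are scanned per query
# (requires ivf_index.train(...) on sample vectors before adding data)
quantizer = faiss.IndexFlatL2(d)
ivf_index = faiss.IndexIVFFlat(quantizer, d, nlist)

# HNSW: graph-based approximate nearest-neighbour search
hnsw_index = faiss.IndexHNSWFlat(d, M)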

Context Strategies

Choose how to combine multiple relevant contexts:

# Concatenate all relevant contexts
rag = RAGSystem(context_strategy='concatenate')

# Use only the best matching context
rag = RAGSystem(context_strategy='best_match')
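
Conceptually, the difference is only in how the retrieved chunks are handed to the response generator; the snippet below is purely illustrative.

# Illustrative only: how the two strategies treat the retrieved chunks
contexts = ["chunk ranked 1", "chunk ranked 2", "chunk ranked 3"]

concatenated = "\n\n".join(contexts)   # 'concatenate': merge all retrieved chunks
best = contexts[0]                     # 'best_match': keep only the top-ranked chunk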

Optimization

# Optimize retrieval parameters
best_params = rag.optimize_retrieval([
    "What are the key findings?",
    "What methodology was used?",
    "What are the main conclusions?"
])

Saving and Loading

# Save the index
rag.save_index('path/to/index')

# Load the index
rag.load_index('path/to/index')

System Components

  • DocumentProcessor: Handles document loading and text chunking
  • EmbeddingEngine: Creates and manages embeddings
  • RetrievalEngine: Handles FAISS index and similarity search
  • ResponseGenerator: Generates responses from retrieved contexts
  • RAGSystem: Main class that orchestrates all components
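
To see how these components fit together, here is a self-contained mini-pipeline built directly on sentence-transformers and FAISS; the actual classes wrap and extend these steps, so treat this as a sketch rather than the system's implementation.

# Sketch of the end-to-end flow the components implement (not the actual classes)
import faiss
from sentence_transformers import SentenceTransformer

chunks = ["First document chunk.", "Second document chunk."]     # DocumentProcessor output

model = SentenceTransformer('all-mpnet-base-v2')                  # EmbeddingEngine
vectors = model.encode(chunks, convert_to_numpy=True)

index = faiss.IndexFlatL2(vectors.shape[1])                       # RetrievalEngine
index.add(vectors)

query_vec = model.encode(["What does the first chunk say?"], convert_to_numpy=True)
distances, ids = index.search(query_vec, 1)
retrieved = [chunks[i] for i in ids[0]]                           # contexts for ResponseGenerator
print(retrieved)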

Performance Optimization

  1. Caching

    • Enable caching by providing a cache directory
    • Embeddings and indices are cached for reuse (a minimal caching sketch follows this list)
    • Reduces computation time for repeated operations
  2. Chunking Strategy

    • Adjust chunk size based on your documents
    • Use appropriate overlap for context continuity
    • Consider semantic chunking for better results
  3. Index Selection

    • Flat index: Best for small datasets (<100K documents)
    • IVF index: Good for medium datasets
    • HNSW index: Best for large datasets
  4. Parameter Tuning

    • Use optimize_retrieval() for index parameters
    • Adjust batch size for large document collections
    • Configure context length based on your needs
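
As a concrete example of the caching and batching points above, here is a minimal sketch of batched encoding with a simple on-disk embedding cache; the system's own cache format and behaviour may differ.

# Minimal sketch: batched encoding with a simple on-disk embedding cache
import hashlib
import os
import numpy as np
from sentence_transformers import SentenceTransformer

def embed_with_cache(texts, cache_dir='cache', batch_size=64):
    os.makedirs(cache_dir, exist_ok=True)
    # Key the cache on the exact text contents
    key = hashlib.sha256('\n'.join(texts).encode('utf-8')).hexdigest()
    cache_path = os.path.join(cache_dir, f'{key}.npy')
    if os.path.exists(cache_path):
        return np.load(cache_path)        # reuse previously computed embeddings
    model = SentenceTransformer('all-mpnet-base-v2')
    vectors = model.encode(texts, batch_size=batch_size, convert_to_numpy=True)
    np.save(cache_path, vectors)
    return vectors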

Jupyter Notebook

A comprehensive Jupyter notebook (rag_system.ipynb) is included that demonstrates:

  • System setup and configuration
  • Document processing
  • Querying and response generation
  • System optimization
  • Advanced usage examples

Web Interface

The system includes a Gradio-based web interface for easy interaction. To launch the interface:

python gradio_interface.py

The interface provides three main tabs:

  1. Upload Documents

    • Upload individual PDF/DOCX files
    • Process entire directories of documents
    • View processing status
  2. Configuration

    • Adjust chunk size and overlap
    • Select index type (Flat, IVF, HNSW)
    • Choose context strategy
    • Set maximum context length
  3. Query

    • Enter questions
    • Adjust number of contexts to retrieve
    • View answers with source documents
    • Toggle metadata display

The interface automatically manages document processing, embedding creation, and query handling through an intuitive UI.
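
For orientation, a minimal query tab wired to the system could look like the sketch below; this is not the contents of gradio_interface.py, only an illustration of how Gradio inputs map onto rag.query.

# Illustrative sketch of a single Gradio query tab (not the actual gradio_interface.py)
import gradio as gr
from rag_system import RAGSystem

rag = RAGSystem(cache_dir='cache')
rag.add_documents('path/to/documents')

def ask(question, k):
    return rag.query(question, k=int(k))

with gr.Blocks() as demo:
    question = gr.Textbox(label='Question')
    k = gr.Slider(1, 10, value=5, step=1, label='Contexts to retrieve')
    answer = gr.Textbox(label='Answer')
    gr.Button('Ask').click(ask, inputs=[question, k], outputs=answer)

demo.launch()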

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
