A complete implementation of a Retrieval-Augmented Generation (RAG) pipeline built on the Qdrant vector database, featuring document ingestion and semantic search.
- Document Processing: Supports multiple file formats (PDF, DOCX, CSV, JSON, etc.)
- Vector Embeddings: Uses sentence-transformer embedding models (the sample configuration uses all-MiniLM-L6-v2)
- Hybrid Search: Combines semantic and keyword search
- Analytics Dashboard: Track search performance and patterns
- Security: Input validation and sanitization
- Health Monitoring: System resource tracking
- Data-Injection.py: Ingests and processes documents into Qdrant collections
- Data-Retrieval.py: Interactive search interface with analytics
- config.yaml: Centralized configuration for all components
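
The hybrid search feature listed above blends a semantic score with a keyword score. The sketch below shows one common way to do that, a weighted sum. It is not taken from Data-Retrieval.py: the function names, the `alpha` weight, and the term-overlap keyword score are all assumptions made for illustration.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_score(query: str, document: str) -> float:
    """Fraction of distinct query terms that also appear in the document."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    doc_terms = set(tokenize(document))
    return len(query_terms & doc_terms) / len(query_terms)


def hybrid_score(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    """Weighted blend: alpha weights the semantic score, (1 - alpha) the keyword score.

    The default alpha of 0.7 is an illustrative assumption, not a project setting.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic + (1.0 - alpha) * keyword
```

With `alpha` close to 1 the ranking follows embedding similarity; lowering it gives exact term matches more influence, which can help for queries containing identifiers or rare names.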
# Clone the repository
git clone https://github.com/yourusername/qdrant-rag-pipeline.git
cd qdrant-rag-pipeline
# Install dependencies
pip install -r requirements.txt
# Download spaCy model
python -m spacy download en_core_web_sm
Edit config.yaml to customize:
qdrant:
  host: "localhost"  # Qdrant server address
  port: 6333  # Qdrant port
model:
  name: "all-MiniLM-L6-v2"  # Embedding model
  device: "cpu"  # cpu/cuda/mps
processing:
  chunk_size: 500  # Text chunk size
  supported_extensions:  # File types to process
    - ".pdf"
    - ".docx"
    - ".txt"
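
To make the `processing` settings concrete, here is a minimal sketch of how `supported_extensions` and `chunk_size` might be applied during ingestion. It assumes `chunk_size` is measured in characters and that chunks break on whitespace; the configuration does not state either, and the helper names `is_supported` and `chunk_text` are hypothetical rather than taken from Data-Injection.py.

```python
from pathlib import Path

# Values mirror the sample config.yaml; the unit of chunk_size (characters) is an assumption.
CHUNK_SIZE = 500
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def is_supported(path: Path, extensions: set[str] = SUPPORTED_EXTENSIONS) -> bool:
    """Return True if the file extension (case-insensitive) is in the allowed set."""
    return path.suffix.lower() in extensions


def chunk_text(text: str, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split text into whitespace-delimited chunks of at most chunk_size characters.

    A single word longer than chunk_size is kept whole as its own oversized chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        # Account for the joining space when the chunk already has words.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > chunk_size:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Breaking on whitespace keeps words intact, at the cost of chunks that are slightly shorter than the limit. A sentence-aware splitter would be a reasonable alternative if the spaCy model from the installation step is available.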
Input Directory
input_dir: "file\\path\\docs" # Custom path to your documents folder
python Data-Injection.py
python Data-Retrieval.py
graph TD
A[Documents] --> B[Data Injection]
B --> C[Qdrant Vector DB]
C --> D[Data Retrieval]
D --> E[User Interface]
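
The flow in the diagram can be traced with a toy, dependency-free sketch. Everything here is a stand-in chosen for illustration: `InMemoryStore` takes the place of a Qdrant collection, `embed` uses bag-of-words counts instead of a sentence-transformer model, and `inject` and `retrieve` are hypothetical names, not functions from the project scripts.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Stand-in for a Qdrant collection: holds (vector, payload) pairs in a list."""

    def __init__(self) -> None:
        self.points: list[tuple[Counter, dict]] = []

    def upsert(self, vector: Counter, payload: dict) -> None:
        self.points.append((vector, payload))

    def search(self, query_vector: Counter, limit: int = 3) -> list[tuple[float, dict]]:
        scored = [(cosine(query_vector, vec), payload) for vec, payload in self.points]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


def inject(documents: dict[str, str], store: InMemoryStore) -> None:
    """Documents -> Data Injection -> vector store."""
    for name, text in documents.items():
        store.upsert(embed(text), {"source": name, "text": text})


def retrieve(query: str, store: InMemoryStore, limit: int = 3) -> list[dict]:
    """Vector store -> Data Retrieval: return payloads with a non-zero similarity."""
    return [payload for score, payload in store.search(embed(query), limit) if score > 0]


if __name__ == "__main__":
    store = InMemoryStore()
    inject({"a.txt": "Qdrant stores vectors", "b.txt": "Cats sleep all day"}, store)
    for hit in retrieve("where are vectors stored", store):
        print(hit["source"], "-", hit["text"])
```

In the real pipeline the store would be a Qdrant collection reached at the host and port from config.yaml, and the vectors would come from the configured embedding model.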