A full-stack intelligent document management system that automatically:
- Extracts text from uploaded documents (`.pdf`/`.txt`)
- Classifies documents into predefined enterprise categories
- Moves files to category-specific folders
- Displays classification results and extracted text in a web dashboard
🚀 Built using Django + Gemini API + PyMuPDF. Designed for real-world AI/ML applications in Enterprise Information Management.
✅ Upload .pdf or .txt documents
✅ Automatically extract text using PyMuPDF
✅ Classify documents using Gemini (Google AI) with no model training required
✅ Store the predicted label for each document (e.g., Invoice, HR, Legal, Resume)
✅ Auto-organize files into folders based on their classification
✅ Minimal, functional web UI using Django templates
✅ Scalable structure: each phase builds toward a production-grade AI pipeline
Currently supported categories:
- Invoice
- Legal
- HR
- Technical
- Resume
- Report
- Marketing
- Financial Statement
- Email
- Policy Document
- Meeting Minutes
- Presentation
- Contract
- Product Manual
- Other (fallback)
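
Because the classifier is prompted rather than trained, its reply is free text and may not match a category name exactly. The sketch below shows one way to build a zero-shot prompt from the category list and map the reply back onto a supported category, using `Other` as the fallback. The names `normalize_label` and `build_prompt`, the prompt wording, and the 4000-character truncation are illustrative assumptions, not taken from the project code.

```python
# Illustrative sketch; helper names and prompt wording are assumptions.
CATEGORIES = [
    "Invoice", "Legal", "HR", "Technical", "Resume", "Report", "Marketing",
    "Financial Statement", "Email", "Policy Document", "Meeting Minutes",
    "Presentation", "Contract", "Product Manual", "Other",
]
FALLBACK = "Other"

# Case-insensitive lookup from lowercase name to canonical name.
_LOOKUP = {name.lower(): name for name in CATEGORIES}


def normalize_label(raw: str) -> str:
    """Map a raw model reply onto a supported category, else the fallback."""
    # Drop surrounding whitespace, quotes, periods, and Markdown emphasis.
    cleaned = raw.strip().strip("\"'.* ").lower()
    return _LOOKUP.get(cleaned, FALLBACK)


def build_prompt(text: str, max_chars: int = 4000) -> str:
    """Build a zero-shot prompt asking for exactly one category name."""
    options = ", ".join(CATEGORIES)
    return (
        "Classify the document into exactly one of these categories: "
        f"{options}. Reply with the category name only.\n\n"
        # Truncate long documents; the limit is an arbitrary assumption.
        f"Document:\n{text[:max_chars]}"
    )
```

Validating the label before it is stored also keeps unexpected model output from becoming a folder name.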
| Layer | Tech Used |
|---|---|
| Backend | Django 5.2, Python 3.10+ |
| AI/NLP | Google Gemini Pro (via google-genai SDK), PyMuPDF |
| Storage | Local filesystem (Django MEDIA_ROOT) |
| Frontend | Django Templates (HTML/CSS), Bootstrap (optional) |
| Deployment | Dev: runserver; Prod: WSGI/ASGI |
- User uploads a `.pdf` or `.txt` file through the web form
- System extracts raw text using PyMuPDF
- The extracted text is passed to Gemini Pro (via API)
- Gemini returns a category label based on the content
- The file is moved to a subdirectory for that category (e.g., `documents/HR/`)
- All document data is saved and displayed in the dashboard
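
The workflow above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the function names (`extract_text`, `move_to_category`, `process_upload`) are hypothetical, the Gemini call is represented by a `classify` callable passed in by the caller, and the label is assumed to be already validated against the supported categories. PDF extraction uses PyMuPDF's `fitz.open` and `Page.get_text`.

```python
import shutil
from pathlib import Path
from typing import Callable


def extract_text(path: Path) -> str:
    """Return raw text from a .txt or .pdf file."""
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix == ".pdf":
        import fitz  # PyMuPDF; imported lazily so .txt files work without it

        with fitz.open(str(path)) as doc:
            return "\n".join(page.get_text() for page in doc)
    raise ValueError(f"Unsupported file type: {path.suffix}")


def move_to_category(path: Path, category: str, root: Path) -> Path:
    """Move the file into root/<category>/ and return its new location."""
    target_dir = root / category
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / path.name
    shutil.move(str(path), str(target))
    return target


def process_upload(path: Path, classify: Callable[[str], str], root: Path) -> dict:
    """Extract text, classify it, and file the document under its category."""
    text = extract_text(path)
    category = classify(text)  # in this project, a wrapper around the Gemini API
    new_path = move_to_category(path, category, root)
    return {"category": category, "path": new_path, "text": text}
```

Passing the classifier in as a callable keeps the file handling testable without an API key; for example, `process_upload(Path("leave_policy.txt"), lambda text: "HR", Path("documents"))` would file the document under `documents/HR/`.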
git clone https://github.com/yourusername/smart-doc-ai.git
cd smart-doc-ai