Repository: https://github.com/NithinRegidi/DocVox.git
DocVox is an accessibility-focused document analysis application that helps users understand their documents through AI-powered analysis and voice interactions. Upload documents, get instant summaries, and interact using voice commands in multiple Indian languages.
- Multi-format Support: PDF, Images (PNG, JPG, WEBP), Word Documents (.docx), Text files
- OCR Technology: Extract text from scanned documents and images
- AI Analysis: Automatic document type detection, summaries, key information extraction
- Smart Deadlines: AI-detected deadlines and important dates
- Supported Languages: English, Telugu, Hindi, Tamil, Kannada, Malayalam, Bengali
- Voice Navigation: Control the app hands-free with natural language
- Multi-dialect Support: Understands regional variations (Hyderabadi Telugu, Mumbai Hindi, etc.)
- Native Indian Voices: Sarvam AI integration for authentic Indian language TTS
- Multiple Providers: Fallback chain - Sarvam AI → Murf AI → Google Cloud → ElevenLabs → Browser TTS
- Language Auto-detection: Automatically speaks in the document's language
- 20+ Languages: Translate documents to Hindi, Telugu, Tamil, Bengali, and more
- Free Fallback: Works even without API quota using MyMemory translation
- Document Sharing: Generate shareable links for documents
- PDF Export: Download analysis reports as PDFs
- Reminders: Set deadline reminders with notifications
- Tags & Folders: Organize documents efficiently
- Smart Search: Search across all your documents
- Dark/Light Theme: Comfortable viewing in any environment
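
As a rough illustration of the language auto-detection mentioned above, the script of a document can be guessed from Unicode block ranges. This is a minimal sketch, not DocVox's actual detection logic: the `guessLanguage` name and the language codes are hypothetical, and mapping Devanagari to Hindi is a simplification, since other languages share that script.

```typescript
// Hypothetical language codes for the supported languages listed above.
type LangCode = "en" | "te" | "hi" | "ta" | "kn" | "ml" | "bn";

// Unicode block ranges (inclusive) per script. Devanagari is mapped to Hindi
// here, although languages such as Marathi use the same script.
const SCRIPT_RANGES: Array<{ lang: LangCode; from: number; to: number }> = [
  { lang: "hi", from: 0x0900, to: 0x097f }, // Devanagari
  { lang: "bn", from: 0x0980, to: 0x09ff }, // Bengali
  { lang: "ta", from: 0x0b80, to: 0x0bff }, // Tamil
  { lang: "te", from: 0x0c00, to: 0x0c7f }, // Telugu
  { lang: "kn", from: 0x0c80, to: 0x0cff }, // Kannada
  { lang: "ml", from: 0x0d00, to: 0x0d7f }, // Malayalam
];

// Count characters per script and return the most frequent one.
// Falls back to "en" when no Indic characters are found.
function guessLanguage(text: string): LangCode {
  const counts: Partial<Record<LangCode, number>> = {};
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const hit = SCRIPT_RANGES.find((r) => code >= r.from && code <= r.to);
    if (hit) counts[hit.lang] = (counts[hit.lang] ?? 0) + 1;
  }
  let best: LangCode = "en";
  let bestCount = 0;
  for (const r of SCRIPT_RANGES) {
    const count = counts[r.lang] ?? 0;
    if (count > bestCount) {
      best = r.lang;
      bestCount = count;
    }
  }
  return best;
}
```

A heuristic like this only identifies the script; telling apart languages that share a script, or handling regional dialects, would need a real language-identification step.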
| Category | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite |
| Styling | Tailwind CSS, shadcn/ui |
| Backend | Supabase (Auth, Database, Storage) |
| AI | Google Gemini API |
| TTS | Sarvam AI, Murf AI, Google Cloud TTS, ElevenLabs |
| Speech | Web Speech API |
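
The TTS fallback chain described in the features (Sarvam AI → Murf AI → Google Cloud → ElevenLabs → Browser TTS) can be pictured as trying providers in order until one succeeds. The sketch below assumes a hypothetical `TtsProvider` shape and `speakWithFallback` helper; the project's real modules under `src/lib/` may be organized differently.

```typescript
// Hypothetical provider shape; not taken from the DocVox codebase.
type TtsProvider = {
  name: string;
  // Resolves to some audio handle (for example a URL); rejects on failure.
  synthesize: (text: string, lang: string) => Promise<string>;
};

type TtsResult = { provider: string; audio: string };

// Try each provider in order and return the first successful result.
// If every provider fails, throw an error that lists each failure.
async function speakWithFallback(
  providers: TtsProvider[],
  text: string,
  lang: string,
): Promise<TtsResult> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const audio = await provider.synthesize(text, lang);
      return { provider: provider.name, audio };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All TTS providers failed: ${errors.join("; ")}`);
}
```

Placing the browser's built-in speech synthesis last in the `providers` array would mirror the documented order, since it needs no API key.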
- Node.js v18 or higher
- npm or bun
- Supabase account (free tier works)
IMPORTANT: Never commit `.env` files with real API keys to GitHub!
- Copy the example file:
  cp .env.example .env
- Add `.env` to `.gitignore` (already done ✅)
- Verify that your `.gitignore` includes `.env`. If `.env` is already tracked, stop tracking it:
  git rm --cached .env
  git commit -m "Remove .env from tracking"
- If you accidentally committed keys, rotate them with each provider (rewriting history does not un-leak a key), then:
  # Remove file from git history
  git filter-branch --tree-filter 'rm -f .env' HEAD
  git push origin --force-with-lease
# Clone the repository
git clone https://github.com/NithinRegidi/DocVox.git
# Navigate to project directory
cd DocVox
# Install dependencies
npm install
# Start development server
npm run dev
- Create a `.env` file (copy from `.env.example`):
  cp .env.example .env
- Fill in your API keys:
| Variable | Where to Get | Required? |
|---|---|---|
| `VITE_SUPABASE_URL` | Supabase Dashboard | ✅ Yes |
| `VITE_SUPABASE_PUBLISHABLE_KEY` | Supabase Settings | ✅ Yes |
| `VITE_GOOGLE_API` | Google AI Studio | ✅ Yes |
| `VITE_SARVAM_API_KEY` | Sarvam AI | ❌ Optional |
| `VITE_ELEVENLABS_API_KEY` | ElevenLabs | ❌ Optional |
| `VITE_MURF_API_KEY` | Murf AI | ❌ Optional |
| `VITE_OPENAI_API_KEY` | OpenAI | ❌ Optional |
| `VITE_GITHUB_TOKEN` | GitHub Settings | ❌ Optional |
- Do NOT commit the `.env` file:
  - `.env` is in `.gitignore` ✅
  - Only commit `.env.example`
  - Each developer creates their own `.env` locally
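
To catch a missing key early, the required variables from the table above could be checked at startup. This is a minimal sketch under stated assumptions: `missingRequiredEnv` is a hypothetical helper, not part of DocVox, and only the three variables marked as required are checked.

```typescript
// Names of the variables marked "Required" in the table above.
const REQUIRED_VARS = [
  "VITE_SUPABASE_URL",
  "VITE_SUPABASE_PUBLISHABLE_KEY",
  "VITE_GOOGLE_API",
] as const;

type EnvLike = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables
// that are missing or contain only whitespace.
function missingRequiredEnv(env: EnvLike): string[] {
  return REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Vite app the argument would typically be `import.meta.env`, which is where Vite exposes variables with the `VITE_` prefix; the result can then be logged or shown as a setup warning.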
| Command | Action |
|---|---|
| "Read summary" / "సారాంశం చదవు" | Read document summary |
| "What are the deadlines" / "గడువులు ఏమిటి" | List important dates |
| "Key information" / "ముఖ్యమైన సమాచారం" | Read key points |
| "Translate to Hindi" / "హిందీలో అనువదించు" | Translate document |
| "Stop" / "ఆపు" | Stop speaking |
| "Help" / "సహాయం" | Show available commands |
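
The command table above can be read as a mapping from spoken phrases to actions. The sketch below shows one naive way to match a transcript against those phrases; the `VoiceAction` identifiers and the `matchCommand` function are hypothetical, and substring matching is an assumption rather than the project's actual approach.

```typescript
// Hypothetical action identifiers, one per row of the command table.
type VoiceAction =
  | "read_summary"
  | "list_deadlines"
  | "read_key_info"
  | "translate"
  | "stop"
  | "help";

// Phrases are copied from the table (English and Telugu variants).
const COMMANDS: Array<{ phrases: string[]; action: VoiceAction }> = [
  { phrases: ["read summary", "సారాంశం చదవు"], action: "read_summary" },
  { phrases: ["what are the deadlines", "గడువులు ఏమిటి"], action: "list_deadlines" },
  { phrases: ["key information", "ముఖ్యమైన సమాచారం"], action: "read_key_info" },
  { phrases: ["translate to hindi", "హిందీలో అనువదించు"], action: "translate" },
  { phrases: ["stop", "ఆపు"], action: "stop" },
  { phrases: ["help", "సహాయం"], action: "help" },
];

// Return the first action whose phrase appears in the transcript,
// or null when nothing matches. Matching is case-insensitive for English.
function matchCommand(transcript: string): VoiceAction | null {
  const heard = transcript.trim().toLowerCase();
  for (const { phrases, action } of COMMANDS) {
    if (phrases.some((phrase) => heard.includes(phrase))) return action;
  }
  return null;
}
```

The transcript would typically come from a Web Speech API recognition result. Plain substring matching is easy to follow but will misfire on words that merely contain a phrase, so a real implementation would likely need stricter or fuzzier matching, especially for dialect variations.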
DocVox/
├── src/
│ ├── components/ # React components
│ ├── hooks/ # Custom hooks (voice commands, TTS)
│ ├── lib/ # Utilities (TTS, translation, AI)
│ ├── pages/ # Page components
│ └── integrations/ # Supabase integration
├── supabase/
│ ├── functions/ # Edge functions
│ └── migrations/ # Database migrations
└── public/ # Static assets
- Setup Guide - Complete installation instructions
- Testing Guide - Feature testing checklist
- Dialect Support Plan - Regional dialect implementation
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.
Made with ❤️ for accessibility