Privacy-first document processing. Entirely in your browser. Zero uploads.
DocuHub is a comprehensive, browser-based document toolkit that performs all processing locally using WebAssembly. Built with TypeScript in a single, unified repository, it handles PDFs, document conversions, data transformations, and OCR, entirely offline.
Most document tools require uploading sensitive files to unknown servers. DocuHub eliminates this privacy risk by processing everything in your browser. No data ever leaves your device.
- 100% Privacy: All processing happens locally in your browser
- Truly Offline: Works without internet after first load (PWA)
- Fast: WebAssembly engines deliver near-native performance
- Free & Open Source: No subscriptions, no hidden costs
- Developer-Friendly: Clean TypeScript, modular architecture
- Single Repository: Easy to contribute, easy to maintain
DocuHub follows a maintainer-approved contribution process:
- Browse existing issues and pick one.
- If you have a new idea or bug report, open an issue first.
- Wait for maintainer approval before starting work.
- Once approved and labeled, you may begin development.
- Submit a pull request with a clear description.
Unapproved pull requests may be closed to maintain code quality.
Complete PDF Toolkit
- Merge & Split: Combine multiple PDFs or extract specific pages
- Compress: Intelligent compression to reduce file size
- Reorder & Rotate: Drag-and-drop page organization
- Extract Pages: Pull out individual pages or ranges
- Annotate: Add highlights, drawings, text comments, and stamps
- Watermark: Apply text or image watermarks with custom positioning
- Headers & Footers: Add page numbers and custom text
- Security: Password protect or unlock encrypted PDFs
- Metadata Editor: Edit title, author, subject, keywords
- PDF Comparison: Visual diff between two PDF versions
- Form Operations: Fill PDF forms and flatten fields
- PDF/A Conversion: Convert to archival standard
Office Documents
- Word (.docx) → PDF
- Excel (.xlsx) → PDF
- PowerPoint (.pptx) → PDF
- PDF → Word (experimental)
- PDF → Excel (table extraction)
- Text/Markdown → PDF
Images
- Images → PDF (single or batch)
- PDF → Images (PNG, JPG, WebP)
- Format conversion (PNG, JPG, WebP, BMP)
- Batch resize and compression
- Smart quality optimization
Data Formats
- JSON → Excel
- JSON → CSV
- CSV → Excel
- XML → JSON
- YAML → JSON
- JSON → SQL INSERT statements
- Excel → JSON schema generation
- JSON Tree Viewer/Editor: Interactive JSON exploration
- JSON Flattener: Convert nested JSON to flat structures
- Schema Generation: Auto-generate JSON schemas from data
- Schema Validation: Validate JSON against schemas
- Data Cleanup: Remove duplicates, infer types, normalize
- Merge/Split: Combine or divide large datasets
- Streaming Support: Handle files too large for memory
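
To make the JSON Flattener item above concrete, here is a minimal sketch of one way nested JSON can be turned into a flat, dot-keyed record. The `flattenJson` function and the `JsonValue` type are hypothetical names used for illustration only; they are not taken from the DocuHub codebase, and the real tool may use a different key format.

```typescript
// Hypothetical helper for illustration; not DocuHub's actual implementation.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Recursively walks the value and writes each leaf under a dot-separated path.
function flattenJson(
  value: JsonValue,
  prefix = "",
  out: Record<string, JsonValue> = {}
): Record<string, JsonValue> {
  if (value !== null && typeof value === "object") {
    // Arrays are treated like objects whose keys are the element indices.
    const entries: [string, JsonValue][] = Array.isArray(value)
      ? value.map((item, i): [string, JsonValue] => [String(i), item])
      : Object.entries(value);
    if (entries.length === 0 && prefix !== "") {
      out[prefix] = value; // keep empty nested containers instead of dropping them
      return out;
    }
    for (const [key, child] of entries) {
      flattenJson(child, prefix === "" ? key : `${prefix}.${key}`, out);
    }
  } else {
    out[prefix] = value; // primitive leaf
  }
  return out;
}
```

For example, `flattenJson({ user: { name: "Ada", tags: ["a", "b"] } })` produces the keys `user.name`, `user.tags.0`, and `user.tags.1`. A flat record like this maps directly onto one CSV or spreadsheet row.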
- Image → Text: Extract text from images (Tesseract.js)
- Scanned PDF → Searchable PDF: Add text layer to scans
- Table Extraction: Pull structured data from documents
- Language Detection: Identify document language
- Keyword Extraction: Automatic keyword tagging
- Batch Processing: Process multiple files at once
- Visual Pipeline Builder: Drag-and-drop workflow creation
- Preset Workflows: Pre-configured chains (e.g., OCR → Compress → Watermark)
- Local History: Undo operations with IndexedDB persistence
- Template System: Save and reuse processing configurations
- All processing inside Web Workers
- Memory cleared after processing
- IndexedDB storage fully user-controlled
- Zero Server Uploads: Nothing leaves your browser
- No Tracking: No analytics, no cookies, no surveillance
- Local Processing: All computation happens on your device
- Secure Memory Cleanup: Sensitive data cleared after processing
- Manual Cache Control: You control what's stored locally
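
The "memory cleared after processing" point above can be approximated in JavaScript by overwriting a buffer once work on it has finished. The sketch below is an assumption about how such cleanup might look, not DocuHub's actual code; `withScrubbedBuffer` is a hypothetical name. It is also best effort only: the runtime may hold other copies of the data, and garbage collection cannot be forced.

```typescript
// Hypothetical helper for illustration; not DocuHub's actual implementation.
// Runs synchronous `work` on the buffer, then zero-fills the buffer even if
// `work` throws. For async work the fill would have to happen after the
// returned promise settles instead.
function withScrubbedBuffer<T>(
  bytes: Uint8Array,
  work: (bytes: Uint8Array) => T
): T {
  try {
    return work(bytes);
  } finally {
    bytes.fill(0); // overwrite the contents the caller handed in
  }
}
```

A caller would wrap the sensitive step, for example `withScrubbedBuffer(fileBytes, (b) => computeChecksum(b))`, where `fileBytes` and `computeChecksum` are placeholders for the caller's own data and processing function.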
Get DocuHub up and running locally in just a few minutes.
- pnpm not found? Run npm install -g pnpm
- Port 3000 in use? Run lsof -i :3000, then kill the process
- WASM errors? Use the latest Chrome, Edge, or Firefox
- Slow load? Clear the browser cache, or check your internet connection for the initial WASM download
Make sure you have the following installed on your system:
- Node.js v18+
- Package manager: npm, yarn, or pnpm (recommended)
- A modern web browser with WebAssembly (WASM) support

Tip: Check your Node version using node -v
Follow these steps to run DocuHub locally:
# Clone the repository
git clone https://github.com/R3ACTR/DocuHub.git
# Navigate to the project directory
cd DocuHub
# Install dependencies
pnpm install
# Start the development server
pnpm run dev
Once the server starts, open your browser and visit:
http://localhost:3000
You should now see DocuHub running locally.
If you prefer npm or yarn, you can use:
# npm
npm install
npm run dev
# yarn
yarn install
yarn dev
# Create optimized production build
pnpm run build
# Preview production build
pnpm run preview
- app/: App Router pages
- components/: Reusable UI components
- lib/: Core processing logic
- public/pdfjs/: PDF.js workers
- tools.config.ts: Tool metadata registry
DocuHub follows a modular client-side architecture:
- Next.js App Router
- TailwindCSS
- TypeScript (strict mode)
- WebAssembly engines
- Web Workers for heavy computation
- IndexedDB for persistence
- PDF.js for PDF preview
- Canvas-based rendering
- Lazy-loaded workers
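
As an illustration of the lazy-loaded workers item above, the following sketch shows a small memoizing wrapper that creates a resource only on first use. The `lazy` helper and the worker path in the comment are hypothetical; how DocuHub actually defers worker creation is not specified here.

```typescript
// Hypothetical helper for illustration; not DocuHub's actual implementation.
// Wraps a factory so the underlying resource is created once, on first call.
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

// In the browser this might wrap a worker constructor (the path is an assumption):
// const getPdfWorker = lazy(
//   () => new Worker(new URL("./pdf.worker.ts", import.meta.url))
// );
```

With this pattern, a heavy worker and the WASM module it loads are not started until a user actually opens a tool that needs them.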
We welcome contributions of all kinds! Whether you're fixing bugs, adding features, improving docs, or suggesting ideas, you're valuable to this project.
Contribution Workflow
DocuHub follows a maintainer-approved contribution process:
- Browse existing issues and pick one
- If you have a new idea or bug report, open an issue first
- Wait for maintainer approval before starting work
- Once approved and labeled, you may begin development
- Submit a pull request with a clear description
Look for issues tagged with:
- good first issue: Easy tasks for newcomers
- help wanted: We need your expertise!
- documentation: Improve our docs
- Code Style: We use ESLint + Prettier (auto-format on save)
- Commits: Use Conventional Commits
- Tests: Add tests for new features
- Docs: Update docs for API changes
See CONTRIBUTING.md for detailed guidelines.
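
For contributors new to Conventional Commits, the sketch below checks whether a commit header has the `type(scope)!: description` shape described by that specification. The list of allowed types is an assumption based on common convention, not a list confirmed by this project, and `isConventionalHeader` is a hypothetical helper name.

```typescript
// Illustrative check only. The type list is an assumed, commonly used set;
// the project's own rules may allow different types.
const COMMIT_TYPES = [
  "feat", "fix", "docs", "style", "refactor",
  "perf", "test", "build", "ci", "chore", "revert",
];

// type, optional (scope), optional "!", then ": " and a non-empty description
const COMMIT_HEADER = new RegExp(
  `^(${COMMIT_TYPES.join("|")})(\\([^()\\s]+\\))?!?: \\S.*$`
);

function isConventionalHeader(header: string): boolean {
  return COMMIT_HEADER.test(header);
}
```

Headers such as `feat(pdf): add page rotation` and `fix: handle empty CSV input` pass this check, while `Added stuff` does not.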
- Project setup and architecture
- Basic PDF merge/split
- Simple file conversions
- PWA scaffolding
- Complete PDF toolkit (annotate, watermark, forms)
- OCR integration (Tesseract.js)
- Data format conversions (JSON/CSV/Excel)
- Batch processing
- Visual pipeline builder
- PDF comparison/diff
- Advanced OCR (table extraction)
- Template system
- Performance optimizations
- Mobile-first UI improvements
- i18n (internationalization)
- Plugin system (experimental)
- Register tool in tools.config.ts
- Create tool page in app/tool/[id]/
- Implement processing logic in lib/
- Add UI using existing components
- Test locally
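
The first step above, registering a tool, might look roughly like the following. The `ToolConfig` fields, the example entries, and the helper functions are all hypothetical, since the actual shape of tools.config.ts is not shown in this README. Check the real file before copying anything.

```typescript
// Hypothetical registry shape for illustration; the real tools.config.ts may differ.
interface ToolConfig {
  id: string; // assumed to match the [id] segment in app/tool/[id]/
  name: string;
  category: "pdf" | "convert" | "data" | "ocr";
  accepts: string[]; // accepted file extensions, lower case
}

const tools: ToolConfig[] = [
  { id: "pdf-merge", name: "Merge PDFs", category: "pdf", accepts: [".pdf"] },
  { id: "json-to-csv", name: "JSON to CSV", category: "data", accepts: [".json"] },
];

// Look up a tool by the id taken from the route.
function getTool(id: string): ToolConfig | undefined {
  return tools.find((tool) => tool.id === id);
}

// List the tools that can open a given file, based on its extension.
function toolsFor(fileName: string): ToolConfig[] {
  const lower = fileName.toLowerCase();
  return tools.filter((tool) => tool.accepts.some((ext) => lower.endsWith(ext)));
}
```

Keeping tool metadata in one typed array means a new tool's page, navigation entry, and file-type matching can all be driven from a single registration.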
Here are some screenshots showing DocuHub in action. All processing happens locally in your browser.
- Select tool
- Upload file
- Processing (WebAssembly worker)
- Download result
- Drag and drop files
- Reorder pages
- Preview thumbnails
- Export merged document
Privacy-first document processing - all tools in one place
Choose from PDF operations, conversions, OCR, and data tools
Simple drag-and-drop interface - all processing happens locally
Visual feedback during processing operations
View and download your processed documents
Upload multiple PDFs to merge into a single document
Review selected files before merging
Drag and drop to reorder PDFs - complete control over final document
Real-time processing with WebAssembly - fast and efficient
Download your merged PDF - all processing done in your browser
- Service Worker caches the entire app on first load
- WebAssembly modules loaded into memory
- File operations use FileReader API (no uploads)
- Processing happens in Web Workers (non-blocking)
- Results saved to IndexedDB or downloaded directly
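
The worker step in the flow above can be sketched as a pure job handler plus a thin message wrapper. The message types and `handleJob` are hypothetical, and concatenating byte chunks stands in for real processing such as a PDF merge. DocuHub's actual worker protocol is not documented here.

```typescript
// Hypothetical message types for illustration; the real protocol may differ.
interface JobRequest {
  id: number;
  chunks: Uint8Array[];
}

interface JobResult {
  id: number;
  output: Uint8Array;
}

// Stand-in for real processing: concatenates the input chunks in order.
function handleJob(job: JobRequest): JobResult {
  const total = job.chunks.reduce((n, chunk) => n + chunk.byteLength, 0);
  const output = new Uint8Array(total);
  let offset = 0;
  for (const chunk of job.chunks) {
    output.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return { id: job.id, output };
}

// Inside a worker file, the handler would be wired up roughly like this:
// self.onmessage = (event: MessageEvent<JobRequest>) => {
//   const result = handleJob(event.data);
//   // Transfer the buffer instead of copying it back to the main thread.
//   self.postMessage(result, [result.output.buffer]);
// };
```

Keeping the handler free of worker globals makes it easy to unit test, while the commented wrapper shows where the transferable buffer hand-off would occur.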
| Operation | File Size | Time |
|---|---|---|
| PDF Merge (10 files) | 50 MB | ~2.3s |
| Image → PDF (batch 20) | 30 MB | ~1.8s |
| OCR (300 DPI scan) | 5 MB | ~4.5s |
| JSON → Excel (100k rows) | 10 MB | ~0.9s |
Tested on: M1 Mac, Chrome 120
- No Data Collection: We don't see or store your files
- No Third-Party Services: No external APIs called
- No Tracking: No analytics, no cookies
- Open Source: Audit our code anytime
- Local Storage Only: You control what's cached
- Files processed in isolated Web Workers
- Memory cleared after operations
- No persistent storage without user consent
- Service Worker can be manually cleared
- User Guide: How to use DocuHub
- API Reference: Core API documentation
- Architecture Guide: Technical deep dive
- Contributing Guide: How to contribute
- FAQ: Common questions
DocuHub is built on the shoulders of giants:
- PDF.js: PDF rendering
- pdf-lib: PDF manipulation
- Tesseract.js: OCR engine
- PapaParse: CSV parsing
- ExcelJS: Excel operations
- Sharp: Image processing
This project is licensed under the MIT License. See LICENSE for details.
If DocuHub helps you, consider:
- Star this repo: Show your support
- Report bugs: Help us improve
- Suggest features: Share your ideas
- Contribute: Submit a PR
- Spread the word: Tell others about DocuHub
Every bit helps. Thank you for using DocuHub!