🔒 Privacy-first document processing. Entirely in your browser. Zero uploads.

Features • Quick Start • Architecture • Contributing • Roadmap • Demo

🎯 What is DocuHub?

DocuHub is a browser-based document toolkit that performs all processing locally using WebAssembly. Built with TypeScript in a single repository, it handles PDFs, document conversions, data transformations, and OCR, and it keeps working offline once the app has loaded.

🛑 The Problem We Solve

Most document tools require uploading sensitive files to unknown servers. DocuHub eliminates this privacy risk by processing everything in your browser. No data ever leaves your device.

Why DocuHub?

✅ 100% Privacy – All processing happens locally in your browser
✅ Truly Offline – Works without internet after first load (PWA)
✅ Fast – WebAssembly engines deliver near-native performance
✅ Free & Open Source – No subscriptions, no hidden costs
✅ Developer-Friendly – Clean TypeScript, modular architecture
✅ Single Repository – Easy to contribute, easy to maintain

🧑‍💻 Contribution Workflow

DocuHub follows a maintainer-approved contribution process:
Browse existing issues and pick one.
If you have a new idea or bug report, open an issue first.
Wait for maintainer approval before starting work.
Once approved and labeled, you may begin development.
Submit a pull request with a clear description.

Unapproved pull requests may be closed to maintain code quality.

✨ Features

📄 PDF Operations

Complete PDF Toolkit

Merge & Split – Combine multiple PDFs or extract specific pages
Compress – Intelligent compression to reduce file size
Reorder & Rotate – Drag-and-drop page organization
Extract Pages – Pull out individual pages or ranges
Annotate – Add highlights, drawings, text comments, and stamps
Watermark – Apply text or image watermarks with custom positioning
Headers & Footers – Add page numbers and custom text
Security – Password protect or unlock encrypted PDFs
Metadata Editor – Edit title, author, subject, keywords
PDF Comparison – Visual diff between two PDF versions
Form Operations – Fill PDF forms and flatten fields
PDF/A Conversion – Convert to archival standard
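
Operations such as Extract Pages and Split need to turn a user-supplied range like "1-3,5" into concrete page indices. The sketch below shows one way to do that. `parsePageRanges` is a hypothetical helper written for illustration, not a function taken from the DocuHub codebase, and the comma/hyphen syntax is an assumption.

```typescript
// Hypothetical helper (not from the DocuHub source): expands a range string
// such as "1-3,5" into sorted, de-duplicated, zero-based page indices.
function parsePageRanges(spec: string, pageCount: number): number[] {
  const indices = new Set<number>();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid page range: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    // Pages are 1-based in the UI, so 0 and anything past the last page are rejected.
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${token}"`);
    }
    for (let page = start; page <= end; page++) indices.add(page - 1);
  }
  return Array.from(indices).sort((a, b) => a - b);
}

// Usage: pages 1 to 3 and page 5 of a 10-page document.
const selectedPages = parsePageRanges("1-3,5", 10); // [0, 1, 2, 4]
```

Zero-based indices are returned because most PDF libraries address pages that way, while users typically think in 1-based page numbers.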

🔁 Document Conversions

Office Documents

Word (.docx) → PDF
Excel (.xlsx) → PDF
PowerPoint (.pptx) → PDF
PDF → Word (experimental)
PDF → Excel (table extraction)
Text/Markdown → PDF

Images

Images → PDF (single or batch)
PDF → Images (PNG, JPG, WebP)
Format conversion (PNG ↔ JPG ↔ WebP ↔ BMP)
Batch resize and compression
Smart quality optimization

Data Formats

JSON ↔ Excel
JSON ↔ CSV
CSV ↔ Excel
XML → JSON
YAML → JSON
JSON → SQL INSERT statements
Excel → JSON schema generation
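
As an illustration of the JSON → SQL INSERT conversion, here is a minimal sketch. The function names, the double-quoted identifiers, and the `TRUE`/`FALSE` literals are assumptions made for this example. DocuHub's actual converter and the SQL dialect it targets are not described here.

```typescript
type SqlValue = string | number | boolean | null;

// Render one value as a SQL literal. Single quotes are doubled, as in standard SQL.
function toSqlLiteral(value: SqlValue): string {
  if (value === null) return "NULL";
  if (typeof value === "number") {
    if (!Number.isFinite(value)) throw new Error("Non-finite numbers have no SQL literal");
    return String(value);
  }
  if (typeof value === "boolean") return value ? "TRUE" : "FALSE";
  return `'${value.replace(/'/g, "''")}'`;
}

// Quote an identifier (table or column name), doubling embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Hypothetical converter: one INSERT per row, columns taken from the first row.
// Keys missing from later rows are written as NULL.
function jsonToInserts(table: string, rows: Record<string, SqlValue>[]): string[] {
  const first = rows[0];
  if (first === undefined) return [];
  const columns = Object.keys(first);
  const columnList = columns.map(quoteIdent).join(", ");
  return rows.map((row) => {
    const values = columns.map((c) => toSqlLiteral(row[c] ?? null)).join(", ");
    return `INSERT INTO ${quoteIdent(table)} (${columnList}) VALUES (${values});`;
  });
}

// Usage
const statements = jsonToInserts("users", [{ id: 1, name: "O'Brien" }]);
// statements[0] is: INSERT INTO "users" ("id", "name") VALUES (1, 'O''Brien');
```

Generated statements like these suit export files. Code that talks to a live database should prefer parameterized queries over string building.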

📊 Structured Data Tools

JSON Tree Viewer/Editor – Interactive JSON exploration
JSON Flattener – Convert nested JSON to flat structures
Schema Generation – Auto-generate JSON schemas from data
Schema Validation – Validate JSON against schemas
Data Cleanup – Remove duplicates, infer types, normalize
Merge/Split – Combine or divide large datasets
Streaming Support – Handle files too large for memory
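
The JSON Flattener idea can be sketched as a short recursive function. The dot-separated key format and the name `flattenJson` are assumptions for illustration. The README does not say which key format DocuHub uses.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical flattener: nested objects and arrays become dot-separated keys,
// e.g. { a: { b: [10] } } -> { "a.b.0": 10 }.
function flattenJson(
  value: Json,
  prefix = "",
  out: Record<string, Json> = {},
): Record<string, Json> {
  if (value !== null && typeof value === "object") {
    const entries: [string, Json][] = Array.isArray(value)
      ? value.map((item, i) => [String(i), item] as [string, Json])
      : Object.entries(value);
    // Keep empty objects/arrays as leaf values so they are not silently dropped.
    if (entries.length === 0 && prefix !== "") out[prefix] = value;
    for (const [key, child] of entries) {
      flattenJson(child, prefix === "" ? key : `${prefix}.${key}`, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// Usage
const flat = flattenJson({ a: { b: 1, c: [true, null] }, d: "x" });
// { "a.b": 1, "a.c.0": true, "a.c.1": null, "d": "x" }
```

A flat shape like this maps directly onto spreadsheet columns, which is what makes JSON ↔ CSV/Excel conversions straightforward.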

🧠 Offline OCR & Intelligence

Image → Text – Extract text from images (Tesseract.js)
Scanned PDF → Searchable PDF – Add text layer to scans
Table Extraction – Pull structured data from documents
Language Detection – Identify document language
Keyword Extraction – Automatic keyword tagging

⚙️ Automation & Workflows

Batch Processing – Process multiple files at once
Visual Pipeline Builder – Drag-and-drop workflow creation
Preset Workflows – Pre-configured chains (e.g., OCR → Compress → Watermark)
Local History – Undo operations with IndexedDB persistence
Template System – Save and reuse processing configurations
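
A preset workflow such as OCR → Compress → Watermark is essentially a list of steps applied in order. The sketch below shows that shape, assuming each step maps document bytes to new bytes. `Step`, `runPipeline`, and `appendByte` are hypothetical names, and the stand-in steps only append a marker byte instead of doing real processing.

```typescript
// Assumed step shape: a name for progress reporting plus an async transform.
type Step = { name: string; run: (input: Uint8Array) => Promise<Uint8Array> };

// Run steps sequentially, feeding each step's output into the next one.
async function runPipeline(
  steps: Step[],
  input: Uint8Array,
  onProgress?: (done: number, total: number, name: string) => void,
): Promise<Uint8Array> {
  let current = input;
  let done = 0;
  for (const step of steps) {
    current = await step.run(current);
    done += 1;
    onProgress?.(done, steps.length, step.name);
  }
  return current;
}

// Stand-in step for demonstration only: appends one marker byte.
function appendByte(marker: number): Step {
  return {
    name: `append-${marker}`,
    run: async (input) => {
      const out = new Uint8Array(input.length + 1);
      out.set(input);
      out[input.length] = marker;
      return out;
    },
  };
}

// Usage: a two-step chain with progress logging.
runPipeline([appendByte(1), appendByte(2)], new Uint8Array([9]), (done, total, name) =>
  console.log(`${done}/${total} finished: ${name}`),
);
```

Keeping steps as plain data makes it possible to save a chain as a template and replay it later on another file.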

🔐 Privacy & Security

Privacy Model

All processing inside Web Workers
Memory cleared after processing
IndexedDB storage fully user-controlled
Zero Server Uploads – Nothing leaves your browser
No Tracking – No analytics, no cookies, no surveillance
Local Processing – All computation happens on your device
Secure Memory Cleanup – Sensitive data cleared after processing
Manual Cache Control – You control what's stored locally
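
One way to approach "memory cleared after processing" in JavaScript is to work on a private copy of the bytes and overwrite that copy when done. The helper below is a best-effort sketch under that assumption. `withScrubbedBuffer` is a hypothetical name, and how DocuHub actually clears memory is not specified in this README.

```typescript
// Run `work` on a private copy of the input, then zero the copy,
// even if `work` throws. The caller's original buffer is left untouched.
function withScrubbedBuffer<T>(source: Uint8Array, work: (bytes: Uint8Array) => T): T {
  const scratch = source.slice(); // independent copy backed by its own ArrayBuffer
  try {
    return work(scratch);
  } finally {
    scratch.fill(0); // overwrite the bytes before the buffer becomes garbage
  }
}

// Usage: compute a checksum without leaving a readable copy behind.
const checksum = withScrubbedBuffer(new Uint8Array([1, 2, 3]), (bytes) =>
  bytes.reduce((sum, b) => sum + b, 0),
);
```

This only covers buffers the code controls. The JavaScript engine may hold other internal copies, so zero-filling reduces exposure but is not a guarantee.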

🚀 Quick Start

Get DocuHub up and running locally in just a few minutes.

Troubleshooting

pnpm not found? → npm install -g pnpm
Port 3000 in use? → Run lsof -i :3000 (macOS/Linux) to find the process, then kill it
WASM errors? → Use latest Chrome/Edge/Firefox
Slow load? → Clear the browser cache, or check your connection (the first load downloads the WASM modules)

📦 Prerequisites

Make sure you have the following installed on your system:

Node.js v18+
Package manager: npm, yarn, or pnpm (recommended)
A modern web browser with WebAssembly (WASM) support
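
To check the WebAssembly requirement programmatically, a feature test along these lines can be used. `hasWasmSupport` is a hypothetical helper, not part of DocuHub. It relies only on the standard `WebAssembly.validate` API.

```typescript
// Returns true when the runtime exposes WebAssembly and accepts a minimal module.
function hasWasmSupport(): boolean {
  try {
    if (typeof WebAssembly !== "object" || typeof WebAssembly.validate !== "function") {
      return false;
    }
    // Smallest valid module: the "\0asm" magic number followed by version 1.
    const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
    return WebAssembly.validate(emptyModule);
  } catch {
    return false; // e.g. WebAssembly disabled by policy
  }
}

// Usage
if (!hasWasmSupport()) {
  console.warn("This browser cannot run WebAssembly.");
}
```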

💡 Tip: Check your Node version using node -v

⚙️ Installation & Setup

Follow these steps to run DocuHub locally:

# Clone the repository
git clone https://github.com/R3ACTR/DocuHub.git

# Navigate to the project directory
cd DocuHub

# Install dependencies
pnpm install

# Start the development server
pnpm run dev

🌐 Run the App

Once the server starts, open your browser and visit:

http://localhost:3000

You should now see DocuHub running locally 🎉

🛠️ Alternative Package Managers

If you prefer npm or yarn, you can use:

# npm
npm install
npm run dev

# yarn
yarn install
yarn dev

Build for Production

# Create optimized production build
pnpm run build

# Preview production build
pnpm run preview

📂 Project Structure

app/ → App Router pages
components/ → Reusable UI components
lib/ → Core processing logic
public/pdfjs/ → PDF.js workers
tools.config.ts → Tool metadata registry

🏗 Architecture Overview

DocuHub follows a modular client-side architecture:

Frontend Layer

Next.js App Router
TailwindCSS
TypeScript (strict mode)

Processing Layer

WebAssembly engines
Web Workers for heavy computation
IndexedDB for persistence

Rendering Layer

PDF.js for PDF preview
Canvas-based rendering
Lazy-loaded workers
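
The boundary between the frontend and processing layers is a message exchange with a Web Worker. The sketch below shows one possible typed protocol. The request kinds, the `Engines` interface, and `handleRequest` are assumptions for illustration and do not describe DocuHub's actual worker API.

```typescript
// Assumed message shapes; every request carries an id so replies can be matched.
type WorkerRequest =
  | { id: number; kind: "merge"; files: Uint8Array[] }
  | { id: number; kind: "rotate"; file: Uint8Array; degrees: 90 | 180 | 270 };

type WorkerResponse =
  | { id: number; ok: true; result: Uint8Array }
  | { id: number; ok: false; error: string };

// The processing functions are injected so the handler stays easy to test.
type Engines = {
  merge: (files: Uint8Array[]) => Uint8Array;
  rotate: (file: Uint8Array, degrees: number) => Uint8Array;
};

// Dispatch one request and convert exceptions into an error response.
function handleRequest(req: WorkerRequest, engines: Engines): WorkerResponse {
  try {
    const result =
      req.kind === "merge"
        ? engines.merge(req.files)
        : engines.rotate(req.file, req.degrees);
    return { id: req.id, ok: true, result };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { id: req.id, ok: false, error: message };
  }
}

// Inside a worker file this could be wired up roughly as:
//   self.onmessage = (e) => self.postMessage(handleRequest(e.data, engines));
```

Returning errors as data instead of throwing keeps failures inside the same request/response channel, so the UI can show them next to the job that caused them.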

🤝 Contributing

We welcome contributions of all kinds! Whether you're fixing bugs, adding features, improving docs, or suggesting ideas, your help is valued.

First-Time Contributors

Look for issues tagged with:

good first issue – Easy tasks for newcomers
help wanted – We need your expertise!
documentation – Improve our docs

Development Guidelines

Code Style – We use ESLint + Prettier (auto-format on save)
Commits – Use Conventional Commits
Tests – Add tests for new features
Docs – Update docs for API changes

See CONTRIBUTING.md for detailed guidelines.

🗺️ Roadmap

Phase 1: Foundation

Project setup and architecture
Basic PDF merge/split
Simple file conversions
PWA scaffolding

Phase 2: Core Features

Complete PDF toolkit (annotate, watermark, forms)
OCR integration (Tesseract.js)
Data format conversions (JSON/CSV/Excel)
Batch processing

Phase 3: Advanced Tools

Visual pipeline builder
PDF comparison/diff
Advanced OCR (table extraction)
Template system

Phase 4: Polish & Scale

Performance optimizations
Mobile-first UI improvements
i18n (internationalization)
Plugin system (experimental)

🛠 Adding a New Tool

Register tool in tools.config.ts
Create tool page in app/tool/[id]/
Implement processing logic in lib/
Add UI using existing components
Test locally
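
For the first step, a registry entry might look something like the sketch below. The `ToolDefinition` fields, the category names, and the `getTool` helper are all hypothetical. Check the real tools.config.ts for the actual shape before adding an entry.

```typescript
// Hypothetical registry shape; the real tools.config.ts may differ.
type ToolCategory = "pdf" | "convert" | "data" | "ocr";

interface ToolDefinition {
  id: string;            // assumed to match the [id] segment in app/tool/[id]/
  title: string;
  category: ToolCategory;
  accept: string[];      // file extensions the tool takes as input
}

const toolRegistry: ToolDefinition[] = [
  { id: "merge-pdf", title: "Merge PDF", category: "pdf", accept: [".pdf"] },
  { id: "json-to-csv", title: "JSON to CSV", category: "data", accept: [".json"] },
];

// Look up a tool by the id taken from the route.
function getTool(id: string): ToolDefinition | undefined {
  return toolRegistry.find((tool) => tool.id === id);
}

// Guard against two entries claiming the same route.
function findDuplicateIds(tools: ToolDefinition[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const tool of tools) {
    if (seen.has(tool.id)) duplicates.add(tool.id);
    seen.add(tool.id);
  }
  return Array.from(duplicates);
}
```

A duplicate check like `findDuplicateIds` is cheap to run in a unit test and catches the most common registry mistake.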

🎥 Demo

Here are some screenshots showing DocuHub in action — all processing happens locally in your browser.

🔹 Tool Workflow

Select tool
Upload file
Processing (WebAssembly worker)
Download result

🔹 Example: Merge PDF Flow

Drag and drop files
Reorder pages
Preview thumbnails
Export merged document

Landing Page

Privacy-first document processing - all tools in one place

Tool Selection

Choose from PDF operations, conversions, OCR, and data tools

File Upload

Simple drag-and-drop interface - all processing happens locally

Processing

Visual feedback during processing operations

Result

View and download your processed documents

Merge PDF Flow

Upload multiple PDFs to merge into a single document

Review selected files before merging

Drag and drop to reorder PDFs - complete control over final document

Real-time processing with WebAssembly - fast and efficient

Download your merged PDF - all processing done in your browser

🛠️ Tech Deep Dive

How Offline Processing Works

A Service Worker caches the entire app on first load
WebAssembly modules are loaded into memory
Files are read with the FileReader API (no uploads)
Processing happens in Web Workers, so the UI thread is not blocked
Results are saved to IndexedDB or downloaded directly
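
The first step above amounts to a cache-first lookup: serve an asset from the local cache if present, otherwise fetch it once and store it. The sketch below models that logic with an injected cache and network function so it can run outside a browser. `AssetCache`, `cacheFirst`, and `createMemoryCache` are hypothetical names. A real Service Worker would use the browser's Cache API inside a fetch event handler instead.

```typescript
// Minimal cache abstraction standing in for the browser Cache API.
interface AssetCache {
  get(url: string): Promise<Uint8Array | undefined>;
  put(url: string, body: Uint8Array): Promise<void>;
}

// Cache-first strategy: only hit the network on a cache miss.
async function cacheFirst(
  url: string,
  cache: AssetCache,
  fetchFromNetwork: (url: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const cached = await cache.get(url);
  if (cached !== undefined) return cached;
  const fresh = await fetchFromNetwork(url);
  await cache.put(url, fresh);
  return fresh;
}

// In-memory cache for demonstration and tests.
function createMemoryCache(): AssetCache {
  const store = new Map<string, Uint8Array>();
  return {
    get: async (url) => store.get(url),
    put: async (url, body) => {
      store.set(url, body);
    },
  };
}
```

After the first successful load, every later request for the same asset is answered locally, which is what allows the app to keep working without a connection.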

Performance Benchmarks

Operation	File Size	Time
PDF Merge (10 files)	50 MB	~2.3s
Image → PDF (batch 20)	30 MB	~1.8s
OCR (300 DPI scan)	5 MB	~4.5s
JSON → Excel (100k rows)	10 MB	~0.9s

Tested on: M1 Mac, Chrome 120
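
To compare the rows above on a common scale, the figures can be converted to approximate throughput (size divided by time). The snippet below does that arithmetic. Since the timings are approximate, the results are rough indications only, and `throughputMbPerSec` is a helper written for this example.

```typescript
// Figures copied from the benchmark table above.
const benchmarkRows = [
  { operation: "PDF Merge (10 files)", sizeMb: 50, seconds: 2.3 },
  { operation: "Image → PDF (batch 20)", sizeMb: 30, seconds: 1.8 },
  { operation: "OCR (300 DPI scan)", sizeMb: 5, seconds: 4.5 },
  { operation: "JSON → Excel (100k rows)", sizeMb: 10, seconds: 0.9 },
];

// Throughput in MB/s, rounded to one decimal place.
function throughputMbPerSec(sizeMb: number, seconds: number): number {
  return Math.round((sizeMb / seconds) * 10) / 10;
}

for (const row of benchmarkRows) {
  console.log(`${row.operation}: ~${throughputMbPerSec(row.sizeMb, row.seconds)} MB/s`);
}
// PDF Merge (10 files): ~21.7 MB/s
// Image → PDF (batch 20): ~16.7 MB/s
// OCR (300 DPI scan): ~1.1 MB/s
// JSON → Excel (100k rows): ~11.1 MB/s
```

OCR stands out as the most compute-heavy operation per megabyte.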

🔒 Security & Privacy

Our Promises

No Data Collection – We don't see or store your files
No Third-Party Services – No external APIs called
No Tracking – No analytics, no cookies
Open Source – Audit our code anytime
Local Storage Only – You control what's cached

Security Best Practices

Files processed in isolated Web Workers
Memory cleared after operations
No persistent storage without user consent
Service Worker can be manually cleared

📚 Documentation

User Guide – How to use DocuHub
API Reference – Core API documentation
Architecture Guide – Technical deep dive
Contributing Guide – How to contribute
FAQ – Common questions

🙏 Acknowledgments

DocuHub is built on the shoulders of giants:

PDF.js – PDF rendering
pdf-lib – PDF manipulation
Tesseract.js – OCR engine
PapaParse – CSV parsing
ExcelJS – Excel operations
Sharp – Image processing

📄 License

This project is licensed under the MIT License – see LICENSE for details.

🌟 Support the Project

If DocuHub helps you, consider:

⭐ Star this repo – Show your support
🐛 Report bugs – Help us improve
💡 Suggest features – Share your ideas
🤝 Contribute – Submit a PR
📢 Spread the word – Tell others about DocuHub

Every bit helps — thank you for using DocuHub! ❤️
