This repository was archived by the owner on Feb 18, 2026. It is now read-only.

R3ACTR/DocuHub


🔒 Privacy-first document processing. Entirely in your browser. Zero uploads.


Features • Quick Start • Architecture • Contributing • Roadmap • Demo


🎯 What is DocuHub?

DocuHub is a comprehensive, browser-based document toolkit that performs all processing locally using WebAssembly. Built with TypeScript in a single, unified repository, it handles PDFs, document conversions, data transformations, and OCRβ€”entirely offline.

πŸ›‘ The Problem We Solve

Most document tools require uploading sensitive files to unknown servers. DocuHub eliminates this privacy risk by processing everything in your browser. No data ever leaves your device.

Why DocuHub?

✅ 100% Privacy – All processing happens locally in your browser
✅ Truly Offline – Works without internet after first load (PWA)
✅ Fast – WebAssembly engines deliver near-native performance
✅ Free & Open Source – No subscriptions, no hidden costs
✅ Developer-Friendly – Clean TypeScript, modular architecture
✅ Single Repository – Easy to contribute, easy to maintain


🧑‍💻 Contribution Workflow

DocuHub follows a maintainer-approved contribution process:

  • Browse existing issues and pick one.
  • If you have a new idea or bug report, open an issue first.
  • Wait for maintainer approval before starting work.
  • Once approved and labeled, you may begin development.
  • Submit a pull request with a clear description.

Unapproved pull requests may be closed to maintain code quality.

✨ Features

📄 PDF Operations

Complete PDF Toolkit
  • Merge & Split – Combine multiple PDFs or extract specific pages
  • Compress – Intelligent compression to reduce file size
  • Reorder & Rotate – Drag-and-drop page organization
  • Extract Pages – Pull out individual pages or ranges
  • Annotate – Add highlights, drawings, text comments, and stamps
  • Watermark – Apply text or image watermarks with custom positioning
  • Headers & Footers – Add page numbers and custom text
  • Security – Password protect or unlock encrypted PDFs
  • Metadata Editor – Edit title, author, subject, keywords
  • PDF Comparison – Visual diff between two PDF versions
  • Form Operations – Fill PDF forms and flatten fields
  • PDF/A Conversion – Convert to archival standard
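
Page-based operations such as Extract Pages need to turn a user-entered range into concrete page indices. The sketch below is one way to do that; `parsePageRanges` is a hypothetical helper, not a function from the DocuHub codebase, and it assumes ranges are written as 1-based numbers like `1-3,5`.

```typescript
// Hypothetical helper (not from the DocuHub source): converts a 1-based
// range string such as "1-3,5" into sorted, de-duplicated 0-based indices.
function parsePageRanges(spec: string, pageCount: number): number[] {
  const indices = new Set<number>();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (match === null) throw new Error(`Invalid page range: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${token}"`);
    }
    for (let page = start; page <= end; page++) indices.add(page - 1);
  }
  return Array.from(indices).sort((a, b) => a - b);
}
```

Validating against `pageCount` up front keeps a bad range from reaching the PDF engine, where errors tend to be harder to explain to the user.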

🔁 Document Conversions

Office Documents
  • Word (.docx) → PDF
  • Excel (.xlsx) → PDF
  • PowerPoint (.pptx) → PDF
  • PDF → Word (experimental)
  • PDF → Excel (table extraction)
  • Text/Markdown → PDF
Images
  • Images → PDF (single or batch)
  • PDF → Images (PNG, JPG, WebP)
  • Format conversion (PNG ↔ JPG ↔ WebP ↔ BMP)
  • Batch resize and compression
  • Smart quality optimization
Data Formats
  • JSON ↔ Excel
  • JSON ↔ CSV
  • CSV ↔ Excel
  • XML → JSON
  • YAML → JSON
  • JSON → SQL INSERT statements
  • Excel → JSON schema generation
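
To make the data-format conversions concrete, here is a minimal sketch of the JSON → CSV direction. It is not DocuHub's converter; `jsonToCsv` and `escapeCsvField` are illustrative names, and the sketch assumes flat rows whose values are strings, numbers, booleans, or null.

```typescript
type Row = Record<string, string | number | boolean | null>;

// Quote a field only when it contains a comma, a quote, or a line break,
// doubling any embedded quotes (the convention described in RFC 4180).
function escapeCsvField(value: string | number | boolean | null | undefined): string {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function jsonToCsv(rows: Row[]): string {
  // Collect columns in first-seen order so rows with missing keys still line up.
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (columns.indexOf(key) === -1) columns.push(key);
    }
  }
  const lines = [columns.map((column) => escapeCsvField(column)).join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => escapeCsvField(row[column])).join(","));
  }
  return lines.join("\r\n");
}
```

Missing keys become empty fields, which keeps the output rectangular even when the input objects are sparse.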

📊 Structured Data Tools

  • JSON Tree Viewer/Editor – Interactive JSON exploration
  • JSON Flattener – Convert nested JSON to flat structures
  • Schema Generation – Auto-generate JSON schemas from data
  • Schema Validation – Validate JSON against schemas
  • Data Cleanup – Remove duplicates, infer types, normalize
  • Merge/Split – Combine or divide large datasets
  • Streaming Support – Handle files too large for memory
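
As an illustration of what the JSON Flattener does, the sketch below collapses nested objects and arrays into a single level keyed by dot-separated paths. The path format and the `flattenJson` name are assumptions made for this example; the actual tool may use a different separator or array notation.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Illustrative only: flattens nested JSON into { "a.b.0": value } form.
function flattenJson(
  value: Json,
  prefix = "",
  out: Record<string, Json> = {},
): Record<string, Json> {
  if (value === null || typeof value !== "object") {
    out[prefix] = value; // primitive leaf
    return out;
  }
  const entries: [string, Json][] = Array.isArray(value)
    ? value.map((item, index): [string, Json] => [String(index), item])
    : Object.entries(value);
  // Keep empty containers visible instead of silently dropping them.
  if (entries.length === 0 && prefix !== "") out[prefix] = value;
  for (const [key, child] of entries) {
    flattenJson(child, prefix === "" ? key : `${prefix}.${key}`, out);
  }
  return out;
}
```

A flat shape like this maps directly onto spreadsheet columns, which is why flattening usually comes before a JSON → CSV or JSON → Excel conversion.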

🧠 Offline OCR & Intelligence

  • Image β†’ Text – Extract text from images (Tesseract.js)
  • Scanned PDF β†’ Searchable PDF – Add text layer to scans
  • Table Extraction – Pull structured data from documents
  • Language Detection – Identify document language
  • Keyword Extraction – Automatic keyword tagging

⚙️ Automation & Workflows

  • Batch Processing – Process multiple files at once
  • Visual Pipeline Builder – Drag-and-drop workflow creation
  • Preset Workflows – Pre-configured chains (e.g., OCR → Compress → Watermark)
  • Local History – Undo operations with IndexedDB persistence
  • Template System – Save and reuse processing configurations
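
A preset workflow such as OCR → Compress → Watermark is, in essence, a list of steps where each step's output feeds the next. The following sketch shows that idea under simple assumptions: every step takes and returns raw bytes, and the `Step` type and `runPipeline` function are hypothetical names rather than DocuHub APIs.

```typescript
// A step takes file bytes and returns new bytes. Real steps would call into
// WASM engines; this type is an assumption made for the sketch.
type Step = { name: string; run: (input: Uint8Array) => Promise<Uint8Array> };

async function runPipeline(
  steps: Step[],
  input: Uint8Array,
  onProgress?: (stepName: string, done: number, total: number) => void,
): Promise<Uint8Array> {
  let current = input;
  let done = 0;
  for (const step of steps) {
    try {
      current = await step.run(current);
    } catch (error) {
      // Name the failing step so a long chain is easier to debug.
      throw new Error(`Step "${step.name}" failed: ${String(error)}`);
    }
    done += 1;
    if (onProgress) onProgress(step.name, done, steps.length);
  }
  return current;
}
```

Because a pipeline is plain data (an ordered array of steps), saving it as a reusable template only requires storing the step names and their settings.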

🔐 Privacy & Security

Privacy Model

  • All processing inside Web Workers
  • Memory cleared after processing
  • IndexedDB storage fully user-controlled
  • Zero Server Uploads – Nothing leaves your browser
  • No Tracking – No analytics, no cookies, no surveillance
  • Local Processing – All computation happens on your device
  • Secure Memory Cleanup – Sensitive data cleared after processing
  • Manual Cache Control – You control what's stored locally



🚀 Quick Start

Get DocuHub up and running locally in just a few minutes.

Troubleshooting

  • pnpm not found? → npm install -g pnpm
  • Port 3000 in use? → lsof -i :3000, then kill the process
  • WASM errors? → Use the latest Chrome, Edge, or Firefox
  • Slow load? → Clear the browser cache, or check your internet connection for the initial WASM download

📦 Prerequisites

Make sure you have the following installed on your system:

  • Node.js v18+
  • Package manager: npm, yarn, or pnpm (recommended)
  • A modern web browser with WebAssembly (WASM) support

💡 Tip: Check your Node version using node -v
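
If you would rather check the prerequisites programmatically, the sketch below shows one way to do it. Both function names are made up for this example. `WebAssembly.validate` is a standard API; the eight bytes passed to it are the smallest valid WASM module (the magic number plus version 1).

```typescript
// Smallest valid WebAssembly module: the "\0asm" magic number plus version 1.
const EMPTY_WASM_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Returns true when the current runtime (browser or Node) can validate WASM.
function supportsWebAssembly(): boolean {
  const wasm = (globalThis as unknown as {
    WebAssembly?: { validate(bytes: Uint8Array): boolean };
  }).WebAssembly;
  return wasm !== undefined && wasm.validate(EMPTY_WASM_MODULE);
}

// Compares a version string such as "v18.17.0" (the output of `node -v`)
// with a minimum major version.
function meetsNodeRequirement(version: string, minMajor = 18): boolean {
  const match = /^v?(\d+)\./.exec(version);
  return match !== null && Number(match[1]) >= minMajor;
}
```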


⚙️ Installation & Setup

Follow these steps to run DocuHub locally:

# Clone the repository
git clone https://github.com/R3ACTR/DocuHub.git

# Navigate to the project directory
cd DocuHub

# Install dependencies
pnpm install

# Start the development server
pnpm run dev

🌐 Run the App

Once the server starts, open your browser and visit:

http://localhost:3000

You should now see DocuHub running locally πŸŽ‰


🛠️ Alternative Package Managers

If you prefer npm or yarn, you can use:

# npm
npm install
npm run dev

# yarn
yarn install
yarn dev

Build for Production

# Create optimized production build
pnpm run build

# Preview production build
pnpm run preview

📂 Project Structure

app/ → App Router pages
components/ → Reusable UI components
lib/ → Core processing logic
public/pdfjs/ → PDF.js workers
tools.config.ts → Tool metadata registry

🏗 Architecture Overview

DocuHub follows a modular client-side architecture:

Frontend Layer

  • Next.js App Router
  • TailwindCSS
  • TypeScript (strict mode)

Processing Layer

  • WebAssembly engines
  • Web Workers for heavy computation
  • IndexedDB for persistence

Rendering Layer

  • PDF.js for PDF preview
  • Canvas-based rendering
  • Lazy-loaded workers

🀝 Contributing

We welcome contributions of all kinds! Whether you're fixing bugs, adding features, improving docs, or suggesting ideasβ€”you're valuable to this project.

First-Time Contributors

Look for issues tagged with:

  • good first issue – Easy tasks for newcomers
  • help wanted – We need your expertise!
  • documentation – Improve our docs

Development Guidelines

  • Code Style – We use ESLint + Prettier (auto-format on save)
  • Commits – Use Conventional Commits
  • Tests – Add tests for new features
  • Docs – Update docs for API changes

See CONTRIBUTING.md for detailed guidelines.


🗺️ Roadmap

Phase 1: Foundation

  • Project setup and architecture
  • Basic PDF merge/split
  • Simple file conversions
  • PWA scaffolding

Phase 2: Core Features

  • Complete PDF toolkit (annotate, watermark, forms)
  • OCR integration (Tesseract.js)
  • Data format conversions (JSON/CSV/Excel)
  • Batch processing

Phase 3: Advanced Tools

  • Visual pipeline builder
  • PDF comparison/diff
  • Advanced OCR (table extraction)
  • Template system

Phase 4: Polish & Scale

  • Performance optimizations
  • Mobile-first UI improvements
  • i18n (internationalization)
  • Plugin system (experimental)

🛠 Adding a New Tool

  1. Register tool in tools.config.ts
  2. Create tool page in app/tool/[id]/
  3. Implement processing logic in lib/
  4. Add UI using existing components
  5. Test locally
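
For step 1, a registry entry might look roughly like the sketch below. The field names and the `ToolDefinition` interface are guesses made for illustration only. Check tools.config.ts in the repository for the real shape before copying anything.

```typescript
// Hypothetical shape: the real tools.config.ts may use different field names.
interface ToolDefinition {
  id: string; // assumed to double as the [id] route segment in app/tool/[id]/
  title: string;
  category: "pdf" | "convert" | "data" | "ocr";
  accept: string[]; // accepted file extensions
}

const tools: ToolDefinition[] = [
  { id: "pdf-merge", title: "Merge PDF", category: "pdf", accept: [".pdf"] },
  { id: "json-to-csv", title: "JSON to CSV", category: "data", accept: [".json"] },
];

// Look up a tool by the id taken from the route.
function getTool(id: string): ToolDefinition | undefined {
  return tools.find((tool) => tool.id === id);
}

// Guard against two tools claiming the same route.
function findDuplicateIds(defs: ToolDefinition[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const def of defs) {
    if (seen.has(def.id)) duplicates.add(def.id);
    seen.add(def.id);
  }
  return Array.from(duplicates);
}
```

Keeping tool metadata in one registry means the tool page and the tool picker can both read from the same source, so a new tool only has to be declared once.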

🎥 Demo

The screenshots in this section show DocuHub in action. All processing happens locally in your browser.

🔹 Tool Workflow

  1. Select tool
  2. Upload file
  3. Processing (WebAssembly worker)
  4. Download result

🔹 Example: Merge PDF Flow

  • Drag and drop files
  • Reorder pages
  • Preview thumbnails
  • Export merged document

Landing Page

Privacy-first document processing, with all tools in one place

Tool Selection

Choose from PDF operations, conversions, OCR, and data tools

File Upload

Simple drag-and-drop interface; all processing happens locally

Processing

Visual feedback during processing operations

Result

View and download your processed documents

Merge PDF Flow

  • Upload – Upload multiple PDFs to merge into a single document
  • Files Selected – Review selected files before merging
  • Reordering – Drag and drop to reorder PDFs for complete control over the final document
  • Processing – Real-time processing with WebAssembly, fast and efficient
  • Success – Download your merged PDF; all processing is done in your browser

🛠️ Tech Deep Dive

How Offline Processing Works

  1. Service Worker caches the entire app on first load
  2. WebAssembly modules loaded into memory
  3. File operations use FileReader API (no uploads)
  4. Processing happens in Web Workers (non-blocking)
  5. Results saved to IndexedDB or downloaded directly
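
Step 1 relies on a cache-first strategy: serve from the local cache when possible, and go to the network only on a miss. The sketch below captures that logic in a form that runs outside a browser. `AsyncCache`, `cacheFirst`, and `createMemoryCache` are stand-in names invented here; a real service worker would use the Cache API (`caches.open`, `cache.match`, `cache.put`) and `fetch` instead.

```typescript
// Minimal stand-in for the parts of a cache this strategy needs.
interface AsyncCache<V> {
  match(key: string): Promise<V | undefined>;
  put(key: string, value: V): Promise<void>;
}

// Cache-first: return the cached value if present, otherwise fetch and store it.
async function cacheFirst<V>(
  key: string,
  cache: AsyncCache<V>,
  fetcher: (key: string) => Promise<V>,
): Promise<V> {
  const cached = await cache.match(key);
  if (cached !== undefined) return cached; // offline-safe path
  const fresh = await fetcher(key); // only reached on a cache miss
  await cache.put(key, fresh);
  return fresh;
}

// In-memory cache so the sketch can run without browser APIs.
function createMemoryCache<V>(): AsyncCache<V> {
  const store = new Map<string, V>();
  return {
    match: async (key: string) => store.get(key),
    put: async (key: string, value: V) => {
      store.set(key, value);
    },
  };
}
```

After the first successful load every later request takes the cached path, which is what lets the app and its WASM modules keep working with no connection.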

Performance Benchmarks

| Operation | File Size | Time |
|---|---|---|
| PDF Merge (10 files) | 50 MB | ~2.3s |
| Image → PDF (batch 20) | 30 MB | ~1.8s |
| OCR (300 DPI scan) | 5 MB | ~4.5s |
| JSON → Excel (100k rows) | 10 MB | ~0.9s |

Tested on: M1 Mac, Chrome 120


🔒 Security & Privacy

Our Promises

  1. No Data Collection – We don't see or store your files
  2. No Third-Party Services – No external APIs called
  3. No Tracking – No analytics, no cookies
  4. Open Source – Audit our code anytime
  5. Local Storage Only – You control what's cached

Security Best Practices

  • Files processed in isolated Web Workers
  • Memory cleared after operations
  • No persistent storage without user consent
  • Service Worker can be manually cleared
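
"Memory cleared after operations" can be approximated by overwriting a buffer once the work on it is finished. The helper below is a sketch of that pattern, not DocuHub's implementation, and `withScrubbedBuffer` is a name chosen for this example. It zeroes only the buffer it is given; copies made elsewhere (for example by structured cloning into a worker) are not affected, and JavaScript offers no guarantee of secure erasure.

```typescript
// Runs `work` on the bytes, then zeroes the buffer whether or not work succeeded.
async function withScrubbedBuffer<T>(
  bytes: Uint8Array,
  work: (bytes: Uint8Array) => Promise<T>,
): Promise<T> {
  try {
    return await work(bytes);
  } finally {
    bytes.fill(0); // overwrite the caller's buffer even if work() throws
  }
}
```

Putting the cleanup in `finally` means a failed operation does not leave document bytes sitting in memory any longer than a successful one.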

📚 Documentation


🙏 Acknowledgments

DocuHub is built on the shoulders of giants:


📄 License

This project is licensed under the MIT License – see LICENSE for details.


🌟 Support the Project

If DocuHub helps you, consider:

  • ⭐ Star this repo – Show your support
  • πŸ› Report bugs – Help us improve
  • πŸ’‘ Suggest features – Share your ideas
  • 🀝 Contribute – Submit a PR
  • πŸ“’ Spread the word – Tell others about DocuHub

Every bit helps β€” thank you for using DocuHub! ❀️
