D(ocument)e(xtractor)adSimple

This FastAPI service extracts text content from a wide variety of document formats (PDF, DOCX, PPTX, EPUB, HTML, TXT, etc.) using markitdown. It returns the content as an array of strings, one for each logical page, slide, or section.

🚀 Features

Supports multiple document formats
Returns page-wise content as a JSON array
Automatically detects the file type from the request's Content-Type header
FastAPI + Uvicorn app, easy to deploy

📦 Supported File Types

PDF (application/pdf)
DOCX / Word
PPTX / PowerPoint
EPUB
HTML
Markdown
TXT
CSV
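
As a rough illustration of content-type detection for the formats listed here, the sketch below maps a Content-Type header value to a file extension. Only application/pdf comes from this README; the other MIME types are the standard registered ones for each format, and the names SUPPORTED_TYPES and extension_for are hypothetical, not taken from main.py.

```python
from typing import Optional

# Hypothetical lookup table: media type -> file extension.
# Only "application/pdf" is stated in the README; the rest are the
# standard MIME types for each format and are assumptions here.
SUPPORTED_TYPES = {
    "application/pdf": ".pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": ".pptx",
    "application/epub+zip": ".epub",
    "text/html": ".html",
    "text/markdown": ".md",
    "text/plain": ".txt",
    "text/csv": ".csv",
}


def extension_for(content_type_header: str) -> Optional[str]:
    """Return the extension for a Content-Type header value, or None if unsupported."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return SUPPORTED_TYPES.get(media_type)
```

A caller could reject the upload when extension_for returns None, although how the actual service handles unknown types is not documented here.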

Example usage

In OpenWebUI, set the Document Extractor to external and use http://localhost:5000 as its URL (adjust the host and port to match your deployment).
Locally, send a file with curl: curl -X POST http://localhost:5000/process -H "Content-Type: application/pdf" --data-binary @file.pdf
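
The same request can be made from Python using only the standard library. This is a minimal sketch: the /process path and the Content-Type header come from the curl command, while the function names and the assumption that the response body is JSON (the README describes a JSON array of strings) are illustrative and not confirmed against main.py.

```python
import json
import mimetypes
import urllib.request


def build_extract_request(
    data: bytes,
    content_type: str,
    base_url: str = "http://localhost:5000",
) -> urllib.request.Request:
    """Build the POST request that the curl example sends."""
    return urllib.request.Request(
        url=f"{base_url}/process",
        data=data,
        method="POST",
        headers={"Content-Type": content_type},
    )


def extract_pages(path: str, base_url: str = "http://localhost:5000"):
    """Send a local file to the service and return the parsed JSON response."""
    # Guess the MIME type from the file name; fall back to a generic type.
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        request = build_extract_request(fh.read(), content_type, base_url)
    # Assumption: the service answers with a JSON body.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, extract_pages("file.pdf") would send the document and return whatever JSON the service produces.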

Env

PORT=5000
LLM_TOKEN # You can use an LLM to read images
LLM_MODEL # The LLM model to use
LLM_URL # The URL of an OpenAI-compatible provider
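
One way these variables might be read at startup is sketched below. The variable names and the default port of 5000 come from this section; the Settings class, the load_settings function, and the rule that image reading needs all three LLM values are assumptions for illustration, not the behaviour of main.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment variables listed above."""

    port: int
    llm_token: Optional[str]
    llm_model: Optional[str]
    llm_url: Optional[str]

    @property
    def llm_enabled(self) -> bool:
        # Assumption: image reading is only possible when all three are set.
        return all((self.llm_token, self.llm_model, self.llm_url))


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (the process environment by default)."""
    return Settings(
        port=int(env.get("PORT", "5000")),
        # Treat empty strings the same as unset variables.
        llm_token=env.get("LLM_TOKEN") or None,
        llm_model=env.get("LLM_MODEL") or None,
        llm_url=env.get("LLM_URL") or None,
    )
```

Passing a plain dictionary instead of os.environ makes the loader easy to exercise without touching the real environment.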

🧪 Setup (with virtual environment)

python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

./main.py

CodeAtCode/deadsimple

Folders and files

Latest commit

History

Repository files navigation

D(ocument)e(xtractor)adSimple

🚀 Features

📦 Supported File Types

Example usage

Env

🧪 Setup (with virtual environment)

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages