# ocr

Here are 2,393 public repositories matching this topic...

PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

ocr db crnn ocrlite chineseocr

Updated Apr 17, 2025
Python

hiroi-sora / Umi-OCR

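Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in multilingual language libraries.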

screenshot qt ocr qml ocr-python paddleocr umi-ocr

Updated Mar 26, 2025
Python

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source data extraction tool.

python pdf parser ocr pdf-converter extract-data document-analysis pdf-parser layout-analysis ai4science pdf-extractor-rag pdf-extractor-llm pdf-extractor-pretrain

Updated Apr 17, 2025
Python

ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

python pdf ocr image-processing tesseract

Updated Apr 6, 2025
Python

paperless-ngx / paperless-ngx

A community-supported supercharged version of paperless: scan, index and archive all your physical documents

pdf machine-learning django angular ocr archiving dms document-management optical-character-recognition document-management-system

Updated Apr 18, 2025
Python

JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

python machine-learning information-retrieval data-mining ocr deep-learning image-processing cnn pytorch lstm optical-character-recognition crnn scene-text scene-text-recognition easyocr

Updated Sep 24, 2024
Python

lukas-blecher / LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

python machine-learning ocr latex deep-learning image-processing pytorch dataset transformer vit image2text im2text im2latex im2markup math-ocr vision-transformer latex-ocr

Updated Jan 18, 2025
Python

sml2h3 / ddddocr

ddddocr (带带弟弟OCR): general-purpose CAPTCHA recognition OCR, PyPI version

ocr captcha ddddocr

Updated Dec 30, 2024
Python

the-paperless-project / paperless

Scan, index, and archive all of your paper documents

search ocr paper archiving documents

Updated Apr 6, 2021
Python

zyddnys / manga-image-translator

Translate manga/images: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/

ocr deep-learning neural-network anime machine-translation manga image-processing transformer chinese-translation text-detection auto-translation inpainting text-detection-recognition pytorch-implementation japanese-translations

Updated Apr 8, 2025
Python

YaoFANGUK / video-subtitle-extractor

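A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based framework for video subtitle extraction, covering subtitle region detection and subtitle content extraction.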

ocr deep-learning extract ripper subtitles srt subrip hardsub

Updated Feb 25, 2025
Python

pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

python pdf font data-science ocr tesseract epub mupdf text-processing pdf-documents extract-data table-extraction text-shaping xps pymupdf

Updated Apr 17, 2025
Python

adithya-s-k / omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

ocr parser-library web-crawler parse-server whisper-api ingestion-api vision-transformer omniparser

Updated Apr 9, 2025
Python

clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

nlp ocr computer-vision document-ai multimodal-pre-trained-model eccv-2022

Updated Jul 11, 2024
Python

chineseocr / chineseocr

yolo3+ocr

ocr idcard opencv-dnn yolo3 chinese-text-detect chinese-ocr darknet-text-detect trainticket

Updated Aug 29, 2022
Python

jonaswinkler / paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents

search machine-learning django angular ocr archiving full-text-search dms document-management-system

Updated Feb 14, 2023
Python

PaddlePaddle / PaddleX

All-in-One Development Tool based on PaddlePaddle

ocr time-series deployment speech-recognition classification segmentation object-detection ai-pipelines layout-detection formula-recognition pp-chatocr pdf2markdown

Updated Apr 18, 2025
Python

Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

ocr computer-vision deep-learning object-detection document-image-processing layout-analysis document-layout-analysis detectron2 layout-parser layout-detection

Updated Aug 15, 2024
Python

mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

ocr deep-learning pytorch text-recognition text-detection optical-character-recognition text-detection-recognition tensorflow2 document-recognition

Updated Apr 15, 2025
Python

open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox

Updated Nov 27, 2024
Python
