Here are 2,393 public repositories matching this topic...
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Updated
Apr 17, 2025
Python
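OCR software, free, open source, and offline. Supports screenshots and batch image import, PDF document recognition, ignoring watermarks and headers/footers, and scanning/generating QR codes. Includes built-in language libraries for many languages.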
Updated
Mar 26, 2025
Python
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
Updated
Apr 17, 2025
Python
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Updated
Apr 6, 2025
Python
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
Updated
Apr 18, 2025
Python
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Updated
Sep 24, 2024
Python
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Updated
Jan 18, 2025
Python
Updated
Dec 30, 2024
Python
Scan, index, and archive all of your paper documents
Updated
Apr 6, 2021
Python
Updated
Apr 8, 2025
Python
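A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files. Text recognition runs locally, with no third-party API required. A deep-learning-based video subtitle extraction framework covering subtitle region detection and subtitle content extraction.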
Updated
Feb 25, 2025
Python
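The hardsub extractor above ends by writing SRT files. As a rough illustration of that last step, here is a minimal sketch, assuming OCR has already produced one `(timestamp, text)` pair per sampled video frame. The names `Cue`, `frames_to_cues`, `format_timestamp`, and `to_srt` are hypothetical and not taken from that project; the merging rule (consecutive frames with identical text become one cue) is an assumption chosen for simplicity. The SRT layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is the standard format.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def frames_to_cues(samples: list[tuple[float, str]], frame_interval: float) -> list[Cue]:
    """Hypothetical merge step: collapse consecutive frames with identical text into one cue.

    `samples` holds (timestamp_seconds, recognized_text) per sampled frame, in time order.
    An empty string means no subtitle was detected and ends the current cue.
    """
    cues: list[Cue] = []
    prev_text = ""
    for t, raw in samples:
        text = raw.strip()
        if text and text == prev_text:
            cues[-1].end = t + frame_interval  # same subtitle still on screen: extend it
        elif text:
            cues.append(Cue(t, t + frame_interval, text))  # a new subtitle appears
        prev_text = text
    return cues


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as SRT: numbered blocks separated by a blank line."""
    blocks = [
        f"{i}\n{format_timestamp(c.start)} --> {format_timestamp(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    frames = [(0.0, "Hi"), (0.5, "Hi"), (1.0, ""), (1.5, "Bye")]
    print(to_srt(frames_to_cues(frames, frame_interval=0.5)))
```

Merging on exact string equality is fragile in practice, because OCR output can flicker by a character between frames. A real tool would likely use a fuzzy similarity threshold instead, but that choice is left out here to keep the sketch short.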
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Updated
Apr 17, 2025
Python
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Updated
Apr 9, 2025
Python
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Updated
Jul 11, 2024
Python
Updated
Aug 29, 2022
Python
A supercharged version of paperless: scan, index and archive all your physical documents
Updated
Feb 14, 2023
Python
All-in-One Development Tool based on PaddlePaddle
Updated
Apr 18, 2025
Python
A Unified Toolkit for Deep Learning Based Document Image Analysis
Updated
Aug 15, 2024
Python
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Updated
Apr 15, 2025
Python
OpenMMLab Text Detection, Recognition and Understanding Toolbox
Updated
Nov 27, 2024
Python