GitHub - jacobmarks/pytesseract-ocr-plugin: Run optical character recognition with PyTesseract from the FiftyOne App!

PyTesseract Optical Character Recognition Plugin

Updates

2023-10-19: Added support for customizing prediction fields, and added an embedded field for OCR text.

This Python plugin lets you perform optical character recognition (OCR) on documents using PyTesseract, a Python wrapper for the Tesseract OCR engine!

Watch on YouTube

Installation

fiftyone plugins download https://github.com/jacobmarks/pytesseract-ocr-plugin

You will also need to install the plugin's requirements:

pip install -r requirements.txt
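PyTesseract calls the Tesseract executable rather than bundling it, so the Python requirements alone may not be enough. The check below is a minimal sketch, assuming the executable is named `tesseract` and lives on your `PATH`; the helper name `tesseract_available` is hypothetical and not part of the plugin.

```python
import shutil


def tesseract_available(binary="tesseract"):
    """Return True if the named executable can be found on PATH.

    `binary` defaults to "tesseract", which is an assumption about how
    the engine is installed on your system.
    """
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract executable found")
    else:
        print("Tesseract executable not found on PATH")
```

If the check fails, install Tesseract with your system's package manager before running the operator.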

Operators

`run_ocr_engine`

Runs the PyTesseract OCR engine on the documents in the dataset, converts the results to FiftyOne labels, and stores individual word predictions as well as block-level predictions on the dataset.
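To make the conversion step concrete, here is a rough sketch of how word-level and block-level boxes could be derived from PyTesseract's `image_to_data` output. It is not the plugin's actual code: the function names, the returned dictionary layout, and the choice to build each block box as the union of its word boxes are all assumptions made for illustration. Boxes use the relative `[x, y, width, height]` format that FiftyOne detections expect.

```python
def to_relative_box(left, top, width, height, img_w, img_h):
    """Convert a pixel box to relative [x, y, w, h] coordinates in [0, 1]."""
    return [left / img_w, top / img_h, width / img_w, height / img_h]


def ocr_dict_to_boxes(data, img_w, img_h):
    """Split OCR output (dict of parallel lists) into word and block boxes.

    `data` follows the layout returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    words = []
    extents = {}  # block_num -> [x0, y0, x1, y1] in pixels
    texts = {}    # block_num -> list of words in reading order

    for i, raw in enumerate(data["text"]):
        text = raw.strip()
        if not text:
            continue  # rows without text describe pages, blocks, lines

        l, t = data["left"][i], data["top"][i]
        w, h = data["width"][i], data["height"][i]
        words.append({
            "label": text,
            "bounding_box": to_relative_box(l, t, w, h, img_w, img_h),
            "confidence": float(data["conf"][i]) / 100,
        })

        # Grow the block's extent so that it covers this word
        b = data["block_num"][i]
        if b not in extents:
            extents[b] = [l, t, l + w, t + h]
        else:
            e = extents[b]
            e[0], e[1] = min(e[0], l), min(e[1], t)
            e[2], e[3] = max(e[2], l + w), max(e[3], t + h)
        texts.setdefault(b, []).append(text)

    blocks = []
    for b, (x0, y0, x1, y1) in extents.items():
        blocks.append({
            "label": " ".join(texts[b]),
            "bounding_box": to_relative_box(x0, y0, x1 - x0, y1 - y0, img_w, img_h),
        })
    return words, blocks


def run_on_image(path):
    """Run OCR on one image; needs pytesseract, Pillow, and Tesseract."""
    import pytesseract
    from PIL import Image

    image = Image.open(path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    return ocr_dict_to_boxes(data, image.width, image.height)
```

Each returned dictionary maps directly onto the `label`, `bounding_box`, and `confidence` arguments of `fo.Detection`, so the two lists could be stored in separate `fo.Detections` fields on a sample.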

Usage

You can access the operator via the App's action menu, or by pressing the "`" key on your keyboard and selecting the operator from the dropdown menu.

If you have a view loaded and/or samples selected, the operator will give you the option to run the OCR engine on only those samples or on the entire dataset.
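The choice described above can be pictured as a small helper that maps the user's selection to a collection of samples. This is only a sketch: the function name `resolve_target` and the option strings are hypothetical, and it assumes an execution context exposing `dataset`, `view`, and `selected` (a list of sample IDs), as FiftyOne operator contexts generally do.

```python
def resolve_target(ctx, target):
    """Pick the samples to process based on the user's choice.

    `target` is one of the hypothetical option strings
    "SELECTED_SAMPLES", "CURRENT_VIEW", or "DATASET".
    """
    if target == "SELECTED_SAMPLES" and ctx.selected:
        # Restrict the current view to the selected sample IDs
        return ctx.view.select(ctx.selected)
    if target == "CURRENT_VIEW":
        return ctx.view
    # Fall back to the entire dataset
    return ctx.dataset
```

Falling back to the full dataset keeps the helper safe when nothing is selected and no view is loaded.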

You can choose either to run the operator in the foreground or to delegate its execution to a background job.

💡 Once you've generated OCR predictions, you can search through them using the Keyword Search plugin!
