Tesseract-OCR-5-Docker 📜

Docker image with the latest Tesseract OCR version 5.x.x, built from source.

The sources are pulled from the latest main branch or from the latest releases of the Tesseract OCR project.

Docker Hub: https://hub.docker.com/r/franky1/tesseract

Usage 🛠️

Pull Docker Image ⌨️

Pull the Docker image from Docker Hub:

docker pull franky1/tesseract

Run Docker Container ⌨️

Mount the folder containing your images to the /tmp directory and run the Tesseract OCR container with the required command line options. For example, to run OCR on the test image:

docker run -it -v ${PWD}/testdata:/tmp --rm franky1/tesseract \
  tesseract english.png output --oem 1 -l eng

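Tesseract writes the recognized text to output.txt, which appears in the mounted ./testdata folder. For all Tesseract command line options, refer to the Tesseract manual: https://tesseract-ocr.github.io/tessdoc/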
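To OCR a whole folder of images, a small bash function can start the container once per file. This is a minimal sketch, assuming bash, `.png`/`.jpg` inputs and the `eng` model by default; the function name `ocr_dir` and the `DOCKER` override (set `DOCKER=echo` for a dry run that only prints the commands) are hypothetical and not part of this repository. Each image gets its own text file in the mounted host folder, for example `english.png` produces `english.txt`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): OCR every .png and .jpg
# file in a host directory with the franky1/tesseract image, one container per file.
# Set DOCKER=echo to print the docker commands instead of running them.
ocr_dir() {
  local dir="${1:?usage: ocr_dir <dir> [lang]}"
  local lang="${2:-eng}"
  local abs img name
  abs="$(cd "$dir" && pwd)" || return 1   # bind mounts need an absolute host path
  for img in "$dir"/*.png "$dir"/*.jpg; do
    [ -e "$img" ] || continue             # skip a pattern that matched no file
    name="$(basename "$img")"
    # Output base = file name without extension; Tesseract appends .txt itself.
    "${DOCKER:-docker}" run --rm -v "$abs:/tmp" franky1/tesseract \
      tesseract "$name" "${name%.*}" --oem 1 -l "$lang"
  done
}
```

Usage with the repository's test images: `ocr_dir ./testdata eng`. Starting one container per image keeps the sketch simple, at the cost of container startup time for every file.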

Mount more languages 🗣️

Test whether the languages in your local ./tessdata subfolder are available in the Docker container when mounted. Be aware that the mounted folder replaces the languages installed in the Docker image, so it must contain every language you want to use. Example with French:

docker run -it -v ${PWD}/testdata:/tmp \
  -v ${PWD}/tessdata:/usr/local/share/tessdata/ \
  --rm franky1/tesseract

Then test the mounted language in the Docker container with a sample image, again with French:

docker run -it -v ${PWD}/testdata:/tmp \
  -v ${PWD}/tessdata:/usr/local/share/tessdata/ \
  --rm franky1/tesseract \
  tesseract french.jpg output --oem 1 -l fra
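The mount examples above expect `./tessdata` to already contain the `.traineddata` files. A minimal sketch for downloading them with curl, assuming bash and that the model repositories listed under Tesseract Trained Data serve raw files at `https://github.com/tesseract-ocr/<repo>/raw/main/<lang>.traineddata` (the branch name `main` is an assumption); `tessdata_url` and `fetch_lang` are hypothetical helper names. Since the mount hides the trained data installed in the image, also fetch `eng` if you still need English.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for populating ./tessdata with trained models.
# Assumption: each model repo serves files at /raw/main/<lang>.traineddata.
tessdata_url() {
  local lang="${1:?usage: tessdata_url <lang> [repo]}"
  local repo="${2:-tessdata_best}"   # tessdata, tessdata_fast or tessdata_best
  printf 'https://github.com/tesseract-ocr/%s/raw/main/%s.traineddata\n' "$repo" "$lang"
}

fetch_lang() {
  local lang="${1:?usage: fetch_lang <lang> [repo] [dir]}"
  local repo="${2:-tessdata_best}"
  local dir="${3:-tessdata}"
  mkdir -p "$dir"
  # -f: fail on HTTP errors, -L: follow GitHub's redirect to the raw file host
  curl -fL -o "$dir/$lang.traineddata" "$(tessdata_url "$lang" "$repo")"
}
```

Usage: `fetch_lang fra && fetch_lang eng`, then run the French example above.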

Alternatively, build your own Docker image with the languages you need; see the next section.

Build Docker Image yourself 🐳

For details, see the Dockerfile.

Clone this repository with git.
Add the languages you need to the languages.txt file.
(a) Build the Docker image from scratch if you want the latest sources from the main branch:

docker build --tag tesseract .

(b) Build the Docker image from scratch if you want a specific release version:

docker build --tag tesseract --build-arg TESSERACT_VERSION=5.0.0 .

Run the Tesseract OCR container with the test image:

docker run -it --name tesseract -v ${PWD}/testdata:/tmp --rm \
  tesseract tesseract english.png output --oem 1 -l eng
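The two build variants differ only in the `TESSERACT_VERSION` build argument, so they can be wrapped in one function. A minimal sketch, assuming bash and that it is run from the root of the cloned repository; `build_image` and the `DOCKER` dry-run override are hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two build variants described above.
# No argument: build from the main branch. One argument: build that release.
# Set DOCKER=echo to print the command instead of running it.
build_image() {
  local version="${1:-}"
  local args=(build --tag tesseract)
  if [ -n "$version" ]; then
    args+=(--build-arg "TESSERACT_VERSION=$version")
  fi
  "${DOCKER:-docker}" "${args[@]}" .   # build context: current directory
}
```

Usage: `build_image` for the main branch or `build_image 5.0.0` for a release, followed by the test run above.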

Image conditions ☑️

The only supported target platform for this Docker image is currently linux/amd64.
The working directory for OCR images inside the container is /tmp (see the examples above).
The directory for trained data inside the container is /usr/local/share/tessdata/ (see the examples above).
This image was built without the Tesseract training tools.
This image currently includes only the following languages:
- English: eng.traineddata from tessdata_best
- German: deu.traineddata from tessdata_best

If you need other languages, build your own image or mount your trained data to the /usr/local/share/tessdata/ directory (see the examples above).
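Since only linux/amd64 is supported, hosts with another architecture (for example arm64 machines such as Apple Silicon Macs) need to request the amd64 image explicitly with Docker's `--platform` flag, which runs it under emulation and is noticeably slower. A minimal sketch, assuming bash and a Docker installation with amd64 emulation available; `list_langs` is a hypothetical helper that prints the languages Tesseract finds via `tesseract --list-langs`, optionally with a local trained-data folder mounted, which also makes it a quick check for the tessdata mount.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the languages available inside the container.
# --platform linux/amd64 requests the amd64 image even on arm64 hosts
# (runs under emulation). Optional argument: a host folder to mount as tessdata.
# Set DOCKER=echo to print the command instead of running it.
list_langs() {
  local mount=() abs
  if [ -n "${1:-}" ]; then
    abs="$(cd "$1" && pwd)" || return 1   # bind mounts need an absolute host path
    mount=(-v "$abs:/usr/local/share/tessdata/")
  fi
  "${DOCKER:-docker}" run --rm --platform linux/amd64 "${mount[@]}" \
    franky1/tesseract tesseract --list-langs
}
```

Usage: `list_langs` shows the languages built into the image; `list_langs ./tessdata` shows what the container sees with your local folder mounted.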

Tesseract Trained Data for all available languages 🏋️

Overview of supported languages: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
Trained models with support for the legacy and LSTM OCR engines: https://github.com/tesseract-ocr/tessdata
Fast integer versions of trained LSTM models: https://github.com/tesseract-ocr/tessdata_fast
Best (most accurate) trained LSTM models: https://github.com/tesseract-ocr/tessdata_best

Further documentation 🔗

Docker Hub: https://hub.docker.com/repository/docker/franky1/tesseract
Original Tesseract GitHub Repository: https://github.com/tesseract-ocr/tesseract
Original Tesseract Documentation: https://tesseract-ocr.github.io/
Original Tesseract Manual: https://tesseract-ocr.github.io/tessdoc/
More tessdata_best languages: https://github.com/tesseract-ocr/tessdata_best

ToDo ✅

Issues 🐛

If you find a bug or have a request regarding this Docker image, please open an issue in this GitHub repository.

Project status ✔️

27.07.2022: The Docker image is ready for use. Some minor improvements are still possible, and builds occasionally fail.
