⚗️ Experimental Frappe OCR application with tesseract.
This project is a fork of ERPNext-OCR by John Vincent Fiel. Its aim is to fix and clean up the original source code and to add some new features.
Check out more on ERPNext Discuss.
See CHANGELOG
See Taiga.io
Install tesseract-ocr, as well as ImageMagick and Ghostscript (needed to work with PDF files), on Debian using this command:
sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript
bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
bench install-app erpnext_ocr
When installing the Frappe app, the following Python requirements are installed:
- tesserocr, a Python binding for Tesseract
- pillow, an image processing library for Python
- requests, an HTTP library for Python
- wand, a Python binding for ImageMagick
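The following minimal sketch shows how these libraries can fit together:
- Pillow opens an image.
- Wand (ImageMagick plus Ghostscript) rasterizes PDF pages.
- tesserocr extracts the text.

It is an illustration under assumptions, not the erpnext_ocr implementation. The function names ocr_image and ocr_pdf and the 300 DPI default are hypothetical choices.

```python
"""Minimal sketch (hypothetical helpers): OCR an image or a PDF file.

Not the erpnext_ocr implementation. Third-party imports are done inside
the functions so this module loads even if the libraries are missing.
"""
import io


def ocr_image(path, lang="eng"):
    """Return the text Tesseract recognizes in a raster image file."""
    import tesserocr
    from PIL import Image

    with Image.open(path) as image:
        return tesserocr.image_to_text(image, lang=lang)


def ocr_pdf(path, lang="eng", resolution=300):
    """Rasterize each PDF page with ImageMagick (via Wand), then OCR it.

    Requires ImageMagick and Ghostscript on the system; the 300 DPI
    default is an assumed value, not a project setting.
    """
    import tesserocr
    from PIL import Image
    from wand.image import Image as WandImage

    texts = []
    with WandImage(filename=path, resolution=resolution) as pdf:
        for page in pdf.sequence:
            # Copy the single page and encode it as PNG bytes for Pillow.
            with WandImage(image=page) as page_image:
                blob = page_image.make_blob(format="png")
            with Image.open(io.BytesIO(blob)) as image:
                texts.append(tesserocr.image_to_text(image, lang=lang))
    return "\n".join(texts)
```

For example, ocr_pdf("/opt/sample.pdf") would return the recognized text of every page joined by newlines. This assumes ImageMagick is allowed to read PDF files and Ghostscript is installed.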
To use OCR with other languages, you need to install the appropriate trained data files. See the Tesseract wiki for details: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
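Tesseract combines several languages with a "+" separator (for example eng+fra), and each language needs its trained data installed. On Debian, that data ships in packages such as tesseract-ocr-fra.

The sketch below uses two hypothetical helpers:
- installed_languages asks tesserocr.get_languages() which languages it can find.
- build_lang_param checks the requested codes against that list before building the parameter.

```python
def build_lang_param(requested, available):
    """Join language codes with '+', Tesseract's multi-language syntax.

    Hypothetical helper: raises ValueError listing every requested code
    whose trained data is not among the available languages.
    """
    missing = [code for code in requested if code not in available]
    if missing:
        raise ValueError("missing trained data for: " + ", ".join(missing))
    return "+".join(requested)


def installed_languages():
    """Return the language codes tesserocr finds in its tessdata path."""
    import tesserocr

    # get_languages() returns a (tessdata_path, languages) tuple.
    _tessdata_path, languages = tesserocr.get_languages()
    return languages
```

A call such as build_lang_param(["eng", "fra"], installed_languages()) yields "eng+fra" when both data files are present. That string can then be passed as the lang argument to tesserocr.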
If you wish to develop or just test this application locally, run the following at the root of this repository:
docker-compose up -d
You can then access your ERPNext OCR dev environment at http://localhost:8080.
- wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412
  This can happen when the ImageMagick security policy prevents it from reading PDF files.
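On Debian-based systems, a common cause is a line in ImageMagick's policy.xml that sets rights="none" for the PDF coder. A common workaround is to change those rights to read|write.

The sketch below automates that edit. It makes several assumptions about your installation:
- the policy file is at /etc/ImageMagick-6/policy.xml;
- the policy line uses the attribute order matched by the regular expression.

The names allow_pdf and patch_policy_file are hypothetical. The restriction exists for security reasons, so only relax it if you trust the PDF files you process.

```python
import re

# Assumed location on Debian/Ubuntu with ImageMagick 6; adjust as needed.
POLICY_PATH = "/etc/ImageMagick-6/policy.xml"

# Matches e.g. <policy domain="coder" rights="none" pattern="PDF" />
_PDF_POLICY = re.compile(
    r'(<policy\s+domain="coder"\s+rights=")none("\s+pattern="PDF"\s*/>)'
)


def allow_pdf(policy_xml):
    """Return policy_xml with the PDF coder rights switched to read|write."""
    return _PDF_POLICY.sub(r"\1read|write\2", policy_xml)


def patch_policy_file(path=POLICY_PATH):
    """Rewrite the policy file in place (needs root); True if it changed."""
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    patched = allow_pdf(original)
    if patched != original:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(patched)
    return patched != original
```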
wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.
-
This might happen if you're missing a dependency to convert PDF, most of the time
ghostscript
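ImageMagick delegates PDF rasterization to the Ghostscript executable, which is named gs on Linux. A quick way to rule out a missing delegate is to check whether that executable is on PATH. The missing_pdf_delegates function below is a hypothetical helper built on shutil.which.

```python
import shutil


def missing_pdf_delegates(required=("gs",)):
    """Return the executables from `required` that are not found on PATH.

    Hypothetical helper: an empty list only means the binaries exist,
    not that ImageMagick is configured to use them.
    """
    return [name for name in required if shutil.which(name) is None]
```

If it returns ["gs"], install Ghostscript with sudo apt-get install ghostscript.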
- OSError: encoder error -2 when writing image file
  This might happen when trying to open a TIFF image. The real error is "hidden" and only displayed in the console. If the original error in the console is
  Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.
  it usually means that the TIFF image compression is not valid or not recognized.
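CCITT Group 3/4 compression only supports 1-bit (black and white) images, hence the "Bits/sample must be 1" message. One plausible trigger, assumed here rather than confirmed by this project, involves Pillow re-saving an image:
- when the image is saved again, Pillow reuses the compression recorded in the source TIFF's metadata;
- if the pixel data no longer has 1 bit per sample, the Group 3/4 encoder fails.

The sketch below re-encodes a TIFF without compression. The name to_uncompressed_tiff is hypothetical.

```python
import io


def to_uncompressed_tiff(data):
    """Re-encode TIFF bytes without compression (hypothetical helper).

    Passing compression=None explicitly overrides any compression Pillow
    would otherwise reuse from the source file's metadata (image.info),
    so a bilevel-only scheme such as Group 3/4 is never applied.
    """
    from PIL import Image

    with Image.open(io.BytesIO(data)) as image:
        out = io.BytesIO()
        image.save(out, format="TIFF", compression=None)
    return out.getvalue()
```

This does not help if the source file itself cannot be decoded. In that case, the file has to be regenerated or converted with another tool.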
To run the tests:
bench run-tests --app erpnext_ocr
Monogramm
- Website: https://www.monogramm.io
- GitHub: @Monogramm
John Vincent Fiel
- GitHub: @jvfiel
Contributions, issues and feature requests are welcome!
Feel free to check the issues page and the contributing guide.
Give a ⭐ if this project helped you!
Copyright © 2019 Monogramm.
This project is MIT licensed.
This README was generated with ❤️ by readme-md-generator