Read from public files #4

Closed
wants to merge 4 commits into from
14 changes: 9 additions & 5 deletions README.md
@@ -1,6 +1,6 @@
## Erpnext Ocr
## Erpnext OCR

OCR
OCR with [tesseract](https://github.com/tesseract-ocr/tesseract).

#### License

@@ -22,21 +22,25 @@ Examples to implement OCR(Optical Character Recognition) using tesseract using P
```
sudo apt-get install tesseract-ocr
```
- Install python binding for tesseract, pytesseract, using this pip command:
- Install python binding for tesseract, [pytesseract](https://pypi.org/project/pytesseract/), using this pip command:
```
pip install pytesseract
```
- Install image processing library in python, pillow using this pip command:
- Install image processing library in python, [pillow](https://pypi.org/project/Pillow/), using this pip command:
```
pip install pillow
```
- Install HTTP library in python, [requests](https://pypi.org/project/requests/) using this pip command:
```
pip install requests
```

**For working with pdf files:**
- Install imagemagick using this command:
```
sudo apt-get install imagemagick
```
- Install python binding for imagemagick, wand, using this pip command:
- Install python binding for imagemagick, [wand](https://pypi.org/project/Wand/), using this pip command:
```
pip install wand
```
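A quick way to confirm that the Python packages listed in this README are installed is to check whether each one is importable. The following is only a minimal sketch: the helper name `missing_packages` is hypothetical, and the import names (`pytesseract`, `PIL`, `requests`, `wand`) are assumed to be the module names those packages install.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for pytesseract, pillow, requests and wand.
    required = ["pytesseract", "PIL", "requests", "wand"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All OCR dependencies are importable.")
```

This only checks the Python bindings; the `tesseract-ocr` and `imagemagick` system packages still need to be installed separately.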
18 changes: 17 additions & 1 deletion erpnext_ocr/erpnext_ocr/doctype/ocr_read/ocr_read.py
@@ -5,6 +5,7 @@
from __future__ import unicode_literals
import frappe
from frappe.model.document import Document
import os

#Alternative to "File Upload Disconnected. Please try again."

@@ -40,9 +41,24 @@ def force_attach_file_doc(filename,name):
class OCRRead(Document):
    def read_image(self):
        from PIL import Image
        import requests
        import pytesseract

        fullpath = frappe.get_site_path() + self.file_to_read
        path = self.file_to_read

        if path.startswith('/assets/'):
            # from public folder
            fullpath = os.path.abspath(path)
        elif path.startswith('/files/'):
            # public file
            fullpath = frappe.get_site_path() + '/public' + path
        elif path.startswith('/private/files/'):
            # private file
            fullpath = frappe.get_site_path() + path
        else:
            # external link
            fullpath = requests.get(path, stream=True).raw

        im = Image.open(fullpath)

        text = pytesseract.image_to_string(im, lang='eng')
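The branching in `read_image` can be sketched as a standalone function that is testable without Frappe. This is only an illustration under stated assumptions, not the app's implementation: `resolve_source` and its `assets_root` parameter are hypothetical names, and the location of the assets folder is assumed to be supplied by the caller. One thing the sketch tries to avoid is that `os.path.abspath` leaves an already-absolute path such as `/assets/...` unchanged, so the first branch in the diff may resolve to the filesystem root rather than to a folder inside the bench. It also rejects anything that is not an `http` or `https` URL before it would be handed to `requests.get`.

```python
import os
from urllib.parse import urlparse


def resolve_source(path, site_path, assets_root="."):
    """Classify a file reference and return a (kind, location) tuple.

    kind is "local" for a filesystem path or "remote" for a URL.
    assets_root is a hypothetical parameter: the folder that is
    assumed to contain the "assets" directory.
    """
    relative = path.lstrip("/")
    if path.startswith("/assets/"):
        # Join against an explicit root instead of calling abspath on
        # a path that is already absolute.
        return "local", os.path.join(assets_root, relative)
    if path.startswith("/files/"):
        # Public files live under <site>/public/files.
        return "local", os.path.join(site_path, "public", relative)
    if path.startswith("/private/files/"):
        # Private files live under <site>/private/files.
        return "local", os.path.join(site_path, relative)
    if urlparse(path).scheme in ("http", "https"):
        # External link; the caller would download it, for example
        # with requests.get(location, stream=True).
        return "remote", path
    raise ValueError("Unsupported file reference: %r" % (path,))
```

A caller could open the location directly with `Image.open` when the kind is `"local"`, and fetch it first when the kind is `"remote"`.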
3 changes: 2 additions & 1 deletion requirements.txt
@@ -1 +1,2 @@
frappe
frappe
requests