GitHub - Yakub-Egamnazarov/OCR-Text-recognition-data-base: pytesseract, OpenCV, and several other packages used to extract text from pictures and build a database in an Excel sheet

This project extracts company information from pictures and stores the text data in an Excel database.

It is written in Python and uses the following libraries:

1. pytesseract | Python wrapper for Google's Tesseract OCR engine
2. OpenCV | Python computer vision library
3. pyheif | converts pictures from the iOS HEIC format to standard JPG; the pictures are cropped to the region of interest in the same step
4. openpyxl | reads and writes Excel files from Python
5. re | Python's regular expression module, used to parse the extracted data
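
To illustrate the parsing role that `re` plays in this list, here is a minimal sketch rather than the project's actual code. The field names (`email`, `phone`, `website`), the patterns, and the sample text are all assumptions made for the example; the patterns in the repository may look quite different.

```python
import re

# Hypothetical patterns; the real project may search for different fields.
FIELD_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
    "website": re.compile(r"(?:https?://|www\.)[^\s,;]+", re.IGNORECASE),
}


def parse_company_fields(ocr_text: str) -> dict:
    """Return the first match for each field, or None when nothing matches."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        record[field] = match.group(0).strip() if match else None
    return record


# Made-up OCR output, used only to exercise the function.
sample = (
    "ACME Trading LLC\n"
    "Tel: +998 71 123 45 67\n"
    "info@acme.example\n"
    "www.acme.example"
)
print(parse_company_fields(sample))
```

Keeping the patterns in one dictionary means a new field can be added without touching the parsing function. Real OCR output is noisy, so patterns like these would likely need tuning against actual scans.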

Program logic:

All raw images are placed in the img_heic folder.
All the program logic lives in the pytesseract/main.py file.
When run, it automatically:
- scans the img_heic folder for raw images (.heic)
- converts them to JPG format with the pyheif library
- saves the converted pictures in the img_cropped folder
- scans all the cropped pictures with pytesseract and OpenCV
- extracts the data, draws boxes on the pictures, and saves them in the img_boxed folder
- saves all extracted data in CSV format in the output-txt folder
- scans the output-txt folder for text files (.csv) and parses them for a predefined set of data fields
- collects all the data in one dictionary and writes it to the Excel file output_data/customer_list-0.xlsx
The delete.py file deletes all the generated files and erases the data from the final Excel file.
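
The folder flow described above can be sketched with the standard library alone. This is an illustration under assumptions, not the contents of main.py or delete.py: the function names are hypothetical, the folders are resolved against a `base` directory chosen by the caller, and the actual HEIC decoding, OCR, and Excel steps are left out.

```python
from pathlib import Path

# Folder names taken from the description above.
RAW_DIR = "img_heic"
CROPPED_DIR = "img_cropped"
GENERATED_DIRS = ("img_cropped", "img_boxed", "output-txt")


def find_raw_images(base: Path) -> list:
    """List .heic files in the raw folder (case-insensitive), sorted by path."""
    raw = base / RAW_DIR
    if not raw.is_dir():
        return []
    return sorted(p for p in raw.iterdir() if p.suffix.lower() == ".heic")


def cropped_path_for(base: Path, raw_image: Path) -> Path:
    """Map img_heic/photo.heic to img_cropped/photo.jpg."""
    return base / CROPPED_DIR / (raw_image.stem + ".jpg")


def clean_generated_files(base: Path) -> int:
    """Delete files in the generated folders; return how many were removed.

    Raw images in img_heic are left untouched. Clearing the Excel sheet,
    which delete.py is described as doing, is not covered by this sketch.
    """
    removed = 0
    for name in GENERATED_DIRS:
        folder = base / name
        if not folder.is_dir():
            continue
        for path in folder.iterdir():
            if path.is_file():
                path.unlink()
                removed += 1
    return removed
```

A caller would loop over `find_raw_images(base)`, write each converted picture to `cropped_path_for(base, image)`, and later call `clean_generated_files(base)` to reset the working folders.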

Folders and files (2 commits):
- output_data
- pytesseract
- pytesseract_test
- .DS_Store
- .gitignore
- README.md
