OCR Capstone Project

Setup

On Windows, running WindowsSetup.bat should, in theory, set everything up.
On Linux or Mac, install the dependencies listed in WindowsSetup.bat manually.
So far this program has only run successfully on Linux, because its pdf2image dependency does not work on Windows.

To Run

Run python3 extractText.py
Once the GUI is up, select the template file, the output CSV file (which will be overwritten), and the PDF file(s) to be scanned, then select run.

Template File

The template file consists of |-separated fields.
The first field is the column header.
After that, there are three options:

There are no other fields, if nothing is to be inserted for that column.
There is a second field containing arbitrary text, if the column is always to be filled with that same text.
There are four additional fields that specify the bounding box from which the code extracts text from the PDFs:
- The first additional field is the top-left x coordinate of the bounding box
- The second additional field is the top-left y coordinate of the bounding box
- The third additional field is the bottom-right x coordinate of the bounding box
- The fourth additional field is the bottom-right y coordinate of the bounding box
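
The three options above can be illustrated with a small parser. This is only a sketch, not the code in extractText.py: the names TemplateColumn and parse_template_line are hypothetical, and it assumes one column definition per line, integer coordinates, and that whitespace around fields can be ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TemplateColumn:
    """One column of the output CSV (hypothetical helper type)."""
    header: str
    constant: Optional[str] = None                    # fixed text for every row
    box: Optional[Tuple[int, int, int, int]] = None   # (x1, y1, x2, y2)


def parse_template_line(line: str) -> TemplateColumn:
    """Parse one |-separated template line into a TemplateColumn."""
    # Assumption: surrounding whitespace is not significant.
    fields = [field.strip() for field in line.rstrip("\n").split("|")]
    header, rest = fields[0], fields[1:]

    if not rest:
        # Header only: nothing is inserted for this column.
        return TemplateColumn(header)
    if len(rest) == 1:
        # Header plus one field: the column always holds this text.
        return TemplateColumn(header, constant=rest[0])
    if len(rest) == 4:
        # Header plus four fields: top-left x, top-left y,
        # bottom-right x, bottom-right y (assumed to be integers).
        x1, y1, x2, y2 = (int(value) for value in rest)
        return TemplateColumn(header, box=(x1, y1, x2, y2))
    raise ValueError(f"expected 0, 1, or 4 fields after the header, got {len(rest)}")


# Made-up example lines, one per option.
for example in ["Notes", "Source|scanned", "Total|100|200|300|250"]:
    print(parse_template_line(example))
```

The box tuple is kept in (left, top, right, bottom) order, which is the order described in the list above; any other number of fields is rejected rather than guessed at.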

Future Work

Needs to be able to run on Windows (possibly by changing the pdf2image library to a different library?)
Needs to be able to create a template with a GUI
Needs to be able to detect and correct for skewing in the PDFs
