
Getting Started

Brett Plemons edited this page Apr 26, 2019 · 3 revisions

Setup

The main dependencies needed for this code are listed in requirements.txt. However, here are the steps to install the dependencies that are not included in Anaconda. I will show the steps for each OS. The first thing you will need is pip, as conda-forge does not have all of the libraries that I have used.

On Mac:

sudo easy_install pip
# Or, if using Homebrew (this installs Python 3.x as well as the pip installer)
brew install python

On Windows:

  • Download get-pip.py to a folder on your computer. Open a command prompt window and navigate to the folder containing get-pip.py. Then run python get-pip.py. This will install pip.
  • Verify a successful installation by opening a command prompt window and navigating to your Python installation's Scripts directory (default is C:\Python27\Scripts). Run pip freeze from this location.
  • pip freeze lists every installed package outside the Python standard library, along with its version number. On a fresh install, pip freeze probably won't have much to show, but we're more interested in any errors that pop up here than in the actual content. If you have Anaconda, the list will include all Anaconda packages.

On Linux:

Debian/Ubuntu:

sudo apt-get install python3-pip

Arch:

sudo pacman -S python-pip

CentOS:

sudo yum install python3 python3-wheel

Fedora:

sudo dnf install python3 python3-wheel

OpenSUSE:

sudo zypper install python3-pip python3-setuptools python3-wheel
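
Whichever platform you are on, one way to confirm that pip is usable is to ask the interpreter itself. The snippet below is a minimal sketch rather than part of this project; the helper name pip_version is hypothetical, and it assumes Python 3.7 or newer.

```python
import subprocess
import sys


def pip_version(python=sys.executable):
    """Return the output of `<python> -m pip --version`, or None if pip is unavailable."""
    try:
        result = subprocess.run(
            [python, "-m", "pip", "--version"],
            capture_output=True,  # requires Python 3.7+
            text=True,
            check=True,  # a non-zero exit code (e.g. no pip module) raises
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError: the interpreter could not be started at all.
        # CalledProcessError: the interpreter ran but pip is not installed for it.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = pip_version()
    print(version if version else "pip not found for this interpreter")
```

Running pip through "python -m pip" checks the pip that belongs to the interpreter you will actually use, which helps when several Python installations (for example Anaconda and a system Python) are present.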

Once you have pip installed, you will be able to install PyPDF2 and textract. The pip command is the same in every command-line shell:

pip install pypdf2 textract
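
To check that the install worked, you can ask Python whether the two modules are importable. This is a small sketch under the assumption that the packages expose the import names PyPDF2 and textract; the helper name missing_modules is hypothetical and not part of this project.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec looks a module up without importing it and returns None if absent.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed with pip above.
    missing = missing_modules(["PyPDF2", "textract"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("PyPDF2 and textract are importable")
```

If a module is reported missing even though pip install succeeded, pip may have installed it for a different Python installation than the one running this check.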

The last step is installing Tesseract OCR from Google, which on some operating systems cannot be done from the terminal/cmd prompt/PowerShell. For Windows, go here for install instructions.

Ubuntu/Debian:
sudo apt-get install tesseract-ocr
sudo apt install libtesseract-dev

macOS (Homebrew):

brew install tesseract

macOS (MacPorts):

sudo port install tesseract

Fedora:

sudo dnf config-manager --add-repo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/Fedora_26/home:Alexander_Pozdnyakov.repo
sudo dnf install tesseract
sudo dnf install tesseract-langpack-deu

OpenSUSE Tumbleweed:

sudo zypper addrepo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/openSUSE_Tumbleweed/home:Alexander_Pozdnyakov.repo
sudo zypper refresh
sudo zypper install tesseract-ocr
sudo zypper install tesseract-ocr-traineddata-german

OpenSUSE 15.0:

sudo zypper addrepo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/openSUSE_Leap_15.0/home:Alexander_Pozdnyakov.repo
sudo zypper refresh
sudo zypper install tesseract-ocr
sudo zypper install tesseract-ocr-traineddata-german

CentOS:

sudo yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
sudo yum update
sudo yum install tesseract
sudo yum install tesseract-langpack-deu

If you are using the Snap package manager:

sudo snap install --channel=edge tesseract
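
After any of the install routes above, it can help to confirm that the tesseract executable is actually on your PATH. The following is a minimal sketch, not part of this project; the helper name tesseract_version is hypothetical, and it assumes the executable is named tesseract and that Python 3.7 or newer is in use.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `<executable> --version`, or None if it is not usable."""
    path = shutil.which(executable)  # None when the executable is not on PATH
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    # Some Tesseract releases print the version to stderr, so check both streams.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version else "tesseract was not found on PATH")
```

If this reports that tesseract was not found right after installing it, opening a new terminal window (so that PATH is reloaded) is usually the first thing to try.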

Depending on which packages you already have installed, you may be missing some dependencies; the most common is sphinx. You can check here for help with any issues. With that, you should be all set up to run this script.

How do I run this script?

As long as you have followed the above steps, you can simply open a terminal/cmd prompt/PowerShell and run:

git clone https://github.com/KaynRyu/semesterProject/
cd ./semesterProject/
python IDTtoJSON.py