Getting Started
The main dependencies needed for this code are listed in requirements.txt.
However, here are the steps to install the dependencies that are not included in Anaconda. I will show the steps for each OS.
The first thing you will need is pip, as conda-forge does not have all of the libraries that I have used.
On Mac:
sudo easy_install pip
# Or if using homebrew
brew install python
brew install python installs python3.x as well as the pip installer.
On Windows:
- Download get-pip.py to a folder on your computer. Open a command prompt window and navigate to the folder containing get-pip.py. Then run python get-pip.py. This will install pip.
- Verify a successful installation by opening a command prompt window and navigating to your Python installation's script directory (default is C:\Python27\Scripts). Type pip freeze from this location.
- pip freeze displays the version number of all modules installed in your Python non-standard library. On a fresh install, pip freeze probably won't have much to show, but we're more interested in any errors that might pop up here than in the actual content. However, if you have Anaconda, it will include all Anaconda packages.
Ubuntu/Debian:
sudo apt-get install python3-pip
Arch:
sudo pacman -S python-pip
CentOS:
sudo yum install python3 python3-wheel
Fedora:
sudo dnf install python3 python3-wheel
OpenSUSE:
sudo zypper install python3-pip python3-setuptools python3-wheel
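Whichever platform you used, one way to confirm that pip is reachable from the Python you plan to run is a small check like the sketch below. The function name pip_version is made up for this example and is not part of this project; it only uses the standard library and python -m pip --version, and it returns None when pip is not available.

```python
import subprocess
import sys
from typing import Optional


def pip_version(python: str = sys.executable) -> Optional[str]:
    """Return the output of `<python> -m pip --version`, or None if pip is unavailable."""
    try:
        result = subprocess.run(
            [python, "-m", "pip", "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # The interpreter path itself could not be executed.
        return None
    if result.returncode != 0:
        # The interpreter ran, but it has no pip module.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = pip_version()
    print(version if version else "pip was not found for this interpreter")
```

Checking through python -m pip rather than a bare pip command avoids the case where pip on your PATH belongs to a different Python installation (for example, Anaconda's).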
Once you have pip installed, you will be able to install PyPDF2 and textract. The pip command is the same in every command line (terminal, cmd prompt, or powershell).
pip install pypdf2 textract
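To confirm that the install worked, you could check that both modules can be found by Python. This is only a sketch: missing_modules is a hypothetical helper written for this example, and it assumes the import names are PyPDF2 and textract.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the two packages installed with pip above.
    missing = missing_modules(["PyPDF2", "textract"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("PyPDF2 and textract are both available")
```

find_spec only locates the modules without importing them, so the check stays fast and does not fail if one of them has a broken optional dependency.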
The last thing is installing Tesseract OCR from Google, which on some OSes you cannot do from terminal/cmd prompt/powershell. For Windows, go here for install instructions.
Ubuntu/Debian:
sudo apt-get install tesseract-ocr
sudo apt install libtesseract-dev
Homebrew:
brew install tesseract
MacOS (MacPorts):
sudo port install tesseract
Fedora:
sudo dnf config-manager --add-repo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/Fedora_26/home:Alexander_Pozdnyakov.repo
sudo dnf install tesseract
sudo dnf install tesseract-langpack-deu
OpenSUSE Tumbleweed:
sudo zypper addrepo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/openSUSE_Tumbleweed/home:Alexander_Pozdnyakov.repo
sudo zypper refresh
sudo zypper install tesseract-ocr
sudo zypper install tesseract-ocr-traineddata-german
OpenSUSE 15.0:
sudo zypper addrepo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/openSUSE_Leap_15.0/home:Alexander_Pozdnyakov.repo
sudo zypper refresh
sudo zypper install tesseract-ocr
sudo zypper install tesseract-ocr-traineddata-german
CentOS:
sudo yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
yum update
yum install tesseract
yum install tesseract-langpack-deu
If you are using Snap Package manager:
sudo snap install --channel=edge tesseract
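After installing with any of the commands above, a quick way to see whether the tesseract binary is visible on your PATH is a check like the following sketch. The helper name tesseract_version is an assumption made for this example; it relies only on the standard library and on tesseract --version.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract releases print the version banner to stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version else "tesseract was not found on PATH")
```

If this prints that tesseract was not found, the install directory probably needs to be added to your PATH, which is a common issue on Windows.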
Depending on which packages you already have installed, you may be missing some dependencies; the most common is sphinx. You can check here for help with any issues.
With that you should be all set up to run this script.
How do I run this script?
As long as you have followed the above steps, you can simply open terminal/cmd prompt/powershell and run:
git clone https://github.com/KaynRyu/semesterProject/
cd ./semesterProject/
python IDTtoJSON.py