This repository contains the source code for an OCR (Optical Character Recognition) search engine: an Information Retrieval and Extraction (IRE) system that attempts to replicate current state-of-the-art search methods using IRE and basic Natural Language Processing (NLP) techniques. The project demonstrates the methods used for search and retrieval tasks. We also briefly describe the functionality the system supports and give statistics for the dataset. The text indexed by the search engine is generated with Indic-OCR, developed at CVIT.
- Developed at : Centre for Visual Information Technology (CVIT)
Thanks to these organisations for providing the data :
- National Digital Library of India, IIT Kharagpur
- IIIT Hyderabad
- British Library, UK
Fork this repo by clicking the Fork button at the top of the repository page, which will create a copy in your GitHub account.
After forking, open a terminal and clone your fork with the following git command (replace username with your GitHub username):
git clone https://github.com/username/cvitsearch-se.git
Link : http://preon.iiit.ac.in:3000/ [public-temporarily]
The following packages are required to run this repository:
* python3.x
* django
* numpy
* scipy
* psql
* elasticsearch-dsl
* elasticsearch
* jinja
All requirements are listed in requirements.txt. To install them, create a python3.x virtual environment, activate it, and run :
pip install -r requirements.txt
This will install the packages in the environment.
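To confirm that the installation worked, a small check like the one below can be run inside the virtual environment. It is only a sketch and not part of the repository: the function name `missing_modules` is hypothetical, and the import names are assumptions (pip package names can differ from import names, for example `elasticsearch-dsl` is assumed to import as `elasticsearch_dsl` and `jinja` as `jinja2`). `psql` is left out because it is a command-line client rather than a Python module.

```python
import importlib.util

# Import names assumed to correspond to the packages listed above.
# These are assumptions; adjust them to match requirements.txt.
REQUIRED_MODULES = [
    "django",
    "numpy",
    "scipy",
    "elasticsearch",
    "elasticsearch_dsl",
    "jinja2",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat lookup failures the same as "not installed".
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    not_found = missing_modules(REQUIRED_MODULES)
    if not_found:
        print("Missing modules:", ", ".join(not_found))
    else:
        print("All required modules were found.")
```

The check only looks up each module without importing it, so it runs quickly and does not trigger any Django configuration errors.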