OCR Search Engine

Introduction

This repository contains the source code for the OCR (Optical Character Recognition) Search Engine, an Information Retrieval and Extraction (IRE) system that attempts to replicate current state-of-the-art methods using IRE and basic Natural Language Processing (NLP) techniques. The project demonstrates the methods commonly used for search and retrieval tasks. We also briefly describe the functionality the system supports and give statistics for the dataset. The text indexed by the search engine is generated with Indic-OCR, developed at CVIT.

![Watch the video]

Collaborators

  • Developed at: Centre for Visual Information Technology (CVIT)

Thanks to these organisations for providing the data:

  • National Digital Library of India, IIT Kharagpur
  • IIIT Hyderabad
  • British Library, UK

Fork this repo

Fork this repo by clicking the Fork button at the top of the repository page. This creates a copy in your GitHub account.

Clone this repository

After forking, open a terminal and clone your fork with the following git command, replacing username with your GitHub username:

git clone https://github.com/username/cvitsearch-se.git

Link to the project website details (temporary)

Link: http://preon.iiit.ac.in:3000/ (temporarily public)

Requirements

The following packages are required to run this repository:

* python3.x
* django
* numpy
* scipy
* psql
* elasticsearch-dsl
* elasticsearch
* jinja

All requirements are listed in requirements.txt. To install them, create a python3.x virtual environment, activate it, and run:

pip install -r requirements.txt

This will install the packages in the environment.
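
To confirm that the installation worked, a small check like the one below can report which packages are still missing. It is a sketch and not part of this repository. The import names are assumptions mapped from the list above (elasticsearch-dsl is assumed to import as `elasticsearch_dsl` and jinja as `jinja2`). psql is the PostgreSQL command-line client rather than a Python module, so it is looked up on the PATH instead.

```python
import importlib.util
import shutil

# Import names assumed from the package list above; adjust them if
# requirements.txt pins different distributions.
REQUIRED_MODULES = [
    "django",
    "numpy",
    "scipy",
    "elasticsearch",
    "elasticsearch_dsl",
    "jinja2",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def psql_available():
    """Return True if a psql executable is found on the PATH."""
    return shutil.which("psql") is not None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All listed Python packages are importable.")
    if not psql_available():
        print("psql was not found on the PATH.")
```

Run the script inside the activated virtual environment. Any package it reports as missing can be installed again with pip.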