
mubasharkk/fastapi_ocr

 
 


About project


This project implements a microservice that reads text from images using Tesseract OCR. It exposes a simple API built with FastAPI, so any application can use it over HTTP. The whole microservice is containerized with Docker, which makes it easy to set up a local copy and adapt it to your needs.

The microservice also cleans and preprocesses uploaded images with OpenCV, which improves the accuracy of Tesseract's OCR output.
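This README doesn't say which OpenCV operations the service applies. As an illustration only, the sketch below shows one common OCR cleanup step in dependency-free Python: grayscale conversion with BT.601 weights followed by Otsu's global threshold, roughly what `cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)` and `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)` do. The function names and the list-of-rows image representation are hypothetical and not taken from this repository.

```python
from typing import List, Sequence, Tuple

# Hypothetical image representation: a list of rows, each a sequence of (R, G, B) ints in 0..255.
RGBImage = Sequence[Sequence[Tuple[int, int, int]]]
GrayImage = List[List[int]]


def to_grayscale(rgb: RGBImage) -> GrayImage:
    """Convert RGB pixels to 8-bit luma using ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def otsu_threshold(gray: Sequence[Sequence[int]]) -> int:
    """Return the threshold t that maximizes a score proportional to between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with value <= t (dark class)
    sum0 = 0  # sum of their intensities
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0  # number of pixels with value > t (light class)
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is undefined
        mu0 = sum0 / w0
        mu1 = (weighted_total - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(gray: Sequence[Sequence[int]], t: int) -> GrayImage:
    """Map pixels above t to 255 (white) and the rest to 0 (black), like THRESH_BINARY."""
    return [[255 if value > t else 0 for value in row] for row in gray]


def preprocess(rgb: RGBImage) -> GrayImage:
    """Grayscale conversion followed by Otsu binarization."""
    gray = to_grayscale(rgb)
    return binarize(gray, otsu_threshold(gray))
```

A global threshold like this works well for evenly lit scans; unevenly lit photos usually need an adaptive threshold instead, so treat this as one possible cleanup step rather than a description of the service's pipeline.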

Built With

  • FastAPI

  • Tesseract OCR

  • OpenCV

  • Docker

Run locally


Clone the repo

git clone https://github.com/mubasharkk/fastapi_ocr.git
cd fastapi_ocr

Install dependencies

pip install -r requirements.txt

Run Server

cd app
uvicorn pdfapi:app --host 0.0.0.0 --port 8000 --reload

Run on Docker

Build the Docker image

docker build -t fastapi_ocr .

Run the Docker container (using the image name from the build step)

docker run -d --name my_container fastapi_ocr

Documentation


The API exposes the following endpoints:

  • /extract_text - returns the text extracted from an uploaded file

  • /extract_text_from_many_files - returns the text extracted from multiple uploaded files

  • /extract_text_from_url - returns the text extracted from an image at a given URL
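The README doesn't document HTTP methods or form-field names. Assuming the upload endpoints accept `POST` requests with `multipart/form-data` bodies, which is the usual pattern for FastAPI file uploads, a client can be sketched with only the Python standard library. The field names `file` and `files`, and the helpers `encode_multipart` and `build_upload_request`, are hypothetical; check the route signatures in the app before relying on them.

```python
import urllib.request
import uuid
from typing import Iterable, Tuple

# (filename, raw bytes, MIME type) for each file to upload
FilePart = Tuple[str, bytes, str]


def encode_multipart(field: str, files: Iterable[FilePart]) -> Tuple[bytes, str]:
    """Encode files as a multipart/form-data body; returns (body, Content-Type value)."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the file data
    parts = []
    for filename, data, content_type in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(base_url: str, endpoint: str, field: str,
                         files: Iterable[FilePart]) -> urllib.request.Request:
    """Build (but do not send) a POST request uploading `files` under form field `field`."""
    body, content_type = encode_multipart(field, files)
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To send a request built this way, pass it to `urllib.request.urlopen` while the server is running (for example, the local uvicorn instance on port 8000) and read the response body; the response format is not described in this README.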

Running Tests

cd /var/www/app

export PYTHONPATH=$PWD

pytest -q tests

About

Web app Tesseract OCR tool


Languages

  • Python 79.1%
  • HTML 15.6%
  • Dockerfile 3.9%
  • Shell 1.4%