extract-text-from-pdf

Here are 2 public repositories matching this topic...

sxaxmz / handle_scanned_pdf

A wrapper around Python OCR tools such as pytesseract and easyocr that recognizes and extracts text embedded in images. It also converts scanned PDFs to text-searchable PDFs.

tesseract-ocr pytesseract ocr-python scanned-image-pdfs searchable-pdf easyocr scanned-pdf-documents extract-text-from-image extract-text-from-pdf

Updated Jul 6, 2024
Python

jahnabiroy / Text-Extractor

This assignment was completed as part of the COP290 course requirements. The project extracts text from several media types: audio (.wav), video (.mp4), and text documents (.pdf). It is implemented in Python and relies exclusively on free APIs and libraries, with no usage limits.

text-extraction extract-text-from-pdf extract-text-from-audio extract-text-from-video

Updated Dec 5, 2024
Python