PdfTableExtraction

Description

This Python script extracts tables from multi-page PDFs containing clinical data, where every table has the same keys but different values, and converts them into an .xlsx file for easy viewing and further manipulation. It does not work on scanned PDFs.
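pdftotext reads only the text layer embedded in a PDF. That is why scanned PDFs, whose pages are images, yield nothing to extract. Below is a minimal sketch of this first step. It assumes the script shells out to pdftotext with the `-layout` option so table columns stay aligned. `build_pdftotext_cmd` and `pdf_to_text` are hypothetical names, not functions from PdfTableExtraction.py:

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_pdftotext_cmd(pdf_path: str) -> list[str]:
    """Hypothetical helper: build a pdftotext call that prints to stdout.

    -layout keeps the physical column layout of the tables;
    "-" as the output file sends the text to standard output.
    """
    return ["pdftotext", "-layout", pdf_path, "-"]


def pdf_to_text(pdf_path: str) -> str:
    """Return the embedded text of all pages (empty for image-only scans)."""
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(pdf_path)
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError if pdftotext fails
    )
    # By default pdftotext separates pages with form-feed characters (\f).
    return result.stdout
```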

Installation:

Python 3.x
XlsxWriter (Python 3 package)
poppler or poppler-utils, whichever package provides pdftotext on your distribution
OS: GNU/Linux
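Before the first run, a short check can confirm these requirements are in place. This is a sketch. `check_requirements` is a hypothetical helper and not part of the repository:

```python
import importlib.util
import platform
import shutil
import sys


def check_requirements() -> dict:
    """Hypothetical helper: report which listed requirements are present."""
    return {
        # Python 3.x
        "python3": sys.version_info.major == 3,
        # XlsxWriter is imported under the module name "xlsxwriter".
        "xlsxwriter": importlib.util.find_spec("xlsxwriter") is not None,
        # pdftotext is shipped by poppler / poppler-utils.
        "pdftotext": shutil.which("pdftotext") is not None,
        # platform.system() returns "Linux" on GNU/Linux.
        "linux": platform.system() == "Linux",
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:12} {'ok' if ok else 'MISSING'}")
```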

Usage:

Run PdfTableExtraction.py from the folder containing the PDFs, which must follow the example format. Other formats require adjustments to the script. For simplicity, pass the starting and ending text of the first column after the PDF name.

Example: python3 PdfTableExtraction.py ALL.pdf Disease Note
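The start and end text (`Disease` and `Note` in the example) mark the boundaries of each table. The following sketch shows one way to use them. It assumes three things: the text was extracted with `pdftotext -layout`, each row holds a key and its value separated by two or more spaces, and every value fits on one line (wrapped cells are not handled). `split_tables` and `parse_rows` are hypothetical names, not the script's actual code:

```python
import re

# Assumption: in -layout output, columns are separated by 2+ whitespace chars,
# while a single space can occur inside a key such as "Age group".
COLUMN_GAP = re.compile(r"\s{2,}")


def split_tables(text: str, start: str, end: str) -> list:
    """Group lines into tables running from the `start` key to the `end` key.

    A table opens on a line beginning with `start` and closes (inclusively)
    on a line beginning with `end`; lines outside any table are ignored.
    """
    tables, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(start):
            current = []  # a new table begins
        if current is None or not stripped:
            continue  # outside a table, or a blank/page-break line
        current.append(stripped)
        if stripped.startswith(end):
            tables.append(current)
            current = None
    return tables


def parse_rows(lines: list) -> dict:
    """Turn 'Key    Value' lines into a {key: value} dict."""
    record = {}
    for line in lines:
        parts = COLUMN_GAP.split(line, maxsplit=1)
        record[parts[0]] = parts[1] if len(parts) > 1 else ""
    return record
```

Because every table shares the same keys, each parsed table becomes a dict with identical keys, which maps naturally onto one spreadsheet row.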

The first output is a .txt file with the same name as the PDF. Use it to cross-check the extraction and, if the data changes, to make any necessary adjustments.
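Deriving the output name from the PDF name and writing a readable check file might look like the sketch below. The `--- Table N ---` block layout and all three function names are assumptions; the repository's actual .txt content may differ:

```python
from pathlib import Path


def output_path(pdf_path: str, suffix: str) -> Path:
    """Same folder and base name as the PDF, with a new extension (e.g. ".txt")."""
    return Path(pdf_path).with_suffix(suffix)


def format_check_text(records: list) -> str:
    """Hypothetical layout: each parsed table as a numbered block of 'key: value' lines."""
    lines = []
    for number, record in enumerate(records, start=1):
        lines.append(f"--- Table {number} ---")
        lines.extend(f"{key}: {value}" for key, value in record.items())
        lines.append("")  # blank line between tables
    return "\n".join(lines) + "\n" if lines else ""


def write_check_file(pdf_path: str, records: list) -> Path:
    """Write the check text next to the PDF and return the .txt path."""
    txt_path = output_path(pdf_path, ".txt")
    txt_path.write_text(format_check_text(records), encoding="utf-8")
    return txt_path
```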

The second output is an .xlsx file with the same name as the PDF. This is the final result.
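Here is a sketch of the final step with XlsxWriter. It assumes one spreadsheet row per table and one column per key; the actual script may lay the sheet out differently, for example transposed. `Workbook`, `add_worksheet`, `write_row`, and `close` are XlsxWriter methods, while `build_rows` and `write_xlsx` are hypothetical:

```python
def build_rows(records: list) -> list:
    """Header row of keys (first-seen order), then one row of values per table."""
    header = []
    for record in records:
        for key in record:
            if key not in header:
                header.append(key)
    rows = [header]
    for record in records:
        # A key missing from one table becomes an empty cell.
        rows.append([record.get(key, "") for key in header])
    return rows


def write_xlsx(xlsx_path: str, records: list) -> None:
    """Write the rows to the first worksheet of a new .xlsx file."""
    import xlsxwriter  # imported here so build_rows works without XlsxWriter

    workbook = xlsxwriter.Workbook(xlsx_path)
    worksheet = workbook.add_worksheet()
    for row_index, row in enumerate(build_rows(records)):
        # write_row(row, col, data) fills consecutive cells starting at (row, col).
        worksheet.write_row(row_index, 0, row)
    workbook.close()  # XlsxWriter only writes the file on close()
```

Keeping `build_rows` separate from the XlsxWriter calls lets the sheet layout be checked without creating a file.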

License

GPLv3

Repository: DustBytes/PdfTableExtraction
