This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Multiple Tables of banded shaded rows with varying number of lines in row #882
Describe the bug
The PDF contains multiple tables throughout the document. The tables use shaded (banded) rows instead of ruling lines between rows, and the number of text lines per row varies.
Code to reproduce the problem
# Load the PDF file with pdfplumber
import pdfplumber

# pdf_file is the path to the downloaded PDF (see "PDF file" below)
plumber_file = pdfplumber.open(pdf_file)
pdf_page = plumber_file.pages[29 - 1]  # page 29 (pages are 0-indexed); original note: #127 #67
im = pdf_page.to_image()

# Table settings
ts = {
    "vertical_strategy": "lines",
    "horizontal_strategy": "lines",
    "intersection_tolerance": 32,
}
im.debug_tablefinder(ts)
PDF file
Using this publicly available PDF:
https://www.mtu-solutions.com/content/dam/mtu/technical-information/operating-instructions/diesel/mtu-series-1600/marine/MS15029_01E.pdf/_jcr_content/renditions/original./MS15029_01E.pdf
Expected behavior
Each table on a page should be detected as a separate table. The page in this example contains two tables.
Actual behavior
Increasing intersection_tolerance to handle rows with more lines makes pdfplumber detect a single table: the space between the two tables is treated as a row, so the two tables are not detected separately.
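One possible workaround, sketched here under assumptions and not tested against this PDF, is to stop relying on a large intersection_tolerance and instead group the row bands by their vertical position, starting a new table wherever the gap between consecutive bands is large. The helper names `group_bands_into_tables` and `table_extents`, the `max_gap` threshold, and the coordinates below are all hypothetical and only illustrate the idea.

```python
def group_bands_into_tables(bands, max_gap=5.0):
    """Group (top, bottom) row bands into tables.

    Hypothetical helper: a new table starts whenever the vertical gap
    between one band and the previous band exceeds max_gap.
    Assumes the bands do not overlap each other.
    """
    tables = []
    for top, bottom in sorted(bands):
        if tables and top - tables[-1][-1][1] <= max_gap:
            # Close enough to the previous band: same table.
            tables[-1].append((top, bottom))
        else:
            # Large gap (or first band): start a new table.
            tables.append([(top, bottom)])
    return tables


def table_extents(tables):
    """Return the (top, bottom) extent of each grouped table."""
    return [(rows[0][0], rows[-1][1]) for rows in tables]


# Made-up band coordinates: three rows, a large gap, then two rows.
bands = [(100, 112), (112, 136), (136, 148), (200, 224), (224, 236)]
tables = group_bands_into_tables(bands, max_gap=5.0)
print(table_extents(tables))  # [(100, 148), (200, 236)]
```

With pdfplumber, the bands could plausibly be taken from the `top` and `bottom` values of the shaded rectangles in `pdf_page.rects`, and each extent could then be passed to `pdf_page.crop((0, top, pdf_page.width, bottom))` before calling `extract_table` on the cropped page. Whether the shading in this PDF is actually stored as rectangles is an assumption that would need checking.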
Environment
- Google Colab notebook