This issue was moved to a discussion.

Multiple Tables of banded shaded rows with varying number of lines in row #882

Closed
ramakrse opened this issue May 7, 2023 · 0 comments

ramakrse commented May 7, 2023

Describe the bug

The PDF contains multiple tables throughout the document. The tables use shaded (banded) rows, and the number of text lines per row varies.

Code to reproduce the problem

Load the PDF file with pdfplumber

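import pdfplumber

plumber_file = pdfplumber.open(pdf_file)  # pdf_file: path to the PDF linked below
pdf_page = plumber_file.pages[29 - 1]  # page 29 (0-based index); other pages: 127, 67
im = pdf_page.to_image()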

Table settings.

ts = {
    "vertical_strategy": "lines",
    "horizontal_strategy": "lines",
    "intersection_tolerance": 32,
}
im.debug_tablefinder(ts)

PDF file

Using this publicly available PDF:
https://www.mtu-solutions.com/content/dam/mtu/technical-information/operating-instructions/diesel/mtu-series-1600/marine/MS15029_01E.pdf/_jcr_content/renditions/original./MS15029_01E.pdf

Expected behavior

Each table on a page should be identified separately. The page shown here contains two tables.

Actual behavior

When intersection_tolerance is increased to handle rows with more lines of text, pdfplumber detects only one table. The space between the two tables is also treated as a row, so the two tables are not detected separately.
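
One possible workaround, offered only as a sketch and not confirmed by the maintainers: if the row backgrounds are drawn as filled rectangles, their vertical positions could be clustered so that a large vertical gap starts a new table region. The helper below is hypothetical (group_rows_into_tables and max_gap are names introduced here, not part of pdfplumber). It only assumes dicts with "x0", "x1", "top" and "bottom" keys, which is the shape of the entries in pdf_page.rects. If only every other row is shaded, max_gap would need to be larger than the tallest unshaded row but smaller than the space between tables.

```python
def group_rows_into_tables(rects, max_gap=5.0):
    """Cluster row rectangles into table regions by vertical gaps.

    rects: iterable of dicts with "x0", "x1", "top", "bottom" keys.
    max_gap: largest vertical gap (in PDF points) still treated as
             belonging to the same table; an assumed tuning knob.
    Returns a list of (x0, top, x1, bottom) boxes, one per cluster.
    """
    boxes = []
    for r in sorted(rects, key=lambda r: r["top"]):
        if boxes and r["top"] - boxes[-1][3] <= max_gap:
            # Close enough to the previous cluster: grow its bounding box.
            x0, top, x1, bottom = boxes[-1]
            boxes[-1] = (
                min(x0, r["x0"]),
                top,
                max(x1, r["x1"]),
                max(bottom, r["bottom"]),
            )
        else:
            # Gap too large: start a new table region.
            boxes.append((r["x0"], r["top"], r["x1"], r["bottom"]))
    return boxes


# Made-up coordinates: three rows of varying height, a wide gap, two more rows.
rows = [
    {"x0": 50, "x1": 500, "top": 100, "bottom": 120},
    {"x0": 50, "x1": 500, "top": 120, "bottom": 160},
    {"x0": 50, "x1": 500, "top": 160, "bottom": 175},
    {"x0": 50, "x1": 500, "top": 260, "bottom": 280},
    {"x0": 50, "x1": 500, "top": 280, "bottom": 330},
]
tables = group_rows_into_tables(rows, max_gap=10)
print(tables)
```

With pdfplumber, the input could be pdf_page.rects, and each resulting box could be passed to pdf_page.crop(...) before calling extract_table(ts) on the cropped page. This has not been tested against the PDF above.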

Screenshots

image

Environment

  • Python 3.10.11
  • pdfplumber version: [e.g., 0.9.0]
  • Google Colab notebook

@ramakrse ramakrse added the bug label May 7, 2023
Repository owner locked and limited conversation to collaborators May 8, 2023
@jsvine jsvine converted this issue into discussion #884 May 8, 2023
