Errors with table identification in PDF (false positives) #1227
kcbz started this conversation in Ask for help with specific PDFs
Hi @kcbz, this doesn't appear to be a bug, but rather a tricky aspect of this PDF: it contains some rects that aren't visible.

```python
import pdfplumber

pdf = pdfplumber.open("emeryville-ca-TITLE_10_TIDELANDS.pdf")
page = pdf.pages[0]

# Render the page and overlay every rect pdfplumber extracted,
# including ones that are not visible in a normal PDF viewer.
im = page.to_image()
im.draw_rects(page.rects)
```

In cases like these, I recommend examining the … To ignore those rects, you can use the …
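One possible way to ignore those rects is pdfplumber's `Page.filter()`, which returns a version of the page containing only the objects for which a test function returns True. The sketch below is hypothetical: the helper names (`is_white`, `looks_invisible`, `keep_object`) are illustrative, and the rule that a rect is "invisible" when it is neither stroked nor filled, or is filled white with no stroke, is an assumption that has not been checked against this PDF. The functions only read pdfplumber-style object dicts (keys such as `object_type`, `stroke`, `fill`, `non_stroking_color`), so they run without pdfplumber installed.

```python
from typing import Any, Mapping, Sequence, Union

Color = Union[None, float, int, Sequence[float]]


def is_white(color: Color) -> bool:
    """Return True if a pdfplumber-style color value looks like white.

    Assumption: gray (1 component), RGB (3) or CMYK (4) color spaces.
    Anything else (e.g. pattern names) is treated as not white.
    """
    if color is None:
        return False
    if isinstance(color, (int, float)):
        return color == 1
    comps = tuple(color)
    if len(comps) == 1:   # gray: 1 is white
        return comps[0] == 1
    if len(comps) == 3:   # RGB: (1, 1, 1) is white
        return all(c == 1 for c in comps)
    if len(comps) == 4:   # CMYK: (0, 0, 0, 0) is white
        return all(c == 0 for c in comps)
    return False


def looks_invisible(obj: Mapping[str, Any]) -> bool:
    """Heuristic (assumed, not pdfplumber's own rule) for hidden rects."""
    stroke = bool(obj.get("stroke", False))
    fill = bool(obj.get("fill", False))
    if not stroke and not fill:
        # Nothing is painted at all.
        return True
    if fill and not stroke and is_white(obj.get("non_stroking_color")):
        # Only a white fill, which vanishes on a white page.
        return True
    return False


def keep_object(obj: Mapping[str, Any]) -> bool:
    """Test function for page.filter(): drop only rects that look invisible."""
    if obj.get("object_type") == "rect":
        return not looks_invisible(obj)
    return True
```

With a pdfplumber page in hand, `page.filter(keep_object).find_tables()` would then run table detection without those rects. If the heuristic doesn't match how the hidden rects in this file are drawn, the predicate in `looks_invisible` is the part to adjust.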
I believe this is a bug: I have a PDF that contains only text, yet on every page pdfplumber believes the entire content belongs to a table. For my purposes, I need to identify tables and text separately so that I can classify the document's components correctly (I know extracted text also includes table data, but I've created a workaround for that). I've never really seen this reverse case, where pdfplumber detects tables that aren't there.
I have tried adjusting the `table_settings`, but this didn't fix the issue. I also tried `debug_tablefinder()`, and it seems to confirm that pdfplumber treats the contents of every page as a table.
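To see what kind of rects `debug_tablefinder()` is picking up, one could tally them by how they are drawn. The sketch below is hypothetical (the names `rect_style` and `summarize_rects` are illustrative, not pdfplumber API); it groups pdfplumber-style rect dicts by their `stroke`, `fill`, `stroking_color`, and `non_stroking_color` values.

```python
from collections import Counter
from typing import Any, Hashable, Iterable, Mapping, Tuple


def _norm_color(color: Any) -> Hashable:
    # Colors may come back as lists or tuples; convert to a tuple so the
    # value can be used as part of a Counter key.
    return tuple(color) if isinstance(color, (list, tuple)) else color


def rect_style(rect: Mapping[str, Any]) -> Tuple[bool, bool, Hashable, Hashable]:
    """Key describing how a rect is painted."""
    return (
        bool(rect.get("stroke", False)),
        bool(rect.get("fill", False)),
        _norm_color(rect.get("stroking_color")),
        _norm_color(rect.get("non_stroking_color")),
    )


def summarize_rects(rects: Iterable[Mapping[str, Any]]) -> Counter:
    """Count rects per (stroke, fill, stroking_color, non_stroking_color)."""
    return Counter(rect_style(r) for r in rects)
```

Calling `summarize_rects(page.rects).most_common()` would list the most frequent styles first; a large group of rects with no stroke, or with only a white fill, would be a hint that they are the hidden ones.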
I attached the PDF below:
emeryville-ca-TITLE_10_TIDELANDS.pdf