pdfplumber fails to extract tables from specific PDFs #1273
jongchan988
started this conversation in
Ask for help with specific PDFs
Replies: 1 comment
-
Hi @jongchan988, and thanks for your interest in this library. PDFs have no internal concept of a "table," and much of what we perceive to be a "table" is based on human perception. This is particularly true for tables that don't have graphical objects/borders separating their cells, like in this example. There are various approaches to extracting tables like these, if you expect a similar structure every time. For example, you could use the positional information of the words in the …
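To make the word-position idea concrete, here is a minimal sketch rather than a definitive recipe. It assumes the word dicts come from `page.extract_words()` (which include `text`, `x0`, and `top` keys) and that you already know roughly where the column boundaries sit. The helper names `words_to_rows`, `rows_to_cells`, and `extract_borderless_table`, the `y_tolerance` default, and the `column_edges` values are all hypothetical and would need tuning for a real PDF.

```python
from typing import Dict, List, Sequence


def words_to_rows(words: Sequence[Dict], y_tolerance: float = 3.0) -> List[List[Dict]]:
    """Group word dicts into rows when their 'top' coordinates are close."""
    rows: List[List[Dict]] = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # Compare against the first word of the current row to decide
        # whether this word sits on the same visual line.
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order each row left to right.
    return [sorted(row, key=lambda w: w["x0"]) for row in rows]


def rows_to_cells(rows: Sequence[Sequence[Dict]],
                  column_edges: Sequence[float]) -> List[List[str]]:
    """Place each word in the column whose [left, right) range contains its x0."""
    n_cols = len(column_edges) - 1
    table: List[List[str]] = []
    for row in rows:
        cells: List[List[str]] = [[] for _ in range(n_cols)]
        for word in row:
            for i in range(n_cols):
                if column_edges[i] <= word["x0"] < column_edges[i + 1]:
                    cells[i].append(word["text"])
                    break  # words outside every range are silently dropped
        table.append([" ".join(cell) for cell in cells])
    return table


def extract_borderless_table(pdf_path: str, page_number: int,
                             column_edges: Sequence[float]) -> List[List[str]]:
    """Hypothetical wrapper: read one page and rebuild its rows and columns."""
    import pdfplumber  # imported lazily so the helpers above work without it

    with pdfplumber.open(pdf_path) as pdf:
        words = pdf.pages[page_number].extract_words()
    return rows_to_cells(words_to_rows(words), column_edges)
```

If the layout is regular enough, pdfplumber's built-in text-based strategies may be worth trying first, for example `page.extract_table({"vertical_strategy": "text", "horizontal_strategy": "text"})`; the manual grouping above is mainly useful when you need explicit control over where the columns fall.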
-
Describe the bug
A clear and concise description of what the bug is.
Have you tried repairing the PDF?
Please try running your code with
pdfplumber.open(..., repair=True)
before submitting a bug report.
Code to reproduce the problem
Paste it here, or attach a Python file.
PDF file
Please attach any PDFs necessary to reproduce the problem.
ocpp-1.6 edition 2.pdf
Expected behavior
What did you expect the result should have been?
[Image]
detect this table
Actual behavior
What actually happened, instead?
It doesn't detect the table.
Screenshots
If applicable, add screenshots to help explain your problem.
Environment
Additional context
Add any other context/notes about the problem here.