pdfplumber fails to extract tables from specific PDFs #1273

jongchan988 · 2025-02-02T07:51:20Z


Describe the bug

A clear and concise description of what the bug is.

Have you tried repairing the PDF?

Please try running your code with pdfplumber.open(..., repair=True) before submitting a bug report.

Code to reproduce the problem

Paste it here, or attach a Python file.

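    import pdfplumber

    documentPath = "ocpp-1.6 edition 2.pdf"  # the PDF attached to this report

    with pdfplumber.open(documentPath) as pdf:
        for page in pdf.pages:
            tables = page.find_tables()  # default "lines" strategies
            print(f"page {page.page_number}: tables.cnt: {len(tables)}")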

PDF file

Please attach any PDFs necessary to reproduce the problem.

ocpp-1.6 edition 2.pdf

Expected behavior

What did you expect the result should have been?
find_tables() should detect the table in the attached PDF.

Actual behavior

What actually happened, instead?
The table is not detected.

Screenshots

If applicable, add screenshots to help explain your problem.

Environment

pdfplumber version: 0.11.5
Python version: 3.12
OS: Windows 11

Additional context

Add any other context/notes about the problem here.

jsvine · 2025-02-11T04:02:30Z

Maintainer

Hi @jongchan988, and thanks for your interest in this library. PDFs have no internal concept of a "table," and much of what we perceive to be a "table" is based on human perception. This is particularly true for tables that don't have graphical objects/borders separating their cells, like in this example.
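One way to confirm that a page (or a region cropped with page.crop(...)) really lacks the graphical borders described above is to look at pdfplumber's page.lines and page.rects. The default "lines" strategies build cells from exactly these objects. The sketch below is a rough diagnostic: the function name suggest_table_settings is hypothetical, and falling back to the "text" strategies is a swapped-in alternative to the maintainer's suggestion, not a guaranteed fix.

```python
def suggest_table_settings(lines, rects):
    """Pick table_settings for pdfplumber's find_tables() from drawn objects.

    `lines` and `rects` are expected to be the lists pdfplumber exposes as
    `page.lines` and `page.rects` (possibly on a cropped page).
    """
    if not lines and not rects:
        # No drawn lines or rectangles at all: the default "lines"
        # strategies have nothing to build cells from, so fall back
        # to inferring columns and rows from word alignment.
        return {"vertical_strategy": "text", "horizontal_strategy": "text"}
    # Some lines/rects exist (they may or may not be table borders):
    # an empty dict keeps pdfplumber's default settings.
    return {}


# Usage (assumes pdfplumber is installed; the path is the attached PDF):
# import pdfplumber
# with pdfplumber.open("ocpp-1.6 edition 2.pdf") as pdf:
#     for page in pdf.pages:
#         settings = suggest_table_settings(page.lines, page.rects)
#         print(page.page_number, len(page.lines), len(page.rects),
#               len(page.find_tables(table_settings=settings)))
```

A page with zero lines and rects in the table's region supports the explanation that the default settings cannot see any cell borders there.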

There are various approaches to extracting tables like these if you expect a similar structure every time. For example, you could use the positional information of the words in the header row (FIELD NAME, FIELD TYPE, CARD., DESCRIPTION) as input for the "explicit_vertical_lines": [...] table-extraction setting.
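A minimal sketch of that suggestion, assuming the header words come from pdfplumber's page.extract_words() (dicts with "text", "x0", "x1" and "top" keys) and that each column begins at the left edge of its header label. The helper names group_into_lines, header_column_edges and find_header_table, the 3-point line tolerance, the 1-point margin, and the use of "horizontal_strategy": "text" for rows are illustrative assumptions, not part of pdfplumber. The code places one vertical line just left of each header label plus one at the page's right edge, then passes those x-coordinates as explicit_vertical_lines with "vertical_strategy": "explicit".

```python
def group_into_lines(words, tolerance=3.0):
    """Cluster word dicts into text lines whose `top` values are within `tolerance`."""
    lines = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if lines and abs(w["top"] - lines[-1][0]["top"]) <= tolerance:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Order each line left to right.
    return [sorted(line, key=lambda w: w["x0"]) for line in lines]


def header_column_edges(words, labels, right_edge, tolerance=3.0, margin=1.0):
    """Return x-coordinates for explicit vertical lines, or None if not found.

    Looks for one text line containing every label (left to right, each as
    consecutive words) and returns an x just left of each label, followed
    by `right_edge` to close the last column.
    """
    for line in group_into_lines(words, tolerance):
        texts = [w["text"] for w in line]
        edges, start = [], 0
        for label in labels:
            tokens = label.split()
            for i in range(start, len(texts) - len(tokens) + 1):
                if texts[i:i + len(tokens)] == tokens:
                    edges.append(line[i]["x0"] - margin)
                    start = i + len(tokens)
                    break
            else:
                break  # label not on this line; try the next line
        else:
            edges.append(right_edge)  # every label matched on this line
            return edges
    return None


HEADER_LABELS = ["FIELD NAME", "FIELD TYPE", "CARD.", "DESCRIPTION"]


def find_header_table(page, labels=HEADER_LABELS):
    """Hypothetical helper: run find_tables() with header-derived column lines."""
    edges = header_column_edges(page.extract_words(), labels, right_edge=page.bbox[2])
    if edges is None:
        return []
    settings = {
        "vertical_strategy": "explicit",
        "explicit_vertical_lines": edges,
        "horizontal_strategy": "text",
    }
    return page.find_tables(table_settings=settings)


# Usage (assumes pdfplumber is installed):
# import pdfplumber
# with pdfplumber.open("ocpp-1.6 edition 2.pdf") as pdf:
#     for page in pdf.pages:
#         for table in find_header_table(page):
#             print(table.extract())
```

Numeric entries in explicit_vertical_lines are treated as lines spanning the full page height, so cropping the page to the table's region first (page.crop(...)) may help keep surrounding body text out of the result.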
