Detecting last column despite missing vertical border #1223
-
I have the same issue. Setting "vertical_strategy": "text" also works, but it extracts too many values for me. @jsvine, do you know more about this?
-
You can also pass your own vertical lines with explicit_vertical_lines. The simplest strategy may be to use the position of the right-most char on the page:

page.extract_table({
    "explicit_vertical_lines": [max(page.chars, key=lambda char: char['x1'])['x1'] + 3]
})

If there are multiple tables, a more robust approach is probably to use page.find_tables(). This lets you use the table top/bottom lines to first crop out a page area and search only within that area for a specific marker.

for table in page.find_tables():
    _, top, _, bottom = table.bbox
    crop = page.crop((page.bbox[0], top, page.bbox[2], bottom))
    line = max(crop.chars, ...)['x1'] + 3
    crop.extract_table({"explicit_vertical_lines": [line]})
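For reference, the two snippets above could be combined into something like the following. This is only a sketch: the helper names (right_edge_line, settings_with_right_edge, extract_tables_with_right_edge) and the padding parameter are made up for illustration and are not part of pdfplumber. It assumes page is a pdfplumber page exposing chars, bbox, crop(), find_tables() and extract_table(), and that each char is a dict with an 'x1' key.

```python
def right_edge_line(chars, padding=3.0):
    """Return an x position slightly to the right of the right-most char.

    `chars` is expected to look like pdfplumber's page.chars: an iterable
    of dicts that each carry an 'x1' (right edge) value.
    `padding` is a hypothetical knob mirroring the `+ 3` used above.
    """
    xs = [char["x1"] for char in chars]
    if not xs:
        raise ValueError("no chars found, cannot place a right-hand line")
    return max(xs) + padding


def settings_with_right_edge(chars, padding=3.0):
    """Build a table_settings dict that adds one explicit vertical line."""
    return {"explicit_vertical_lines": [right_edge_line(chars, padding)]}


def extract_tables_with_right_edge(page, padding=3.0):
    """Extract every table on the page, adding a right border per table.

    Each detected table is used only for its top/bottom coordinates; the
    page is cropped to that full-width band so the right-most char is
    measured within the band rather than across the whole page.
    """
    results = []
    for table in page.find_tables():
        _, top, _, bottom = table.bbox
        crop = page.crop((page.bbox[0], top, page.bbox[2], bottom))
        if not crop.chars:
            continue  # nothing to measure in this band, skip it
        settings = settings_with_right_edge(crop.chars, padding)
        results.append(crop.extract_table(settings))
    return results
```

With a real document this would be called on a page obtained from pdfplumber.open(...), for example extract_tables_with_right_edge(pdf.pages[0]). The padding may need adjusting if the added line ends up too close to the text.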
-
I am trying to get data from a PDF table that unfortunately has a missing border on the right side of the page as shown in the image below. This means that with my current code I cannot get the data from the last column.
I think there might be a way to only select the rows by changing the table_settings parameter of the extract_table() method, but I can't for the life of me figure out how.
Currently my code looks as follows:
Would someone please be able to point me in the right direction?
Thank you.