Unknown widths when reading PDFs? #1714
-
Can you share the file where you are facing this issue, please?
-
In case you just want to get rid of those messages, see https://pypdf.readthedocs.io/en/latest/user/suppress-warnings.html
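As a rough sketch of what that page describes for log messages: assuming your installed pypdf version reports "unknown widths" through the standard `pypdf` logger (if that version writes it with a plain `print()` instead, this will not hide it), raising the logger level should silence it. The variable name `pypdf_logger` is just for illustration.

```python
import logging

# Assumption: the message is emitted via the logger named "pypdf".
# If the installed version prints it directly, this has no effect.
pypdf_logger = logging.getLogger("pypdf")

# Only ERROR and above get through; WARNING-level messages are dropped.
pypdf_logger.setLevel(logging.ERROR)
```

Run this once near the top of the script, before opening any PDFs.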
-
Thanks for reporting. You may want to have a look at the PR; you may be able to apply the modification as a patch if you need a fix quickly.
-
Hi all, I'm trying to write a script that will let me find every instance of a term (from a list of terms) in each PDF in a directory. To do this, I'm using the pypdf, re, os, and glob packages, co-opting code from this post. The code (below) needs some refining so that it outputs something I can actually use, but it is otherwise working as intended. However, when it reaches one PDF, it prints out a series of messages like the following:
unknown widths : [0, IndirectObject(261, 0, 2529565096976)]
I've looked through the pypdf/PyPDF2 documentation, Stack Overflow, and this GitHub repository for details on what this means, but haven't found a clear answer. When I used print(file), the terminal printed most of the PDF's text, interspersed with the above messages. I should also note that the PDF in question is computer-generated (i.e., I can copy text from it), is only 35 pages long, and does not contain my search terms.
Can anyone help me understand what this message means? Ultimately I'd like to take the data this code generates and use it to build a dataframe, and I suspect this message may interfere with that. Thanks!
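On the dataframe concern: a message written to the terminal is not part of the string that `extract_text()` returns, so it should not end up in the collected data. The following is only a minimal sketch of the kind of search described above, not the original script; the helper names `find_terms` and `scan_directory` and the row layout are hypothetical, and it assumes pypdf is installed and that terms are matched literally and case-insensitively.

```python
import re
from pathlib import Path


def find_terms(pages, terms):
    """Return one dict per match: page number, search term, matched text."""
    rows = []
    for page_number, text in enumerate(pages, start=1):
        for term in terms:
            # re.escape treats the term as literal text, not as a regex.
            pattern = re.escape(term)
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                rows.append(
                    {"page": page_number, "term": term, "match": match.group(0)}
                )
    return rows


def scan_directory(directory, terms):
    """Search every *.pdf file in `directory` (hypothetical helper)."""
    from pypdf import PdfReader  # imported here so find_terms works without pypdf

    rows = []
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        reader = PdfReader(str(pdf_path))
        # Fall back to "" in case a page yields no text.
        pages = [page.extract_text() or "" for page in reader.pages]
        for row in find_terms(pages, terms):
            rows.append({"file": pdf_path.name, **row})
    return rows
```

The list returned by `scan_directory` is a list of plain dicts, which `pandas.DataFrame(rows)` accepts directly.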