Issues when decrypting a PDF with empty metadata values #766
Labels:
- component:document (Related to PDFDocument)
- status: accepted
- type:anomaly (Errors caused by deviations from the PDF Reference)
I am trying to run `extract_text` against an encrypted PDF (the password is simply the default PASSWORD_PADDING) and get the following traceback:

Digging in, I realized that this happens when decrypting the metadata. Most notably, the (encrypted) metadata looks like this (with added print statements):
As you can see, `Keywords` is an empty string (the same goes for `Subject`).

Now, I do not know enough about the PDF specification to say whether this is allowed (i.e. whether those empty keys should be there at all), but the cause of the error is rather clear: the `IV` is taken from the first 16 bytes of the data, and in this case there is nothing there. One fix is:

Would this be an acceptable fix for you? If yes, I could prepare a PR to fix this.
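To illustrate the failure mode independently of pdfminer, here is a minimal sketch of guarding against values too short to contain an IV. The names `split_iv` and `decrypt_or_passthrough` are hypothetical, and returning the input unchanged is an assumption made for illustration; this is not pdfminer's actual code, nor necessarily the fix proposed above.

```python
from typing import Callable, Optional, Tuple

AES_BLOCK_SIZE = 16  # AES-CBC uses a 16-byte IV


def split_iv(data: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (iv, ciphertext), or None if data is too short to hold an IV."""
    if len(data) < AES_BLOCK_SIZE:
        return None
    return data[:AES_BLOCK_SIZE], data[AES_BLOCK_SIZE:]


def decrypt_or_passthrough(
    data: bytes, decrypt: Callable[[bytes, bytes], bytes]
) -> bytes:
    """Decrypt data with the supplied callable, skipping values with no IV.

    `decrypt` stands in for the real AES-CBC routine (hypothetical hook).
    """
    parts = split_iv(data)
    if parts is None:
        # Assumption: a value such as an empty Keywords string is returned as-is.
        return data
    iv, ciphertext = parts
    return decrypt(iv, ciphertext)
```

With this guard, an empty metadata value never reaches the cipher setup, so no IV-length error can be raised for it.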
Thank you for your work on pdfminer!