Generate clean metadata by default when creating (and stamping) documents #179
Oops, unused imports... It's fixed now.
Thanks for this contribution to OpenPDF. I will create a new release tomorrow.
For science, what are you using iText (and soon OpenPDF) for?
Many uses ....
OpenPDF 1.2.16 has been released: https://github.com/LibrePDF/OpenPDF/releases/tag/1.2.16
Thank you very much.
@albfernandez How did the testing go?
We've made some changes to our code, but we're now on hold until May. It would be a good idea to write an "Upgrading from iText" section in the wiki, for example explaining with examples how to deal with removed classes such as TiffImage (see albfernandez/joinpdf@48ef2df). My English is not great, but I can provide information and examples about all the changes we make in our code.
We have been experiencing issues when signing documents that had keywords/metadata previously injected, as these become inaccessible through the API after the document is stamped. As far as we can tell, this is due to this pull request, as it hides keywords when stamping by removing the references to existing keywords in the document (even though the actual keyword content is still present inside it, as can be seen with a regular text editor). We'd like to know the purpose of this modification, and whether it responds to any compliance or standard rule we don't know about. From our point of view, metadata cleanup should be offered as some kind of utility helper/class, to be used only when required by external business logic, and not as default behavior, since the metadata may be needed later for multiple purposes. We include for your examination 3 pdf files:
From this analysis, we believe that this modification should be rolled back and refactored into a helper method that, incidentally, should perform a complete cleanup of keywords, if that is the actual requirement. The current approach, apart from deleting existing information from the original document without any request from the calling application, only hides it (it is still present in the PDF's raw contents). Looking forward to your comments and considerations. Best regards,
As you note, there is a bug in the metadata cleanup: the data is still in the file (hidden, but accessible via a text editor). The purpose of this patch is to generate files that don't leak sensitive information (user names, host names, etc.) in the metadata, so I think it's safe to clean this metadata by default and add it back if you need it (for indexing, as in your case).
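The "hidden, but accessible via a text editor" observation can be checked roughly by scanning the raw bytes of the stamped file for the `/Keywords` name. The following is a minimal sketch using only the JDK, not OpenPDF; the class and method names are hypothetical, and it assumes the old Info dictionary is stored uncompressed (a dictionary inside a compressed object stream would not be found this way).

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of OpenPDF or iText.
final class RawKeywordScan {

    // Returns true if the given PDF name (e.g. "/Keywords") occurs anywhere
    // in the raw file bytes. ISO-8859-1 maps every byte to exactly one char,
    // so binary content cannot break the decoding.
    static boolean containsName(byte[] pdfBytes, String name) {
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains(name);
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the stamped PDF to inspect.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("/Keywords present in raw bytes: "
                + containsName(bytes, "/Keywords"));
    }
}
```

A `true` result on a file whose keywords are no longer reachable through the API would be consistent with the orphaned-but-present data described above.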
We are using a patched version of iText 2.1.7 which generates empty metadata by default.
All metadata fields, including Producer and the creation and modification dates, are empty.
If we want to include some metadata, we can provide it by calling the appropriate methods.
If we stamp a PDF file, we strip all the metadata and only write the metadata provided via the setInfoDictionary() method (if any).
I am providing this pull request so we can switch to OpenPDF once it is included.
It's open for discussion: maybe it would be a good idea to keep "producer", "creationDate" and "modifiedDate" and only remove those fields on demand, but this is what we are using now.
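The stamping rule described above (discard existing metadata, keep only what the caller supplies) can be modelled with plain maps. This is only an illustrative sketch, not OpenPDF or iText code: `stampedInfo`, `Policy`, and the sample keys are hypothetical names, and the `KEEP_EXISTING` branch is included just to show the on-demand alternative raised in the discussion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the metadata policy; not an OpenPDF type.
final class CleanMetadataSketch {

    enum Policy { CLEAN_BY_DEFAULT, KEEP_EXISTING }

    // Computes the Info entries a stamped document would end up with.
    static Map<String, String> stampedInfo(Map<String, String> existing,
                                           Map<String, String> provided,
                                           Policy policy) {
        Map<String, String> result = new LinkedHashMap<>();
        // Existing metadata survives only when the caller asks for it.
        if (policy == Policy.KEEP_EXISTING && existing != null) {
            result.putAll(existing);
        }
        // Caller-supplied entries are always written and override old values.
        if (provided != null) {
            result.putAll(provided);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("Author", "jdoe");
        existing.put("Keywords", "invoice, 2018");

        Map<String, String> provided = new LinkedHashMap<>();
        provided.put("Title", "Signed copy");

        System.out.println(stampedInfo(existing, provided, Policy.CLEAN_BY_DEFAULT));
        System.out.println(stampedInfo(existing, provided, Policy.KEEP_EXISTING));
    }
}
```

With `CLEAN_BY_DEFAULT`, a caller that needs the original keywords after stamping has to read them before stamping and pass them back in as `provided`.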