Generate clean metadata by default when creating (and stamping) documents #179

Merged: 2 commits, Apr 9, 2019

albfernandez
Contributor

We are using a patched version of iText 2.1.7 that generates empty metadata by default.
All metadata fields, including the producer and the creation and modification dates, are empty.

If we want to include some metadata, we can provide it by calling the appropriate methods.

When we stamp a PDF file, we strip all existing metadata and write only the metadata provided via the setInfoDictionary() method (if any).
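
The stamping rule described above can be modelled in a few lines of plain Java. This is only a sketch of the policy, not the code in this pull request and not an OpenPDF API: the class `StampMetadataModel` and the method `infoAfterStamping` are hypothetical names, and the info dictionary is represented as a simple `Map<String, String>`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the policy described above; it does not call OpenPDF.
final class StampMetadataModel {

    /**
     * Returns the info dictionary of the stamped output.
     * The entries already present in the source document ("existing") are
     * deliberately ignored; only what the caller provides is kept.
     */
    static Map<String, String> infoAfterStamping(Map<String, String> existing,
                                                 Map<String, String> provided) {
        Map<String, String> result = new LinkedHashMap<>();
        if (provided != null) {
            result.putAll(provided);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = Map.of("Author", "alice", "Keywords", "invoice");

        // Nothing provided: the stamped document carries no metadata.
        System.out.println(infoAfterStamping(existing, null)); // prints {}

        // Only the explicitly provided entry survives.
        System.out.println(infoAfterStamping(existing, Map.of("Title", "Report"))); // prints {Title=Report}
    }
}
```

Under this model, an application that needs to keep existing entries would have to read them from the source document and pass them back in explicitly.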

I am providing this pull request so that we can switch to OpenPDF once it is included.

It's open for discussion; maybe it would be a good idea to keep "producer", "creationDate" and "modifiedDate" and remove those fields only on demand, but this is what we are using now.

@albfernandez
Contributor Author

Oops, unused imports... It's fixed now.

@andreasrosdal andreasrosdal merged commit e7b8fc1 into LibrePDF:master Apr 9, 2019
@andreasrosdal
Contributor

Thanks for this contribution to OpenPDF. I will create a new release tomorrow.

@andreasrosdal
Contributor

For science, what are you using iText (and soon OpenPDF) for?

@albfernandez
Contributor Author

Many uses:

  • Create PDFs with JasperReports.
  • Create PDFs from Java code.
  • Overlay text on existing documents (confidential, invalid, draft...).
  • Remove metadata from documents uploaded by users before publishing (with this patch).
  • Generate barcodes.
  • Join multiple PDFs and images into one PDF.
  • Populate AcroFields from XML data to generate a new PDF.

@andreasrosdal
Contributor

OpenPDF 1.2.16 has been released: https://github.com/LibrePDF/OpenPDF/releases/tag/1.2.16

@albfernandez
Contributor Author

Thank you very much.
We've started some testing today.

@andreasrosdal
Contributor

@albfernandez How did the testing go?

@albfernandez
Contributor Author

We've made some changes to our code, but we're now on hold until May.

It would be a good idea to write an "Upgrading from iText" section in the wiki, for example showing with examples how to deal with removed classes such as TiffImage:

albfernandez/joinpdf@48ef2df
albfernandez/joinpdf@29d3a37

My English is not great, but I can provide information and examples about all the changes we make in our code.

@ngs-software

We have been experiencing issues when signing documents that had keywords/metadata previously injected, as these become inaccessible through the API after the document is stamped.

As far as we can see, this is due to this pull request: it hides keywords when stamping by removing the references to the existing keywords in the document, even though the actual keyword content is still present inside it (it can be seen with a regular text editor).

We'd like to know the purpose of this modification, and whether it responds to any compliance or standards rule we don't know about. From our point of view, metadata cleanup should be offered as some kind of utility helper/class, to be used only when external business logic requires it, and not as the default behavior, since the metadata may be needed later for multiple purposes.

We include 3 PDF files for your examination:

  • The original file with metadata keywords added to it, which can be seen from the Reader application.
  • The file resulting from signing the original file through a unit test with OpenPDF 1.2.15, where the metadata is still visible from the Reader application.
  • The file resulting from signing the original file through the same unit test with OpenPDF 1.2.16, where the metadata disappears from the Reader application (but can be seen in the document with a text editor).

From this analysis, we believe this modification should be rolled back and refactored into a helper method which, incidentally, should perform a complete cleanup of keywords, if that is the actual requirement. The current approach deletes existing information from the original document without any request from the calling application, and even then it only hides the information, which remains present in the PDF's raw contents.
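
The "hidden but still present" observation can be checked without a text editor by scanning the raw bytes of the file. The following is a minimal sketch using only the Java standard library; `RawPdfScan` and its methods are hypothetical names, not part of OpenPDF. It assumes the keyword is stored as an uncompressed literal string, so it will miss text held in compressed streams or encoded as UTF-16 or hex strings.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: naive byte search over the raw contents of a file.
final class RawPdfScan {

    /** Returns true if needle occurs in haystack as a contiguous run of bytes. */
    static boolean containsBytes(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return true;
        }
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            return true;
        }
        return false;
    }

    /** Searches for text encoded as ISO-8859-1, one byte per character. */
    static boolean containsLatin1(byte[] pdfBytes, String text) {
        return containsBytes(pdfBytes, text.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Usage: java RawPdfScan.java <file.pdf> <keyword>
    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(containsLatin1(pdf, args[1]));
    }
}
```

A `true` result for a keyword that the Reader application no longer shows would match the behavior reported for the file signed with 1.2.16; a `false` result does not prove the data is gone, given the encoding caveats above.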

Looking forward to your comments and considerations,

Best Regards,
ngs

signed-with-1.2.15.pdf
signed-with-1.2.16.pdf
original.pdf

@albfernandez
Copy link
Contributor Author

There is a bug in the metadata cleanup, as you note: the data is still in the file (hidden, but accessible with a text editor).

The purpose of this patch is to generate files that don't leak sensitive information (user names, host names, etc.) in the metadata, so I think it's safe to clean this metadata by default and add it back if you need it (for indexing, as in your case).
