Generate clean metadata by default when creating (and stamping) documents #179

albfernandez · 2019-04-09T17:34:45Z

We are using a patched version of iText 2.1.7 which generates empty metadata by default.
All metadata fields, including the producer and the creation and modification dates, are empty.

If we want to include some metadata, we can provide it by calling the appropriate methods.

If we stamp a PDF file, we strip all the metadata and only write the metadata provided via the setInfoDictionary() method (if any).

I'm providing this pull request so we can switch to OpenPDF once it is included.

It's open for discussion; maybe it would be a good idea to keep "producer", "creationDate" and "modifiedDate" and only remove those fields on demand... but this is what we are using now.
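To make the two options in this comment concrete, here is a minimal sketch that models them with plain maps standing in for a PDF Info dictionary. It is not OpenPDF code: the class and method names (`StampMetadataPolicy`, `cleanByDefault`, `keepTechnicalFields`) are hypothetical, and the only assumption about PDF itself is that `Producer`, `CreationDate` and `ModDate` are the standard Info dictionary keys for the three fields mentioned.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative model only: plain maps stand in for a PDF Info dictionary.
// None of these names are OpenPDF API.
class StampMetadataPolicy {

    // Standard PDF Info dictionary keys for the three fields discussed above.
    static final Set<String> TECHNICAL_KEYS = Set.of("Producer", "CreationDate", "ModDate");

    // Behavior described in this pull request: the original metadata is
    // deliberately ignored and only caller-provided entries are written.
    static Map<String, String> cleanByDefault(Map<String, String> original,
                                              Map<String, String> provided) {
        Map<String, String> result = new LinkedHashMap<>();
        if (provided != null) {
            result.putAll(provided);
        }
        return result;
    }

    // Alternative raised for discussion: keep the technical fields from the
    // original, drop everything else, then overlay caller-provided entries.
    static Map<String, String> keepTechnicalFields(Map<String, String> original,
                                                   Map<String, String> provided) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : original.entrySet()) {
            if (TECHNICAL_KEYS.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        if (provided != null) {
            result.putAll(provided);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("Author", "jdoe");
        original.put("Producer", "SomeTool");
        Map<String, String> provided = Map.of("Title", "Report");

        System.out.println("clean by default:      " + cleanByDefault(original, provided));
        System.out.println("keep technical fields: " + keepTechnicalFields(original, provided));
    }
}
```

In both variants user-identifying entries such as the author are dropped unless the caller supplies them again; the variants differ only in whether the producer and date fields survive.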


albfernandez · 2019-04-09T19:41:11Z

Oops, unused imports... It's fixed now.

andreasrosdal · 2019-04-09T20:30:48Z

Thanks for this contribution to OpenPDF. I will create a new release tomorrow.

andreasrosdal · 2019-04-09T20:33:33Z

For science, what are you using iText (and soon OpenPDF) for?

albfernandez · 2019-04-09T21:06:12Z

Many uses...

Create PDFs with JasperReports.
Create PDFs from Java code.
Overlay text on existing documents (confidential, invalid, draft...).
Remove metadata from documents uploaded by users before publishing (with this patch).
Generate barcodes.
Join multiple PDFs and images into one PDF.
Populate AcroFields from XML data to generate a new PDF.

andreasrosdal · 2019-04-10T06:28:29Z

OpenPDF 1.2.16 has been released: https://github.com/LibrePDF/OpenPDF/releases/tag/1.2.16

albfernandez · 2019-04-11T19:29:23Z

Thank you very much.
We've started some testing today.

andreasrosdal · 2019-04-24T13:20:56Z

@albfernandez How did the testing go?

albfernandez · 2019-04-24T21:14:57Z

We've made some changes to our code, but we're on hold until May.

It would be a good idea to write an "Upgrading from iText" section in the wiki, for example explaining with examples how to deal with removed classes such as TiffImage:

albfernandez/joinpdf@48ef2df
albfernandez/joinpdf@29d3a37

My English is not great, but I can provide info and examples about all the changes we make in our code.

ngs-software · 2019-07-31T09:58:13Z

We have been experiencing issues when signing documents that had keywords/metadata previously injected into them, as these become inaccessible through the API after the document is stamped.

As far as we can tell, this is due to this pull request: it hides the keywords when stamping by removing the references to the existing keywords in the document, even though the actual keyword content is still present inside it (as can be seen with a regular text editor).

We'd like to know the purpose of this modification, and whether it responds to any compliance or standard rule we don't know about. From our point of view, metadata cleanup should be offered as some kind of utility helper/class, to be used only when external business logic requires it, and not as the default behavior, since the metadata may be needed later for multiple purposes.

We include for your examination 3 pdf files:

Original file with metadata keywords added to it, which can be seen from the Reader application.
File resulting from signing the original file through a unit test with OpenPDF 1.2.15, where the metadata is still visible from the Reader application.
File resulting from signing the original file through the same unit test with OpenPDF 1.2.16, where the metadata disappears from the Reader application (but can still be seen in the document through a text editor).

Based on this analysis, we believe this modification should be rolled back and refactored into a helper method. If removing keywords is the actual requirement, that helper should perform a complete cleanup. The current approach deletes existing information from the original document without the calling application requesting it, and even then it only hides the data, which remains present in the PDF's raw contents.
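The "still visible in a text editor" check can be automated with a simple raw byte scan. The sketch below uses only the Java standard library; the class name `RawMetadataScan` and its method are hypothetical and not part of OpenPDF. It assumes the keyword is stored as an uncompressed, unencrypted literal string, so a "not found" result does not prove the data is gone (it could sit inside a compressed object stream or be hex or UTF-16 encoded).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: checks whether a string still appears in the raw
// bytes of a file, the same way one would search for it in a text editor.
class RawMetadataScan {

    // ISO-8859-1 maps every byte to exactly one char, so decoding never
    // fails or drops data, which makes a plain substring search safe.
    static boolean containsRaw(byte[] pdfBytes, String needle) {
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains(needle);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RawMetadataScan <file.pdf> <keyword>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(containsRaw(bytes, args[1])
                ? "keyword still present in raw contents"
                : "keyword not found as a plain literal string");
    }
}
```

Running it against the 1.2.15 and 1.2.16 output files with one of the injected keywords would show whether the stamped file merely lost the reference to the metadata or actually lost the data.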

Looking forward to your comments and considerations,

Best Regards,
ngs

signed-with-1.2.15.pdf
signed-with-1.2.16.pdf
original.pdf

albfernandez · 2019-08-03T20:56:27Z

As you note, there is a bug in the metadata cleanup: the data is still in the file (hidden, but accessible via a text editor).

The purpose of this patch is to generate files which don't leak sensitive information (user names, host names, etc.) in the metadata, so I think it's safe to clean this metadata by default and add it back if you need it (for indexing, as in your case).

Generate clean metadata by default when creating (and stamping) documents

1e46a89

andreasrosdal added the Needs work label Apr 9, 2019

Remove unused imports

b064a09

andreasrosdal merged commit e7b8fc1 into LibrePDF:master Apr 9, 2019

andreasrosdal removed the Needs work label Apr 9, 2019

This was referenced Aug 3, 2019

Clean metadata patch only hide metadata instead of removing #215

Closed

Should metadata be cleaned or maintained by default in pdfstamper? #216

Closed

albfernandez mentioned this pull request Aug 12, 2019

Doesn't clean metadata by default in Stamper. #230

Merged

albfernandez mentioned this pull request Jan 16, 2020

Producer program + version number metadata is not populated in PDF's #327

Closed
