Add ability to NOT write timestamps for zip file entries #48
Comments
What timestamp should be written then (instead of)? |
I thought modificationTime/Date was an optional field in the ZIP file format, but https://www.iana.org/assignments/media-types/application/zip says it's not. So my initial idea would be to set the field to 0/epoch. This is also what reproducible-build-maven-plugin is doing. |
The timestamp of the class file shouldn't change if the class has not been recompiled, should it? How do you define a reproducible build in this case? |
I mean that |
I think the Plexus archiver should allow its caller to explicitly set a single, fixed timestamp which will be used for all entries. If the timestamp is not set, then the Plexus archiver should default to its current behavior: using the modification timestamps of the entries to be archived. If this single timestamp is then exposed as |
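A minimal sketch of that idea, using only `java.util.zip` rather than Plexus Archiver itself. The class `FixedTimeZip` and its `write` method are hypothetical names for illustration. Because the entries here live in memory and have no file modification time, the "not set" branch falls back to the current time that `ZipOutputStream` assigns by default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.OptionalLong;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of Plexus Archiver.
final class FixedTimeZip {
    /**
     * Writes the entries (name -> text content) to an in-memory ZIP in the
     * map's iteration order. If fixedTimeMillis is present, every entry gets
     * that timestamp; otherwise ZipOutputStream stamps each entry with the
     * current time.
     */
    static byte[] write(Map<String, String> entries, OptionalLong fixedTimeMillis) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                ZipEntry entry = new ZipEntry(e.getKey());
                if (fixedTimeMillis.isPresent()) {
                    // One caller-chosen timestamp for all entries.
                    entry.setTime(fixedTimeMillis.getAsLong());
                }
                zip.putNextEntry(entry);
                zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }
}
```

With a fixed timestamp, two calls with the same entries in the same order on the same machine should yield identical bytes; without it, the output depends on when the archive was written.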
Would it make more sense to actually adjust the timestamp of the .class file to match the source file ? (I'm thinking about an additional plugin/step that ensures this behaviour) |
@sewe sounds legit. I'll report back with some code :) |
@krosenvold Point is - the modification time of the source file says absolutely nothing about its content. This would just shift the problem from "when have I built something" to "when have I touched something" - it still would not be about "what was built" (independent of time). |
That would still not always help in creating reproducible builds, as it assumes that the modification timestamp of a freshly checked out source file stays the same across checkouts, which I don't think holds for every SCM out there. |
Not for git at least. There are only commit-timestamps |
Great! Thank you. |
Yeah, I suppose it makes sense to use something like the commit timestamp of the head commit. |
Well, even then, if I have different commits that change stuff that does not impact the compiler input (.java file) and hence does not influence the compiler output (.class files), I'd still get different artifacts. I don't see any use for the modification timestamp of files within a jar (even the weird JSP template case only makes sense, AFAIK, if I modify an exploded template in the war). |
As long as the change just makes it possible to set a fixed file date on the archiver level I think this is a good solution. Someone else can decide what to set :) |
…erride zipEntry times globally. closes codehaus-plexus#48
Hi, |
while working on MNG-6276, perhaps supporting |
I am not really fond of using env variables because this is not the Java way of doing this. |
sure, the env variable should not be used at library level |
complementary idea: if this
|
IMHO, where exactly this timestamp comes from (or whether it should even be the same across all archive entries) should be left for higher levels to decide. I hence dislike having a single, global I agree with @hboutemy that “the env variable should not be used at library level”, but I would expect to be able to say something like this in my POM:
… possibly along with |
@sewe the format is pregiven by the archive spec, not us. |
@michael-o I know. But I can envision other valid sources of timestamps besides I hope that explains my reasoning behind the hypothetical |
On a side note, be careful that the timestamp effectively written in the ZIP file should NOT depend on the user's time zone. |
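One way to keep the written fields independent of the user's time zone, sketched with plain `java.util.zip` and assuming Java 9 or later for `ZipEntry.setTimeLocal`. The helper name `UtcZipTime` is hypothetical. `ZipEntry.setTime(long)` converts through the JVM's default time zone, whereas `setTimeLocal` stores the given date-time fields as they are, so converting the instant to UTC first yields the same DOS date/time on every machine (for dates within the MS-DOS range, 1980 onwards).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.zip.ZipEntry;

// Hypothetical helper for illustration; requires Java 9+ for setTimeLocal.
final class UtcZipTime {
    /**
     * Creates an entry whose DOS date/time fields hold the UTC wall-clock
     * time of the given instant, regardless of the JVM's default time zone.
     */
    static ZipEntry entryAt(String name, Instant instant) {
        ZipEntry entry = new ZipEntry(name);
        entry.setTimeLocal(LocalDateTime.ofInstant(instant, ZoneOffset.UTC));
        return entry;
    }
}
```

Tools that read the archive will still interpret those fields in their own local zone, so the displayed time may differ between readers; the bytes in the archive do not.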
This is true for many languages/environments/build tools that have to deal with reproducible build issues. Using SOURCE_DATE_EPOCH may seem ugly, but an environment variable is the lowest common denominator between various ecosystems. If every tool/library comes with its own way of setting the date it quickly becomes a headache to integrate heterogeneous tools together to achieve reproducible builds. |
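For reference, a caller above the library level (a plugin, say) could read the variable roughly like this. This is a sketch only: `SourceDateEpoch` is a hypothetical class name, and treating a malformed value as "not set" is an assumption of this example rather than something the thread decided. `SOURCE_DATE_EPOCH` itself is defined as an integer number of seconds since the Unix epoch.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Plexus Archiver.
final class SourceDateEpoch {
    /**
     * Parses a SOURCE_DATE_EPOCH value (seconds since the Unix epoch).
     * Returns empty if the value is null, blank, or not a valid number.
     */
    static Optional<Instant> parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Instant.ofEpochSecond(Long.parseLong(raw.trim())));
        } catch (NumberFormatException | DateTimeException e) {
            // Assumption of this sketch: a malformed value counts as "not set".
            return Optional.empty();
        }
    }

    /** Reads the value from the process environment. */
    static Optional<Instant> fromEnvironment() {
        return parse(System.getenv("SOURCE_DATE_EPOCH"));
    }
}
```

Keeping the parsing separate from the environment lookup lets the library accept a plain timestamp while the decision to honour the environment variable stays with the caller.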
Making a bit-by-bit reproducible build requires more than just making sure the timestamps are the same. For example, the jar files are created in parallel threads, so there is no guarantee for the order of the entries. Of course we can make sure the timestamps are the same, add an option to create the jar files in a single thread with predictable entry order and so on, but the list may grow even bigger over time and make Plexus Archiver hard to maintain. I think it would be better if the archives were created and, after that, the entries were sorted and any information that may differ between builds (such as timestamps) stripped or replaced with given values. For example, there is a plugin that does that - https://zlika.github.io/reproducible-build-maven-plugin/. What do you think? |
@plamentotev Post-processing the artifact, like the |
@sewe I do agree, but my concern is whether it is feasible to achieve that in Plexus Archiver. There are a lot of moving parts - Plexus Archiver allows a lot of customization and it relies on dependencies that do not guarantee determinism. I'm not saying it is impossible to achieve, but it is quite tricky. I would prefer to see a discussion about the whole picture and how the goal can be achieved. If we're doing this piece by piece, I'm afraid that we'll end up in a dead end or create a solution that is not optimal. If there is added value in having the timestamp fixed for Zip entries besides reproducible builds, then we should do it anyway. Otherwise I think we should first guarantee the order of the inputs and the outputs - IMHO that would be the hardest part to achieve without additional I/O. |
Sorry. Looks like I've missed that there is already a discussion (issue and wiki page) to track the reproducible/verifiable builds[1][2]. [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74682318 |
just done apache/commons-compress#78, waiting for merge |
Great work @hboutemy! But I'm not sure if it is enough. This will guarantee that the entries in the ZIP file are stored in the same way they are added. But what if they are added in a different order every time? Do we have a guarantee that the resource collections ( |
As far as I know, the traversing order of the filesystem is not reproducible between two computers (or between two copies on the same computer). Anyway, Hervé's work is an important step towards reproducibility. |
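To illustrate the ordering concern, here is a sketch that sorts entry names before writing, so the result does not depend on the order in which the filesystem (or any other source) happened to deliver them. It uses plain `java.util.zip`; `SortedZip` is a hypothetical name, and sorting by `String` natural order is just one possible deterministic choice, not what Plexus Archiver does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of Plexus Archiver.
final class SortedZip {
    /**
     * Writes entries (name -> bytes) to an in-memory ZIP, sorted by name and
     * stamped with one fixed time, so the input's iteration order is irrelevant.
     */
    static byte[] write(Map<String, byte[]> entries, long fixedTimeMillis) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            // TreeMap orders keys by String.compareTo, which is locale-independent.
            for (Map.Entry<String, byte[]> e : new TreeMap<>(entries).entrySet()) {
                ZipEntry entry = new ZipEntry(e.getKey());
                entry.setTime(fixedTimeMillis);
                zip.putNextEntry(entry);
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }
}
```

Sorting requires knowing all entry names up front, which is the extra bookkeeping (or I/O) mentioned earlier in the thread.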
@plamentotev @Zlika |
latest news: see #121 |
closing this issue as it is superseded by #121 |
To be discussed:
Thinking about reproducible builds, it would be great if build tools that use plexus-archiver could be seen as pure transformers/functions: given the same input, they will always produce the same output (at least if they are used synchronously and always provide the same input in the same sequence). This is, however, currently not possible using maven-archiver (which transitively uses plexus-archiver) because the AbstractZipArchiver will always create ZipEntry timestamps. I wish there was a property on AbstractZipArchiver like private boolean createZipEntryTime = true; to turn the setTime behaviour off.
I'd be happy to submit a PR if this feature is desirable (is there maybe something in the jar spec that insists on these fields? I have not found any reference).