Add support for normalizing the archived files metadata #47
Conversation
```go
func normalizeCompressingFile(fh *zip.FileHeader) {
	fh.Method = zip.Store
	fh.SetModTime(time.Date(1981, 4, 10, 0, 0, 0, 0, time.UTC))
	fh.SetMode(0644)
```
What if the file is executable?
See here for an example: https://github.com/terraform-providers/terraform-provider-archive/pull/41/files#diff-572809c26c1b9c23a665ebaea5dd7ea8R70
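For illustration, one way to keep the executable bit while still normalizing everything else (a sketch along the lines of the linked PR, not this PR's code; it assumes the source file's mode is available at the call site):

```go
// Sketch only: preserve the executable bit while normalizing the rest.
// The source file's mode is assumed to be available at the call site.
package archive

import (
	"archive/zip"
	"os"
	"time"
)

func normalizeCompressingFileKeepingExec(fh *zip.FileHeader, sourceMode os.FileMode) {
	fh.Method = zip.Store
	fh.SetModTime(time.Date(1981, 4, 10, 0, 0, 0, 0, time.UTC))
	if sourceMode&0111 != 0 { // any execute bit set on the source file
		fh.SetMode(0755)
	} else {
		fh.SetMode(0644)
	}
}
```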
I'm not familiar with executing binaries inside a Lambda function; based on brief research, Lambda may reset the permissions of the archived files - see, for example:
- https://www.reddit.com/r/aws/comments/7k0txe/how_can_i_give_executables_inside_a_lambda
- https://stackoverflow.com/questions/48194886/update-permissions-on-aws-lambda-file-to-make-it-executable
It's not 100% clear from those resources where their problem lies - it could be the zip library not storing the file permissions (not all do), or Lambda resetting the permissions when unarchiving; in the latter case, managing the executable bit would have no effect.
Do you have direct experience with this use case, so that I know with certainty what the exact behavior is?
When archiving files from different machines, even if the source content is the same, different archive files may be generated due to a variety of factors, like compression libraries or file metadata differences (permissions, etc.). This creates problems in situations where it's expected that the exact same archive should always be generated from a given source content (typically, a Lambda function resource).

An example context that causes differences is Git on different O/Ss - filesystems may have different mount permission masks, and be the same for Git (which doesn't distinguish 0644 from 0664), but end up generating a different archive.

The `normalize_files_metadata` option makes the archiver produce a "normalized" archive, which solves the problem. In the Zip case, this is obtained by setting:

- the compression method to Store
- the modification date to a fixed one
- the permissions to 0644

Disabling compression is a tradeoff; however, in the context where this functionality is used (e.g. Lambda function resources), the difference is negligible.
Force-pushed from 0bfacd0 to 347a1af.
Removed the Travis
Did this fizzle out? I'm eagerly awaiting an update on this or #41. I'm using a pretty ugly work-around for now, to unify the umask across platforms:

```sh
git () {
  origin="$(command git remote get-url --push origin 2>/dev/null)"
  [ "$origin" = "git@github.com:foo/terraform.git" ] &&
    (umask 0022; command git "$@") ||
    command git "$@"
}
```
Hello! Sadly, the upstream maintainer(s) entirely ignored both the issue and the PR - the PR is long-standing, and hasn't been commented on, closed, or accepted, while there's activity on master. We've been using this archive fork at my company successfully for a long time now, so the best strategy is likely to officially release the fork. I'll close this PR, then give updates on the fork.
Closing for inactivity.
It's been an issue for a while for us, using 3 different OSs in the office. Any chance we can get some visibility back on the issue?
I think the project is essentially unmaintained (and when it wasn't, this PR was entirely ignored, which is "not cool" 🙄). Regardless, something important to keep in mind for people needing this functionality is that there is no need for the code to be upstream. Just clone my fork and compile the plugin, then place it in the appropriate location - either in the per-user directory (see the manual) or in the per-project TF configuration directory. The fork is stable, at least for our use case - at my company, we've been using it for a long time, and we just keep the compiled binary in our project's TF configuration directory.
@saveriomiroddi can't you leave the pull request open so it eventually gets merged or receives some comments from the maintainers? Which fork do you mean? Yours https://github.com/saveriomiroddi/terraform-provider-archive-dev seems to be "outdated" as well? EDIT: You probably refer to https://github.com/saveriomiroddi/terraform-provider-archive-dev/tree/sav-normalize_files_metadata_option
Hello! So! I have a few considerations in relation to the PR closure.

First, and most important from the practical perspective: the changes applied by the PR are free 😄 This means everybody is 100% free to, say, fork the original repository, copy the commit, and open a new PR. In this case the procedure is:

```sh
# fork the original repository from the Web UI (or the GitHub command line tool)
git clone <forked_repository_address>
cd <local_repository_dir>
git remote add saverio https://github.com/saveriomiroddi/terraform-provider-archive-dev.git
git fetch --all
git checkout -b <feature_branch_name> saverio/sav-normalize_files_metadata_option
git push --set-upstream origin HEAD
```

Now you can create the PR from the GitHub Web UI (or command line tool) 😄

Second: repository maintainers can re-open PRs; this happened to a contribution of mine on Bundler. It's crucial to understand that the important thing is that the issue remains open - the typical maintenance workflow uses issues for tracking purposes, and in this case, an issue exists and is open.

Now, regarding the ideological aspect. I close PRs after a set time because I think it's disrespectful of maintainers to wholesale ignore conversations. I'm perfectly fine if an issue/PR is closed by the maintainers because:

But I find ignoring contributions (conversations and/or PRs) very poor maintenance practice, unless the project is expressly unmaintained. Most important of all, this maintenance attitude implies that contributors must be thankful to the maintainers if their contributions are accepted.

Having said that, back to practice: Terraform plugins are trivial to build and use. In this case, it's just a matter of cloning the fork, building the plugin, and placing it in the appropriate directory. At my company, we keep those binaries in the SCM, so using them is friction-free. No need to chase careless maintainers 😄.
I'm not sure what you mean by this; we've been using a compiled version of that branch/PR since it was published, up to now - and we're currently using Terraform v0.12.19 without any problems, so I think this should work for everybody.
Sure, but I think it's not a nice gesture to copy the commit and open a new PR.
I see your point, but in my opinion a PR can stay open as long as it's relevant and up to date enough to get merged. But I now know why you closed it; that's OK for me as an explanation :)
You're a bit harsh with that opinion; a maintainer's life is hard, I think.
But it's better to keep changes upstream, and forks tend to become outdated very fast. I'm aware, though, that it's trivial to build and use it.
In my opinion maintainers are also contributors and should follow these principles. But maybe you are right and there needs to be an extended version aimed at maintainers 😄
I just wasn't sure which fork/branch you were referring to, but I figured it out. So everything is fine :) 👍 Thanks for your detailed answer. I got your overall opinion and understand it.
Sorry if that sounded patronizing - part of my job is to write scripts, so it's kind of a reflex 😬 . Regarding the discussion as a whole, somebody on the referenced issue pinged the maintainers as you were suggesting, which I think is the best idea - hopefully they'll at least participate in the discussion 🙂 To clarify, I'm a maintainer as well (of other projects) 😬, so I understand the problems 🙂
Hello @saveriomiroddi. Our sincerest apologies for not engaging sooner on the longstanding #34 and for not addressing the proposed fix here; we sincerely appreciate your contributions to the Terraform ecosystem. With regards to this PR, after reviewing it, I do think it would have been hard to approve a generic "normalization" formula. What this PR does reveal, though, are the areas of an archive that can create differences in the computed hash output. So I think we will look to open up things like file mode, compression algorithm, and modification date as configurable options to help address problems such as #34.
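For illustration only, a rough sketch of what such configurable options could look like in the provider schema with the Terraform Plugin SDK (attribute names, defaults, and descriptions are hypothetical, not an agreed design):

```go
// Hypothetical sketch: names, defaults and types are illustrative only.
package archive

import (
	"github.com/hashicorp/terraform-plugin-sdk/helper/schema"
)

var normalizationOptions = map[string]*schema.Schema{
	"output_file_mode": {
		Type:        schema.TypeString,
		Optional:    true,
		ForceNew:    true,
		Description: "Octal permission string (e.g. \"0644\") applied to every archived file.",
	},
	"output_file_mtime": {
		Type:        schema.TypeString,
		Optional:    true,
		ForceNew:    true,
		Description: "RFC 3339 timestamp applied to every archived file.",
	},
	"compression_method": {
		Type:        schema.TypeString,
		Optional:    true,
		Default:     "deflate",
		ForceNew:    true,
		Description: "Compression to use: \"deflate\" or \"store\".",
	},
}
```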
Hello! Thanks for the attention. It makes sense to make the parameters granular and available. Ultimately, achieving the objective can be "not-complex" (of course, with the assumption that our tested use cases cover a large portion of, or hopefully all, the potential differences between archives), since the parameters involved are only three (compression, modification time, and permissions).
```go
// - fixed file permissions.
//
func normalizeCompressingFile(fh *zip.FileHeader) {
	fh.Method = zip.Store
```
This causes a very sneaky compatibility issue with Java's standard java.util.zip, resulting in:
```
java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
	at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:311)
	at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:123)
	at com.meh.test.UnzipFiles.unzip(UnzipFiles.java:30)
	at com.meh.test.UnzipFiles.main(UnzipFiles.java:17)
```
The quick summary seems to be: Go's zip uses streaming, meaning it first writes the file entry body, and then writes a so-called "ext" data descriptor with size information. Then, java.util.zip can extract these if they use DEFLATE, since DEFLATE encodes data in such a way that you can know when you have all the data. STORE, on the other hand, has no such encoded info, and Java fails to handle it.
Details:
- https://bugs.openjdk.java.net/browse/JDK-8143613
- https://github.com/30x/zipper
- https://groups.google.com/g/golang-nuts/c/0iae5Ng-I-0
- https://stackoverflow.com/questions/47208272/android-zipinputstream-only-deflated-entries-can-have-ext-descriptor
Impact case: The Amazon ECS (Blue/Green) deploy provider of AWS CodePipeline raises an exception if a normalized zip file is fed to it:
Exception while trying to read the task definition artifact file from: <source stage>
We use archive_file to produce the zip, containing appspec.yaml and taskdef.json files for CodeDeploy and ECS, respectively.
AWS services seem to be heavily Java based, so while we don't see the full Java exception output, it seems likely that this is the problem.
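For anyone hitting this from Go, one possible workaround (a sketch, not the provider's code; it requires Go 1.17+ for `CreateRaw`) is to precompute the CRC and sizes up front, so the local header carries them and no EXT data descriptor is needed for STORE entries:

```go
// Sketch of a possible workaround (not the provider's code; requires Go 1.17+):
// precompute CRC32 and sizes and use CreateRaw, so the local header carries the
// sizes and no EXT data descriptor is written after the STORE entry.
package archive

import (
	"archive/zip"
	"hash/crc32"
)

func writeStoredEntry(zw *zip.Writer, name string, content []byte) error {
	fh := &zip.FileHeader{
		Name:               name,
		Method:             zip.Store,
		CRC32:              crc32.ChecksumIEEE(content),
		CompressedSize64:   uint64(len(content)),
		UncompressedSize64: uint64(len(content)),
	}
	w, err := zw.CreateRaw(fh) // data is written as-is; for Store, raw == uncompressed
	if err != nil {
		return err
	}
	_, err = w.Write(content)
	return err
}
```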
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
UPDATE: a few important notes if anybody happens to read this PR:
When archiving files from different machines, even if the source content is the same, different archive files may be generated due to a variety of factors, like compression libraries or file metadata differences (permissions, etc.).
This creates problems in situations where it's expected that the exact same archive should always be generated from a given source content (typically, a Lambda function resource).
An example context that causes differences is Git on different O/Ss - filesystems may have different mount permission masks, and be the same for Git (which doesn't distinguish 0644 from 0664), but end up generating a different archive.
The `normalize_files_metadata` option makes the archiver produce a "normalized" archive, which solves the problem. In the Zip case, this is obtained by setting:

- the compression method to Store
- the modification date to a fixed one
- the permissions to 0644

Disabling compression is a tradeoff; however, in the context where this functionality is used (e.g. Lambda function resources), the difference is negligible.
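To make the problem concrete, here is a small, self-contained Go illustration (not provider code; the file name and contents are made up) showing that identical content with different permission bits yields byte-different archives, and therefore different hashes:

```go
// Illustration only (not provider code): identical content, different permission
// bits -> byte-different archives -> different hashes.
package main

import (
	"archive/zip"
	"bytes"
	"crypto/sha256"
	"fmt"
	"log"
	"os"
	"time"
)

func buildZip(mode os.FileMode) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	fh := &zip.FileHeader{Name: "lambda_function.py", Method: zip.Deflate}
	fh.SetModTime(time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)) // same mtime in both archives
	fh.SetMode(mode)                                            // only the permissions differ
	w, err := zw.CreateHeader(fh)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := w.Write([]byte("print('hello')\n")); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
	return buf.Bytes()
}

func main() {
	a := buildZip(0644) // e.g. a checkout on a machine with umask 022
	b := buildZip(0664) // e.g. a checkout on a machine with umask 002
	fmt.Printf("0644: %x\n", sha256.Sum256(a))
	fmt.Printf("0664: %x\n", sha256.Sum256(b))
}
```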
Close #34.
The functionality is simple and fits nicely with the code. The code itself doesn't have a very strong case convention for variable names (some are entirely lower case without underscores, some in camel case, etc.), so that may be subject to change.
The name of the option is entirely open to change. "Normalization" is the concept that I think is fit, but of course, there are different ways of viewing it.