Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: TarFileWriter tool optionally preserves input file mtimes #149

Closed
dannysullivan opened this issue Apr 7, 2020 · 8 comments · Fixed by #155
Closed

Feature: TarFileWriter tool optionally preserves input file mtimes #149

dannysullivan opened this issue Apr 7, 2020 · 8 comments · Fixed by #155
Assignees
Labels
P2 An issue that should be worked on when time is available

Comments

@dannysullivan
Copy link
Collaborator

See bazelbuild/bazel#11028 for initial conversation.

I work on a team that maintains deployment tools that use https://github.com/bazelbuild/rules_docker/ to build docker images for applications. We noticed that static files in deployed applications always had a Last-Modified value of the unix epoch. We traced the issue back to the TarFileWriter class, which copies all input files into an output tar and preserves most file metadata but overrides the mtime to a provided value (defaulting to 0).

I understand why, in many cases, users would want predictable mtimes for build reproducibility. I'm not sure I understand why this tool should always override mtimes, instead of making that an opt-in or opt-out feature. In our use case, we're passing an entire tar file through TarFileWriter as an input - I think it would be reasonable to allow the output metadata to match the input metadata.

@dannysullivan
Copy link
Collaborator Author

@aiuto any updated thoughts on this? When we discussed on my PR in the main bazel repo you mentioned that you disagree with the current default behavior (which opens up tar files, modifies their mtimes, and then re-tars them).

@aiuto
Copy link
Collaborator

aiuto commented Apr 14, 2020

Sorry I've been out of the loop so long. With everyone working from home, priorities got shuffled a bit.

I have a different proposal (which I think your other CL mostly hits)

  • Then adding a tar (src) to a pkg_tar (dest), all the mtimes from pkt_tar are retained. No option needed.
  • If we want the mtimes for dest to be current, we make that flip based on the --stamp runtime flag.

Why this works

  • if you have a static .tar, it will have the mtimes you want, and they always get preserved.
  • if you build the tar to feed into dest, the mtimes will be a standard epoch value, so all good too
  • if you use --stamp, all the non-static tars have the new date, but the static ones don't, which is most likely what you want in a distribution.

How does that sound.

@aiuto aiuto added the P2 An issue that should be worked on when time is available label Apr 14, 2020
@dannysullivan
Copy link
Collaborator Author

That sounds good to me - the only part I'm confused about is the use of --stamp to set current mtimes within tar files. Is that a use case that currently exists, or is that a separate feature request? For my purposes we just need to be able to preserve mtimes from the input tar file.

@aiuto
Copy link
Collaborator

aiuto commented Apr 16, 2020

AFAICT, it is not the case - you always get the epoch time. But that is a separate feature to build.

The first change is simply to preserve mtimes on tars that are untarred and retarred into a tar. That should be a no-op except when you have a .tar built outside Bazel. In which case we want to preserve whatever that build system did. So, on to the PR.

@kvet
Copy link

kvet commented Jun 25, 2020

Hi. I've tried the 0.26 version and still have mtime set to EPOCH. What am I doing wrong?

pkg_tar(
    name = "react_static_tar",
    srcs = [":react_static"],
    owner = "101.101",
    package_dir = "/var/www/html",
    strip_prefix = "react_static",
    tags = ["manual"],
)

I checked that src files have the correct mtime but in after unpacking from tar they are wrong

@aiuto
Copy link
Collaborator

aiuto commented Jun 25, 2020

I don't think any settings allow you to use the mtimes from the file system.
This feature was to preserve the times of files in included tar archives.

@kvet
Copy link

kvet commented Jun 29, 2020

@aiuto Am I right that cleaning mtime is a normal behaviour? Right now I am not able to set proper caching strategy for my nginx container as it relies on mtime to calculate last modified and etag headers.

@aiuto
Copy link
Collaborator

aiuto commented Jun 29, 2020

Yes. That is as intended. It makes builds more hermetic and repeatable. Using mod time from the file system is fraught with reproducibility problems.

Now, while I am not going to change the default behavior, there is still a need for a well thought out way to handle some common use cases

  • make every file in the package have the current time stamp. Since this would make the target volatile, it should probably be controlled by --stamp
  • allow you to designate some files have modification times from the file system
  • allow you to use the modification times from the source management system.

The last one is tricky, but probably more useful than the strict modification time from disk. If i am flipping git branches a lot I don't want files to be saved at the time of last git checkout - I want them at last logical modification time, which might be minutes (or days) before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 An issue that should be worked on when time is available
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants