Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Packaging rules should support --stamp flag #287

Open
nacl opened this issue Feb 3, 2021 · 17 comments
Open

Packaging rules should support --stamp flag #287

nacl opened this issue Feb 3, 2021 · 17 comments
Assignees
Labels
P2 An issue that should be worked on when time is available
Milestone

Comments

@nacl
Copy link
Collaborator

nacl commented Feb 3, 2021

#285 requested a feature to preserve mtime's in tar files. @aiuto brought up the below WRT how this might be assembled:

Originally posted by @aiuto in #285 (comment)

Let's discuss solutions in an issue first. I would like to to have as many people as possible weigh in on what is useful.
There are a lot of questions

  • do we change the stamp of every file?
  • does that include files which came from pre-made archives that have a particular timestamp of their own?
  • some people want to put the timestamp into the output filename. While I find that dubious, any stamping solution should consider it.
  • Why is stamp_mtime an attribute? Why not just support the --stamp flag. That seems to be the API of least surprise.
  • and.... no solutions that just to .tar. We need do do the appropriate thing for each of pkg_zip, pkg_deb and pkg_rpm. But that makes getting the API right before doing any code even more important.

This issue tracks the design of a "stamping" mechanism for packages prior to implementation, especially with regards to fixing/clamping file modifications. Whether the --stamp argument should control this is also a question to consider.

@nacl
Copy link
Collaborator Author

nacl commented Feb 3, 2021

AFAIK, my particular use case does not have a preference WRT preserving mtimes, but some things that immediately come to mind are:

  • Some rules support SOURCE_DATE_EPOCH (e.g. pkg_rpm), others do not. It also applies to more than mtimes, so that may be out of scope here.

  • --stamp feels like the tool of choice here, but BUILD_TIMESTAMP may also not be the right value to use. Bazel provides it for free.

@aiuto
Copy link
Collaborator

aiuto commented Feb 4, 2021

The least surprising behavior is that all the top level pkg rules implement stamp as done in cc_binary. I consider that a minimal requirement.

stamp

  • Integer; optional; default is -1
  • Enable link stamping. Whether to encode build information into the binary. Possible values:
    • stamp = 1: Stamp the build information into the binary. Stamped binaries are only rebuilt when their dependencies change. Use this if there are tests that depend on the build information.
    • stamp = 0: Always replace build information by constant values. This gives good build result caching.
    • stamp = -1: Embedding of build information is controlled by the --[no]stamp flag.

#288 demonstrates how the default behavior could be done. That is, I can get the build time for stamping if --stamp is set on the command line. I didn't do the 0 and 1 cases, nor implement for every rule.

An orthogonal feature to that is if we change the time stamps of files expanded out of archives (if I am reading #265 right). That is it's own debate. The request I have seen in the past is along the lines of

  • I have a bundle of static files that will be served by a web service
  • they have a set of creation dates.
  • I bundle them into a zip file and check into my source tree
  • I want to include them in a .deb package, but individually not as one zip
  • I want the original time to be preserved.

I think we need provision for that kind of behavior. I don't have a design in mind right now. I don't think it is a blocker for --stamp support, as long as we design with the idea in mind that individual files might need to say they are immune to stamping
@nacl is rpm's use of SOURCE_DATE_EPOCH similar to this?

Then there is the question of using the time stamp in the archive name This is difficult for many reasons and we should not block basic stamping on that.

@dennis3620
Copy link

Thanks for following this up and your comments.

My main motivation is to set a more dynamically mtime for the input files. At the moment either the default behavior with portable or a static mtime can be used, but I'm looking for a "self-adjusting" mtime based on the input files. The idea was to use a new volatile status variable which is for example set by a bash script based on the input files' mtime or git command to get a less changing variable than the BUILD_TIMESTAMP.

stamp = 1: Stamp the build information into the binary. Stamped binaries are only rebuilt when their dependencies change. Use this if there are tests that depend on the build information.

If I understand this correctly with "stamp = 1" the rebuild is only triggered if the input of the specific rule change, i.e. the BUILD_TIMESTAMP is relatively close to the recent mtime of the input files. The only drawbacks are fresh builds, e.g. if no cache is available. #285 had an other intention, but if the stamp behavior is closer to the build-in rules and less complex
I guess this would even fit my needs somehow.

I will have a closer look into #288.

@aiuto
Copy link
Collaborator

aiuto commented Feb 5, 2021

Maybe we should start from needs for stamping, and then work back to features to achieve that.

  1. We certainly need the default to be a repeatable constant value. This gives us easily testabl repeatability. That should be the default behavior for all rules.
  2. On the day we want to cut a release, we probably want the time of build to be used in the binary. Here I think the default Bazel stamp behavior works well. Builds use a constant mtime by default, so during development and testing you don't overbuild. When you use --stamp, you want the package to use time of build
  3. Some targets are "extra-volatile". This is the stamp=1 case. Maybe we have a process where we want to do something like bazel run :push, where push depends on a .deb pacakge we built. Should multiple runs in a row rebuild the .deb each time, even if no other sources changed? I think the current Bazel idea is right here too. Do not rebuild the binary package, but do run the action which depends on it.
  4. Is there a need for users to specify a build time which is neither the epoch constant or the current time? That is the case I think you are trying to provide for.

Let's say there is a user for case 4. A hypothetical is that I am cutting a product release on Feb. 4, and I want the stamps to be 2020-02-04 33:22:11. Maybe I used the date as is, but encode something in the hh:mm:ss. We obviously don't want to have to set that in each rule, so the value has to come from the command line in some way. We would either pass the value (which requires a blaze change) or use workspace status commant (which does not).

Question 1: Can workspace status just replace BUILD_TIMESTAMP? If so, then that seems to be the way to do this.
Question 2: Is there a need for workspace status to emit 2 or more distinct time values and have different rules pick which of those they want? I strongly doubt that.
Question 3: How ofetn do we need anything beyond stamping with the current build time?

@aiuto aiuto added the P2 An issue that should be worked on when time is available label May 3, 2021
@aiuto aiuto self-assigned this May 6, 2021
@aiuto aiuto changed the title Packaging rules should support stamping Packaging rules should support --stamp flag May 6, 2021
@aiuto
Copy link
Collaborator

aiuto commented May 6, 2021

Status update: #288 is ready to go

  • it implements the --stamp flag and stamp attribute behavior of Bazel's *_binary rules
  • It gets the time stamp from the BUILD_TIMESTAMP from the volatile file.

If we need to stamp with arbitrary times other than build time, the technique could

  • be expanded to allow a timestamp to come from the command line.
  • be expanded to allow a timestamp to come any user defined file.

aiuto added a commit that referenced this issue May 6, 2021
…1) (#288)

Add stamp attribute to pkg_tar (in the style of cc_binary(stamp=-1,0,1)

Part 1 of #287 This is not pretty, because Bazel does not make --stamp available directly

- add a config setting to mimic --stamp: private_stamp_detect
  - This is a bit of a hack, until the value of stamp is available directly from starlark ctx See: bazelbuild/bazel#11164
  - use the config setting to pass a bool down to the packaging rule to see that --stamp was set
- add the stamp attribute to pkg_tar
- use combination of stamp & private_stamp_detect to invoke stamping
- this uses an undocumented feature to get to bazel-out/volatile-status.txt
  - that is bad in theory
  - In practice Bazel will eventually have admit the cat is out of the bag and document the files.

Future work:
- implement for all 4 package types.
- contain a test where a stamped package is used from a workspace that includes rules_pkg.

RELNOTES: Add stamp to pkg_tar.
@aiuto aiuto added this to the 1.0 milestone Jun 9, 2021
aiuto added a commit to aiuto/rules_pkg that referenced this issue Jun 10, 2021
aiuto added a commit to aiuto/rules_pkg that referenced this issue Jun 10, 2021
aiuto added a commit to aiuto/rules_pkg that referenced this issue Jun 10, 2021
aiuto added a commit that referenced this issue Jun 22, 2021
* Add --stamp to zip.

Advances #287   Now both pkg_tar and pkg_zip support --stamp behavior in the same manor as *_binary rules.
@tkinz27
Copy link

tkinz27 commented Jun 20, 2022

So most of the comments here are about using stamping to determine mtimes for the files within a package. What about support though to use values from the workspace status command as part of variable inputs to the packaging rules?

I'm mostly looking for a simple way to generate this version string for our debian package

git tag "v0.0.$(git show -s --format=%ct HEAD)-$(git rev-parse --short HEAD)"

Pretty easy to output this as part of the workspace status command. The challenge I'm having is getting that into a place the pkg_deb rule can use it. I'm fairly new to bazel but it seems like each rule has to implement support for reading the ctx.version_file and ctx.info_file directly.

I am currently using a custom rule to generate a version file and use that. While the version correctly gets injected into the control file of the debian package, it is not used as part of the package name itself. I'm not sure if this is just a bug or a specific choice.

@aiuto
Copy link
Collaborator

aiuto commented Jun 20, 2022

It's a specific choice, but more on Bazel itself rather than rules_pkg. We need to predict file names during analysis, but we don't let you read the info file containing the variable until execution phase. To have this work in the easiest to understand way, we would need Bazel to be able to have the equivalent of a workspace status command that created variables that got put directly into a configuration fragment. This would be useful for a handful of people (doing packaging) and incredibly dangerous foot-gun where you could ruin performance by having everything rebuild all the time.

Take a look at the examples in: https://github.com/bazelbuild/rules_pkg/tree/main/examples/naming_package_files
Don't just read the doc, look into the BUILD file and see some of the things there. Some things you might try:

  1. Don't use workspace status command. Instead, write a wrapper script around Bazel that does the work you want, then calls command with a command line flag to pass in the git version.
  2. My hunch is that you could turn the git-revision into an self configuring toolchain. The hash of the head would become a toolchain variable, and then you could pull it into your package name.

@tkinz27
Copy link

tkinz27 commented Jun 21, 2022

Thanks for the feedback. Setting up a toolchain seems like an interesting idea. I did look at the naming_package_files but couldn't figure out how to generate the PackageVariablesInfo provider since rules cannot read files or (at least from what I can see) get the stdout of a command they run. From that example I didn't think to try to create a custom toolchain just to get the git revision, I will have to try that next.

One thing that was confusing was the version_file attr in pkg_deb not being used in the package name but is embedded in the debian control file. I'm not sure from your answer if you are saying that is also a specific choice or if you were just talking about not being able to use the variables from workspace status. I currently do have a custom genrule that generates that version file its just was really surprising when it was not used as part of the package name. I actually thought my genrule was broken and not being picked up but only after looking at the source for pkg_deb did I realize only the version attr was used in the package name.

Lastly, it seems like other bazel rules do seem to support utilizing the workspace status variables IFF the --stamp option is provided. For instance here is support in rules_python, rules_go, rules_nodejs, rules_docker. Any chance rules_pkg would be interested in similar support?

Again appreciate the feedback and explanation. It helps me understand bazel a little bit more.

@aiuto
Copy link
Collaborator

aiuto commented Jun 22, 2022

The fundamental bazel behavior is that no file reads other than ..bzl and BUILD files can impact the analysis phase, and that all action input and output files are defined during the analysis phase. Thus it is fundamentally impossible to read a version file and have it change the name of an output file.

Lastly, it seems like other bazel rules do seem to support utilizing the workspace status variables IFF the --stamp option is provided.

I think they hit the same wall. You can't use the variables from the workspace file in an output file name.

@adam-azarchs
Copy link
Contributor

You can't use the variables from the workspace file in an output file name.

This is generally not a blocker. It's not hard to have a CI system do something like

bazel build //:my_output.tar //:my_dest.txt
cp bazel-bin/my_output.tar "${ARCHIVE_DIRECTORY}/$(<bazel-bin/my_dest.txt)"

where the build for my_dest.txt does take the stamp information into account.

One case that is problematic for this is if you want the name of a file within an archive to depend on stamp information. We have the option for a file providing a prefix for every file in the archive, but it would be useful to have the option to e.g. provide a json file (which could be built by a thing that uses stamp information) with a set of renames, rather than the current situation which is that renames need to all be specified in the BUILD file.

@aiuto
Copy link
Collaborator

aiuto commented Oct 25, 2023

You can make the renames in a .bzl file that you generate before the bazel build, then load that from BUILD files where you want to rename things.

@adam-azarchs
Copy link
Contributor

adam-azarchs commented Oct 26, 2023

Are you suggesting running a bazel build that modifies the source tree to generate a bzl file using stamping information, and then running a second build that uses that? That sounds error prone and difficult to enforce that the generator step is in sync with what the rest of the build is seeing (even worse if the bzl file was being generated by a tool that wasn't part of the bazel build configuration) as well as being slow and inconvenient compared to a solution that only requires a single bazel invocation. It also is problematic because we're talking about a map between bazel File targets and the names we want those files to have in the archive. We don't necessarily have an easy way to know the list of files that need to be renamed without first doing a build, or else we end up with a maintenance headache of needing to collect the list of transitively included files in more than one place.

Fixing this more cleanly wouldn't be terribly difficult; just allow a user to supply an executable target which can modify the manifest file before it's handed to the tarball packaging tool.

@aiuto
Copy link
Collaborator

aiuto commented Oct 26, 2023

No. I meant that a --workspace_status_command script would generate status data and a .bzl file (in your source tree) that can be loaded by the BUILD file. Then you turn that into PackageVariables which can be injected as needed.

That does not completely help you, but I can easily imagine extending pkg_files so you can take inputs and regex rename them to new things, using the variable read from the .bzl file.

@adam-azarchs
Copy link
Contributor

adam-azarchs commented Oct 30, 2023

That sounds rather dangerous to me, as I'm not aware of any documented contract that says that the workspace status command will always be evaluated before the loading phase, or in fact that it would be evaluated at all if no other build actions requested its output.

@aiuto
Copy link
Collaborator

aiuto commented Oct 31, 2023

Sure. But what other choices do you have? Bazel doesn't make the time stamp accessible directly to Starlark at analysis time, so I can't get it into the file name. The same for the git hash, or workspace name, or any variety of transient values.

@adam-azarchs
Copy link
Contributor

Your statement of the problem makes the solution clear - starlark does not have access to the required information, which means that the renames can't be done in starlark. But that's ok - starlark produces a manifest file that's consumed by the tarball builder action; there's no reason another build action couldn't be interposed between the starlark and the tarball builder, to either rewrite the starlark-generated manifest file or supply a separate renames file that the tarball builder would be able to merge with the one written out by the starlark. Unlike starlark, that action can depend on the workspace status files.

That is, something like

def _pkg_tar_impl(ctx):
    ...
    write_manifest(ctx, manifest_file, content_map)
    if ctx.executables.manifest_transformer:
        modified_manifest = ctx.actions.declare_file(ctx.label.name + ".mod.manifest")
        ctx.actions.run(
            outputs = [modifed_manifest],
            inputs = [manifest_file, ctx.version_file, ctx.info_file],
            executable = ctx.executables.manifest_transformer,
            arguments = [manifest_file.path, ctx.version_file.path, ctx.info_file.path, modified_manifest.path],
        )
        manifest_file = modifed_manifest
    ...

@aiuto
Copy link
Collaborator

aiuto commented Nov 3, 2023

OK. I see how what you are proposing could be possible. That would solve the case of changing file names to include time stamps or commit, or whatever you want to write into the info file with the workspace status command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 An issue that should be worked on when time is available
Projects
None yet
Development

No branches or pull requests

5 participants