Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a more concise package "manifest" specification #238

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nacl
Copy link
Collaborator

@nacl nacl commented Sep 25, 2020

In using pkg_filegroup in our codebases, we've found that it takes a
significant amount of vertical space for complex packages. For simple
mappings (e.g. copy a target/file to a destination), we've found that a simple
tabular format provides a succinct way to specify contents without sacrificing
much in the way of readability.

At this time, I'm looking to start a discussion on the idea presented here. The
idea itself is essentially agnostic to the implementation of pkg_filegroup,
but pkg_filegroup needs some changes in order to make the usage below more
idiomatic: there's currently no means to consolidate multiple differing
pkg_filegroup-like rules into a single rule. A potential idea for this is
discussed in #212.


This change provides a simple csv-like mechanism for specifying package
contents. Instead of specifying multiple pkg_filegroups, you can instead say
something like:

manifest = [
    ("/usr/bin/foo", "copy", "unix=0755", ":foo"),
    ("/etc/foo.d",   "mkdir" "unix=0755", ""),
    ...
]

manifest_rules = pkg_list_manifest(
    name = "manifest",
    manifest = manifest,
    default_attrs = "user=root;group=root",
)

pkg_rpm(
    name = "my-rpm",
    data = manifest_rules + [
        # Complex pkg_filegroup rules here
    ],
    ...
)

Which, for simple install-and-create operations, is enough and is much easier to
read.

See the in-file documentation in pkg/experimental/manifest/manifest.bzl for
more details on how the manifest is intended to be structured.

Tests were also provided, runnable as //experimental/manifest:manifest_test.
They are currently incomplete, but are enough to prove basic correctness so far.

The one thing that is not yet clear to us is how we should specify and order the
manifests. Some potential examples are provided in
pkg/experimental/manifest/examples, with further information described in
pkg/experimental/manifest/examples/README.md.

---

In using `pkg_filegroup` in our codebases, we've found that it takes a
significant amount of vertical space for complex packages.  For simple
mappings (e.g. copy a target/file to a destination), we've found that a simple
tabular format provides a succinct way to specify contents without sacrificing
much in the way of readability.

At this time, I'm looking to start a discussion on the idea presented here.  The
idea itself is essentially agnostic to the implementation of `pkg_filegroup`,
but `pkg_filegroup` needs some changes in order to make the usage below more
idiomatic: there's currently no means to consolidate multiple differing
`pkg_filegroup`-like rules into a single rule.  A potential idea for this is
discussed in bazelbuild#212.

---

This change provides a simple csv-like mechanism for specifying package
contents.  Instead of specifying multiple `pkg_filegroup`s, you can instead say
something like:

```python
manifest = [
    ("/usr/bin/foo", "copy", "unix=0755", ":foo"),
    ("/etc/foo.d",   "mkdir" "unix=0755", ""),
    ...
]

manifest_rules = pkg_list_manifest(
    name = "manifest",
    manifest = manifest,
    default_attrs = "user=root;group=root",
)

pkg_rpm(
    name = "my-rpm",
    data = manifest_rules + [
        # Complex pkg_filegroup rules here
    ],
    ...
)
```

Which, for simple install-and-create operations, is enough and is much easier to
read.

See the in-file documentation in `pkg/experimental/manifest/manifest.bzl` for
more details on how the manifest is intended to be structured.

Tests were also provided, runnable as `//experimental/manifest:manifest_test`.
They are currently incomplete, but are enough to prove basic correctness so far.

The one thing that is not yet clear to us is how we should specify and order the
manifests.  Some potential examples are provided in
`pkg/experimental/manifest/examples`, with further information described in
`pkg/experimental/manifest/examples/README.md`.
@nacl nacl added the WIP label Sep 25, 2020
@nacl nacl requested a review from aiuto September 25, 2020 16:27
@aiuto
Copy link
Collaborator

aiuto commented Sep 25, 2020

Interesting. I don't have time to give a lot of thought to it today.
It seems that we could probably gel quickly if we could specify what the pkg_filegroup providers were.
I think something like

PackageEntryInfo(target, owner, attributes, path_in_archive, action(copy,mkdir,symlink, mknod))
PackageFilegroupInfo(list(PackageEntryInfo), deps=depset(<all the targets in the the list of PackageEntryInfo).

Now can we see if that is sufficient to build complex pkg_filegroups AND easy manifests. I think so.

pkg_list_manifest could be a real rule returning those.

  • create a PackageEntryInfo for each row of the manfest
  • if they are a copy, add the target to the depset
  • return the aggregated PackageFileGroupInfo

Then to build pkg_filegroup the longer way we could to things like

pkg_filegroup(name=docs, srcs=glob(["docs/**"]), attrs="a+r", strip_dirs="docs", package_dir="usr/share/docs")

This would do almost exactly the same thing. Iterate over srcs, make an Entry for each and return the aggregate.

And, of course, we can do
pkg_filegroup(name=combined,
srcs=[":manifest_based", ":docs"]
package_dir = "my/fake/root",
)

To merge them and rewrite the path_in_archive for everything.

Perhaps we can do a VC brainstorm soon to come to agreement on the core providers, then we can fork experimentation with rules that create them styles alongside trying to implement in all the package types to see what we left out.

@nacl
Copy link
Collaborator Author

nacl commented Sep 29, 2020

Interesting. I don't have time to give a lot of thought to it today.
It seems that we could probably gel quickly if we could specify what the pkg_filegroup providers were.
I think something like

PackageEntryInfo(target, owner, attributes, path_in_archive, action(copy,mkdir,symlink, mknod))
PackageFilegroupInfo(list(PackageEntryInfo), deps=depset(<all the targets in the the list of PackageEntryInfo).

Now can we see if that is sufficient to build complex pkg_filegroups AND easy manifests. I think so.

Yeah, the providers were what was blocking this from becoming a "real" rule. I definitely wanted to get more feedback before trying to implement them.

pkg_list_manifest could be a real rule returning those.

* create a PackageEntryInfo for each row of the manfest

* if they are a copy, add the target to the depset

* return the aggregated PackageFileGroupInfo

This was approximately what I was imagining, but I'm not sure that we can do this without a macro at some point in the process. Based on the current available rule attribute types, the only one that looks even close to a list-of-lists is the string_list_dict, which doesn't lend itself to brevity super well.

Then to build pkg_filegroup the longer way we could to things like

pkg_filegroup(name=docs, srcs=glob(["docs/**"]), attrs="a+r", strip_dirs="docs", package_dir="usr/share/docs")

This would do almost exactly the same thing. Iterate over srcs, make an Entry for each and return the aggregate.

And, of course, we can do
pkg_filegroup(name=combined,
srcs=[":manifest_based", ":docs"]
package_dir = "my/fake/root",
)

To merge them and rewrite the path_in_archive for everything.

Not a bad idea. I'd need to think on the precise mechanics of it a little bit.

Perhaps we can do a VC brainstorm soon to come to agreement on the core providers, then we can fork experimentation with rules that create them styles alongside trying to implement in all the package types to see what we left out.

Sounds like a good idea -- I'll get in touch about this.

@nacl
Copy link
Collaborator Author

nacl commented Feb 10, 2022

I just was just doing edits to a file that easily has 1000+ lines of pkg_files. It doesn't look pretty, and something like this would help immensely.

Going to try to resurrect this in the next few weeks, especially since we have the new (well, not so new) providers.

@aiuto
Copy link
Collaborator

aiuto commented Feb 15, 2022

I forgot all about this one.
FWIW, the other bazel team members hate my suggestion about the PackageEntryInfo function for each row in the manifest. They have been trying to kill rules like that for a while. I'm still uncertain without a PoC to try it out.

@alexeagle
Copy link
Collaborator

Hey @nacl did you look at https://man.freebsd.org/cgi/man.cgi?mtree(8) ?
It has an existing specification format that's tabular and I believe gives full fidelity for what can be in a tar file. Plus, it's existing, very mature UNIX software so there's nothing for us to invent and it already works with lots of tools, including BSD tar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants