Introduce the concept of a meta-package to PurlDB #186

Open
Tracked by #272
DennisClark opened this issue Sep 21, 2023 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed high priority High Priority

Comments

@DennisClark
Member

Suppose that you want to "watch" a package to be sure that new versions of that package get populated in your PurlDB. In order to drive such a process (and others to be determined) you need the concept of a meta-package, a non-instantiated package without any version, that is more like a template than an actual package, so that you can indicate that you want to watch it, and configure the watch frequency and similar things. You could also keep the meta-package updated so that it always shows the very latest attribute values of that package, although that is probably a "stretch goal".

One simple way to identify a meta-package would be to use the reserved keyword meta in the Version of the Package URL. So you would not be able to derive a Download URL for such an entry, but you might be able to find its "home" URL, which would support any process that looks for new versions.
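A minimal sketch of how a client might detect such an entry, assuming the reserved keyword is literally the string `meta` in the version component. That keyword is only the proposal made here, not part of the purl spec, and the helper names are hypothetical. The parsing is deliberately simplified and does not percent-decode the version.

```python
# Reserved version keyword proposed in this issue (an assumption, not part of the purl spec).
META_VERSION = "meta"


def purl_version(purl: str):
    """Return the raw version component of a purl string, or None if it has none."""
    # Drop the optional subpath (#...) and qualifiers (?...) before looking for "@".
    core = purl.split("#", 1)[0].split("?", 1)[0]
    if "@" not in core:
        return None
    # The version follows the last "@" of the remaining type/namespace/name part.
    return core.rsplit("@", 1)[1]


def is_meta_package(purl: str) -> bool:
    """True if the purl uses the proposed reserved 'meta' version."""
    return purl_version(purl) == META_VERSION
```

For example, `is_meta_package("pkg:rpm/fedora/curl@meta")` would be true, while a purl with a real version or with no version at all would not be treated as a meta-package.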

Just a few ideas. More details needed of course! Note that any client systems of the PurlDB would need to be aware that a meta-package is a special concept; perhaps this can be supported by enhancements to the APIs.

DennisClark added the enhancement and help wanted labels on Sep 21, 2023
@armijnhemel

armijnhemel commented Sep 23, 2023

NOTE: this is tracked now in #308

This is actually something that recently dawned upon me as well and I have been thinking about this for quite some time. I already warned @pombredanne that I would be leaving a very long description of my thoughts, so here it is.

When you look at the purlspec (https://github.com/package-url/purl-spec) you can see that a purl has 7 components, some of them optional (or really 6, as the first one is always pkg). The second component, the type, is a hint about the format of the package, such as rpm, deb, and so on.

While I think that when talking about a specific instance of a package purl is the right way to describe it, it is not how people think about packages. Let's look at an example from the purlspec:

pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25

This describes the binary RPM package from a version of Fedora for a particular architecture. This package was built in a certain way, with a certain configuration, in a certain environment, and possibly with some patches applied to the source code tree before it was built. There could also be a similar package for a version of Debian. This would NOT describe the exact same package (as it was built in a different environment, with a different configuration and possibly with different patches) but it is a related package. What relates the two packages is that they derive from the same basis, namely the curl source code archive, which can also be described using a purl.
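To make the components of the example purl concrete, here is a rough sketch that splits it into its parts using only the Python standard library. The `Purl` class and `parse_purl` function are hypothetical names for illustration. This is a simplified reading of the purl layout (`pkg:type/namespace/name@version?qualifiers#subpath`), not a full implementation of the spec's parsing and normalization rules.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import parse_qsl, unquote


@dataclass
class Purl:
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: dict = field(default_factory=dict)
    subpath: Optional[str] = None


def parse_purl(purl: str) -> Purl:
    """Split a purl string into components (simplified, illustrative only)."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError("a purl must start with the 'pkg' scheme")
    # Peel off the optional trailing parts from right to left.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    if "@" in rest:
        path, version = rest.rsplit("@", 1)
    else:
        path, version = rest, None
    segments = path.strip("/").split("/")
    if len(segments) < 2:
        raise ValueError("a purl needs at least a type and a name")
    return Purl(
        type=segments[0].lower(),
        # Everything between the type and the name is the namespace.
        namespace="/".join(unquote(s) for s in segments[1:-1]) or None,
        name=unquote(segments[-1]),
        version=unquote(version) if version is not None else None,
        qualifiers=dict(parse_qsl(query)),
        subpath=subpath or None,
    )


curl_rpm = parse_purl("pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25")
```

Here `curl_rpm.type` is `rpm`, the namespace is `fedora`, the name is `curl`, and the architecture and distribution end up in the qualifiers, which is exactly the level of detail that distinguishes one built instance from another.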

So all these purls (the Fedora package, Debian package and original source code archive) are related to each other, but they are not identical. But this is not how people conceptually think about "a package". They will refer to the Fedora RPM as "curl", to the Debian deb as "curl" and to the original source code archive as "curl". This is not necessarily wrong, but also not necessarily right (as explained above).

If instead there were a meta package for "curl", then all of these purls (Fedora RPM, Debian deb, source code archive) could be seen as instances of the meta package "curl". These instances could have associated facts (for lack of a better word) describing certain aspects of the instance, which might or might not be correct (for example, "facts" that could be extracted from the RPM metadata: location of the VCS, location of the webpage, package name, and so on).

The above example is a bit simple and straightforward, so let's throw in a few more complex examples, starting with renamed packages. There are distributions that rename packages. The most straightforward example is Debian, which by convention uses lower case names for all of its packages (along with some other things, like replacing hyphens with other characters). A renamed package would still be an instance of a "meta package".

A slightly more radical example: in Debian the httpd package was renamed to apache2, while Fedora uses httpd. Both are packages derived from the Apache httpd source code and thus are related and should not be seen as completely different packages. Instead, there could be an "Apache httpd" meta package that has both the Fedora and Debian packages as instances.

Another more difficult example would be GCC: from the GCC code base many different packages are created, which the GCC 13 page on Launchpad shows: https://launchpad.net/ubuntu/+source/gcc-13
These are very obviously not the same packages, but they were generated from the same source code, or subsets of the same source code, so they are related. Add to that all the different versions of GCC and the different configurations they were built in (cross compilers, etc.) and you can see that it can get quite complex. Yet they are all still related.

Wrapping up: I think that the idea of a "meta package" is great, as this is how people are used to talking about code. A meta package could have several instances, described by purls that point to specific binary packages or source code archives, which in turn have facts (metadata) associated with them. The meta package could try to consolidate these facts (along with other facts from, for example, Wikidata) and/or present them to the user in a certain way.
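As a rough illustration of that structure, the sketch below models a meta package with instances and facts. The class names, the majority-vote consolidation rule, and the fact values are all assumptions made up for the example. Nothing here reflects an existing PurlDB model, and real consolidation would need to weigh sources rather than count them.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Instance:
    purl: str                                  # a concrete, versioned package
    facts: dict = field(default_factory=dict)  # metadata that may or may not be correct


@dataclass
class MetaPackage:
    name: str
    instances: list = field(default_factory=list)

    def consolidate(self) -> dict:
        """For each fact key, keep the value reported by the most instances."""
        votes = defaultdict(Counter)
        for instance in self.instances:
            for key, value in instance.facts.items():
                votes[key][value] += 1
        return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


# Illustrative data only: the fact values are invented for this example.
curl = MetaPackage(
    name="curl",
    instances=[
        Instance("pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25",
                 {"homepage": "https://curl.se/"}),
        Instance("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie",
                 {"homepage": "https://curl.se/"}),
        Instance("pkg:generic/curl@7.50.3",
                 {"homepage": "https://curl.haxx.se/"}),
    ],
)
consolidated = curl.consolidate()
```

With this made-up data, `consolidated["homepage"]` is the value that two of the three instances agree on. The renamed Debian `apache2` and Fedora `httpd` packages could be attached to a single "Apache httpd" meta package in the same way.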

@DennisClark
Member Author

@armijnhemel Thanks for all the very useful and detailed comments, which are much appreciated!

@pombredanne
Member

I opened #308 to track the notions of similar packages derived from the same upstream.

The notion of meta-package here is more about tracking a package without a specific version, along with its defaults and its watches.
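One way to picture this, as a sketch only: a record that holds a versionless purl plus a watch configuration. The `PackageWatch` name, its fields, and the one-day default interval are hypothetical and not taken from PurlDB.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PackageWatch:
    purl: str                              # versionless purl, e.g. "pkg:rpm/fedora/curl"
    interval: timedelta = timedelta(days=1)  # assumed default watch frequency
    last_checked: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        """True if the package has never been checked or the interval has elapsed."""
        if self.last_checked is None:
            return True
        return now - self.last_checked >= self.interval


watch = PackageWatch("pkg:rpm/fedora/curl")
```

A scheduler could then periodically collect every watch for which `is_due(now)` is true, look for new versions, and update `last_checked`.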

@pombredanne
Member

This may also be part of a follow-up to #373
