separate type from provider #33
The
By provider I assume you mean some extra "protocol" used to fetch an actual Here I guess that both npm, but also PyPI and RubyGems have specific conventions Now there is the other use case that you detail, where a given package may have multiple incarnations. E.g. a repo on GitHub that contains the source code for npm is also itself an npm package. (And this is also true of most if not all other package types.) The difficulty in this case is that there could be multiple ways to reference a package:
So this could be resolved in a few ways:
For your consideration... but I feel like it might be simpler to use multiple package URLs in this case rather than trying to combine multiple "personalities" in a type+provider. In particular the same GitHub or VCS URL can have multiple personalities: a single repo may contain a top level package.json, a bower.json and a pom.xml and more. Or a nuget.spec and a package.json |
Thanks for the detail @pombredanne. There is a different There is a bit of a miscommunication here.
In this proposal the The In practice the name in the manifest at the end of the purl may well be different than that indicated by the purl. This is to be expected and is, for example, the way that npm works. That's ok as long as the other information represents an immutable value. |
To take up the discussion again: @jeffmcaffer, I agree purl's current To me, So in a way, |
Thanks @sschuberth . I mostly agree with you and quite like the idea of unifying on purls. There are still some lingering issues.
|
Thanks for the detailed explanation, you have some good points there. Seeing that Sonatype has already adopted purl, I was about to do so for ORT, too, but now I feel these lingering issues need to be resolved first. @jeffmcaffer, do you still think purl is "fixable" to capture what you need e.g. by using the Also, is there a full spec of the identifier ClearlyDefined uses? I've found https://github.com/clearlydefined/clearlydefined/blob/master/docs/providers.md, but that's only about the provider part. |
@jeffmcaffer FYI OWASP Dependency-Track and CycloneDX also have both adopted PackageURL. But I'm confused/concerned about the
This is only one of many use cases for PackageURL. I do not think having a provider is possible (or even desirable) in the specification. This should be the job of the application that is implementing PackageURL, which is what Dependency-Track does, for example. Besides downloading the content (which itself can have different auth/proxy/network config issues, so it's not as simple as just downloading), there are use cases for identifying old/outdated versions of components using the repository's native APIs. For example, if I have a PackageURL for Apache Commons IO, I may not want to download it, but to query the repository for the current version of the thing to see if what I have is current or not. In this case, the provider example would be useless, especially since various repo implementations have various levels of API support. For example, I can download something from npmjs.org and query for the current version of the thing, and I can also do that with the other npmjs repos, but if I try to do it with a Sonatype Nexus 3 repo, it won't work even though it supports npmjs (it simply doesn't support the necessary API). I'm struggling to find benefits of having a provider as part of the spec without unnecessarily increasing complexity. |
The misunderstanding may be explained by something @pombredanne said a few comments ago
What I am proposing is that this is actually Put another way, the point of the provider is precisely to capture the vagaries of accessing the particular host like auth/proxy/.. and API level support. With the provider approach you would talk about There is also a fundamental question about identity: does the identity of a package include the place from which it comes? If so, then "provider" (call it what you want) is an integral part of the structure. The spec can allow for it to be undefined or default to the common provider but fundamentally it would still be part of the identity. If OTOH you don't want that characteristic, and assume all npms called "foo 1.0" are the same regardless of where they come from, that's ok but is a different identity model. While I get that In the ClearlyDefined scenarios, we need to be able to get to and identify things that have different forms and are hosted in different places (npms as github repos, or github releases, or wrapped in a NuGet, or... ) I do not claim to have resolved all the corner cases. In fact, we very much would like to use purl as it is more robust in other dimensions. I am however having trouble figuring out how to pragmatically code/design with purl where we need/want the separation I'm describing. @sschuberth, you should talk to @tsteenbe about this. He and I talked some and IIRC he perceived the same sorts of issues with the overloading of |
@jeffmcaffer I have a concrete question about how you see |
Also see the related purl discussion at [1]. What we were meaning by "provider" actually more resembles purl's "type". [1] package-url/purl-spec#33 Signed-off-by: Sebastian Schuberth <sebastian.schuberth@here.com>
|
The fundamental issue is that there is a difference between the format (aka For example, you can get an NPM from many different places using different protocols (npm install, tgz fetch, git clone, ...). You can get a git repo by purl does a good job of capturing When the "package" does not conform to a norm, you can spec a To illustrate, getting an npm from GitHub could be done using the npm protocol to the GitHub package registry, by cloning the git repo, or by downloading a release. All locations are github.com/ but knowing the protocol allows us to talk different APIs to the location (e.g., we can ask GitHub for other releases, or for other branches/tags). We could use the As mentioned above, I don't claim to have the answers but would love to collaborate to figure it out. |
Going over the discussion so far, it seems that I have a different understanding of what a purl references. For my use cases, a purl references a unique id/coordinates for a component release unifying the different ways of how components are uniquely represented in a certain technology. That means, that for one technology (e.g. maven) I would expect exactly one purl that references a version of a component. Having said that, a technology means basically a packaging type or a package manager. There are several consequences:
So looking at your separation, @jeffmcaffer, for me: |
@blaumeiser-at-bosch, what about different repositories? |
Or even worse, published with the same coordinates to Maven Central and to JCenter/Bintray (or any other Maven-compatible repository). I guess the underlying question is whether PURL should "only" identify the package in the sense of the contents of the package, i.e. I do care that it's the same package / file that I'm referring to, but I do not care where I got it from. Or should PURL also document where I got this copy of the same package from. As PURL's goals are described as "reliably identify and locate software packages" (emphasis mine), I believe it should also document the where from. That would make PURL also more usable in the ClearlyDefined context where provenance matters, and I believe that's where @jeffmcaffer is coming from. The next question would be how to integrate the where from, i.e. the provider, into the PURL standard. As I guess it's too late to define a dedicated field in the "base URL" for that, an option that was already discussed is to use URL qualifiers. And that option is actually not too bad: Users who do not care about the provider, but only about it being the same package (with the same hash) could just compare the base URL, whereas users who care about the provider need to additionally take the provider into account. But if doing that we'd need a standardized (i.e. non-type-specific) name for a qualifier describing the provider, plus a documented default provider per type if no provider is specified. |
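The qualifier idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the qualifier name `provider` and the default provider names in `DEFAULT_PROVIDER` are hypothetical and not part of the purl spec, and the parsing is deliberately simplified (it assumes a well-formed purl starting with `pkg:`).

```python
from urllib.parse import parse_qsl

# Hypothetical qualifier name; the purl spec does not standardize it.
PROVIDER_KEY = "provider"
# Hypothetical per-type defaults used when no provider qualifier is given.
DEFAULT_PROVIDER = {"maven": "mavencentral", "npm": "npmjs"}


def split_purl(purl: str):
    """Split a purl into (base, qualifiers), dropping any '#subpath'."""
    purl, _, _ = purl.partition("#")
    base, _, query = purl.partition("?")
    return base, dict(parse_qsl(query))


def provider_of(purl: str) -> str:
    """Return the explicit provider, else the assumed default for the type."""
    base, qualifiers = split_purl(purl)
    ptype = base[len("pkg:"):].split("/", 1)[0]
    return qualifiers.get(PROVIDER_KEY, DEFAULT_PROVIDER.get(ptype, ""))


def same_package(a: str, b: str) -> bool:
    """Compare only the base URL, ignoring where the package came from."""
    return split_purl(a)[0] == split_purl(b)[0]


def same_package_and_provider(a: str, b: str) -> bool:
    """Compare the base URL and additionally the (possibly default) provider."""
    return same_package(a, b) and provider_of(a) == provider_of(b)
```

With this split, a user who only cares about the package calls `same_package`, while a provenance-aware tool calls `same_package_and_provider`; a purl without the qualifier and one naming the default provider explicitly compare as equal.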
It does. There's a default repo for most PURL types. For Maven, the default is Maven Central. If an artifact with the same coordinates exists in bintray, the |
I know. My point was that the name of the qualifier which specifies a non-default repo is not standardized, as |
My point is, that I want to reference a content that has certain properties, a license, a copyright, ... I do not get your statement concerning assigning, because my understanding is, that the purl is defined by the properties of the component, namely technology/package manager type, namespace, name and version, these properties build the main part of the purl. Interestingly, there are also technical aspects: The question is, what am I talking of, when I have the purl. The concrete instance of the open source component found somewhere in a repository, or the original open source component which was instantiated for deployment. IMO, I prefer to identify the original component and have additional metadata associated with this component, like known deployed instances and locations to get it from. Perhaps I am missing something, but for me this is not needed to identify the thing I want to identify. |
@blaumeiser-at-bosch, for Java, the package is jar, and it has no inherent namespace, name or version. One could say that the jar is not important too, since Java only cares about packages and classes inside. Their content and licenses are what matters, not the (re)packed jar. |
I think this is not correct in the general case. The same content can be built under different licenses. Maven has |
@grv87 You are right that the same piece of OSS code could have multiple PURLs, each of which should identify this piece of software clearly, so yes, there are multiple ways of referencing the component. But still I struggle with the notion that the same thing from different locations are different things. Even the wording is strange. 😃 The situation is different if it is not the same thing, e.g., because it is the same piece of software licensed differently. In this case, I would absolutely appreciate the two things having different purls, ideally differing not only in the location part of the purl but in some substantial way. If we cannot rely on one purl referencing the component unambiguously, the whole thing with identifying dependencies and attaching known metadata to the detected dependencies becomes very difficult. |
It will likely be hard for purl to reconcile package identity semantics across all the ecosystems. It feels even harder if we start mixing additional package metadata like licenses etc. Perhaps purls should be focused on locating the package rather than describing it. Note that a difference in the purl case (vs the url case) is that purls help you locate the content and the metadata (e.g., registry info). If we just needed the content then a plain url to the zip, jar, tgz, ... would be fine. With a purl the user knows what protocol to talk to what registry and how to address the package of interest. Anything beyond that (e.g., copyrights, ...) can be left to the content of the package or registry supplied metadata about the package. |
…type The PURL specification has the issue that types and providers are not separated [1]. ORT uses the package manager type as opposed to using the PURL type that e.g. Nexus IQ requires. If a package manager type cannot be mapped to a PURL type ORT should fallback to the package manager type instead of breaking the calling code by returning `null`. [1] package-url/purl-spec#33 Signed-off-by: Marcel Bochtler <marcel.bochtler@bosch.io>
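The fallback described in this commit message could look roughly like the following Python sketch. The mapping table and the function name `to_purl_type` are hypothetical and do not reproduce ORT's actual code; lowercasing the fallback value is an added assumption, made because purl types are canonically lowercase.

```python
# Hypothetical mapping from package manager names to purl types;
# the real table in ORT may differ.
PURL_TYPE_BY_MANAGER = {
    "Maven": "maven",
    "Gradle": "maven",
    "NPM": "npm",
    "Yarn": "npm",
    "PIP": "pypi",
}


def to_purl_type(manager_type: str) -> str:
    """Return the mapped purl type, or fall back to the manager type itself.

    Falling back (instead of returning None) keeps calling code working for
    package managers that have no corresponding purl type.
    """
    return PURL_TYPE_BY_MANAGER.get(manager_type, manager_type.lower())
```

The design choice is that an unmapped type yields a usable, if non-standard, identifier rather than a missing value the caller has to handle.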
Strictly speaking, a package id is not enough to query curations as different servers might host different artifacts under the same id. A typical example is an internal fork of an upstream artifact that is internally hosted under the same id. Currently, ClearlyDefined is the only curation provider that allows taking this into account via its "provider" concept, see [1] and [2] for related discussions. So, as a preparation for ORT to replace the hard-coded providers [3] with real ones determined based on URLs, change the curation provider API to take whole packages instead of only their ids, so that implementations have access to artifacts and VCS URLs. [1]: #155 [2]: package-url/purl-spec#33 [3]: https://github.com/oss-review-toolkit/ort/blob/33531c7/model/src/main/kotlin/utils/Extensions.kt#L38-L57 Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
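Determining a provider from a URL, as this commit message anticipates, might be sketched as below. This is a Python illustration only, not ORT's implementation: the host table, the provider names, and the function name `provider_for` are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical host-to-provider table, for illustration only.
HOST_TO_PROVIDER = {
    "repo.maven.apache.org": "mavencentral",
    "registry.npmjs.org": "npmjs",
    "github.com": "github",
}


def provider_for(artifact_url: str, default: str = "unknown") -> str:
    """Derive a symbolic provider name from the host of an artifact or VCS URL."""
    host = (urlparse(artifact_url).hostname or "").lower()
    return HOST_TO_PROVIDER.get(host, default)
```

An internally hosted fork published under the same package id would resolve to a different provider (here the `default`), which is the distinction a package id alone cannot express.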
In the current spec the type of a package and the provider of a package are compressed into the `type` element. For example, type = `npm` implies npmjs.com as the provider. While this is true in general, it gets complicated when talking about a package type that can live on different providers (e.g., an npm on GitHub).

One possible path is to use the git-style `+` approach to get something like

or more generally

This example indicates that there is an npm formatted entity on github in the `foo` repo in the `myorg` org with commit hash a68381e.

In this way, the current `type` element remains the type or format of the entity being located by the purl but the `provider` (if supplied) dictates the rest of the purl structure in the same way that the `type` does currently. If the provider is omitted then a spec'd default provider for the given type is used (e.g., npmjs for npm).

The purl spec should enumerate separately the set of types and providers with canonical values. For providers it is likely best if the values are as symbolic as possible. That is, use `npmjs` rather than `npmjs.com`. This simplifies the URLs for the user (npmjs.com? npmjs.org? www.npmjs.*?) and insulates URLs from changes in the provider's deployment.
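To make the proposal concrete, here is a minimal Python sketch of parsing a combined type and provider. It assumes a `pkg:type+provider/path@version` layout, which is a guess at the syntax based on the description of the example (npm, github, `myorg`, `foo`, a68381e), not a form defined by the purl spec. The `TypedPurl` class, `parse_type_provider`, and the default provider table are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical defaults, applied when a purl omits the provider.
DEFAULT_PROVIDERS = {"npm": "npmjs", "maven": "mavencentral"}


@dataclass(frozen=True)
class TypedPurl:
    type: str
    provider: str
    path: str
    version: Optional[str]


def parse_type_provider(purl: str) -> TypedPurl:
    """Parse the assumed 'pkg:type+provider/path@version' form."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a purl: {purl!r}")
    head, _, tail = rest.partition("/")
    path, _, version = tail.partition("@")
    ptype, _, provider = head.partition("+")
    if not provider:
        # No explicit provider: use the assumed default for this type, if any.
        provider = DEFAULT_PROVIDERS.get(ptype, "")
    return TypedPurl(ptype, provider, path, version or None)
```

For instance, `parse_type_provider("pkg:npm+github/myorg/foo@a68381e")` separates the format (`npm`) from the place it lives (`github`), while `parse_type_provider("pkg:npm/foo@1.0")` falls back to the symbolic default `npmjs`.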