-
Notifications
You must be signed in to change notification settings - Fork 19
sw360(purl): replacing coordinates with purl in externalId #291
sw360(purl): replacing coordinates with purl in externalId #291
Conversation
@blaumeiser-at-bosch is that how you imagined it with your issue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, what I planned, but see the only remark in the code
release.setCoordinates(getMapOfCoordinates(artifact)); | ||
} | ||
private static void setPackageUrl(SW360Release sw360Release, Artifact artifact) { | ||
Optional<PackageURL> packageURL = artifact.askForAll(ArtifactCoordinates.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I think this findAny is not the adequate solution. In ArtifactUtils there is a default preference for coordinates which returns the coordinates according to priority, this seems to be the right thing to do.
If I remember right, we discussed that we want to store all known purls for the object. In the same way as the hash codes, i.e., with an _ suffix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a call yesterday with @sschuberth. In our discussion we came to the conclusion that per artifact there should only be one purl. Did we get that wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the principle discussion whether we have one data entry for a component with different established distributions or several. E.g. a repackaged maven component, which is provided in a p2 repository has two different coordinates with the same content.
One can argue that these are two components and therefore two instances with different purls, but effectively the metadata for the component are the same, as the sources from which the component has been build are the same.
So, my preferred solution is to have all known instances of a component managed in one data object. That means multiple purls pointing to the same data object
A similar case is in nuget, where we have the original source in nuget.org, but the same component, differently packaged from Microsoft, or with alternative package managers in other languages. In SW360, we should be able to manage multiple instances, whereas I think the way we did it in Antenna with multiple coordinates might be wrong because we look at a given set of instances, not the overall data of a component.
Adding @sschuberth into the discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the metadata for the component are the same, as the sources from which the component has been build are the same.
IMO we can and should use PURL to make exactly that distinction: To differentiate between different packagings of the same component or source code (e.g. .tar.gz
vs. .zip
), and to differentiate between different locations from which the same package (e.g. Maven Central vs. JCenter).
Linking some (semi-)related issue discussions here:
"separate type from provider": package-url/purl-spec#33
"Source of repackaged code": clearlydefined/curated-data#25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the SW360 we do not distinguish between an artifact, its sources, or whether it was pulled from maven or from github. This means that in SW360 all the corresponding PURLs should be added to one release.
I think two artifacts with the same name and version are the same item in SW360.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxhbr what format would you suggest for the case: two purls from the same type?
(like started in this conversation: #291 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going over the discussion between you @sschuberth and Jeff, it seems that I have a different understanding of what a purl references. From what I see, a purl references a unique id/coordinates for a component release unifying the different ways of how component coordinates are represented between technologies.
That means, that for one technology (e.g. maven) I would expect exactly one purl that references a version of a component. Having said that, a technology means basically a packaging type or a package manager.
There are several consequences:
- A component release available in different package manager technologies has multiple purls, each of which uniquely identifies the component release.
- I do not care about the provider or source of the component, if it is the content, it has the corresponding purl of the package manager technology, since both the instances are the same thing, i.e., built from the same source code.
- If the content is different, then it is a different release, i.e., it has a different purl
- Whether I trust that the release has the content the purl indicates is not a matter of purl but a matter of do I trust the source of this release.
- The source of the release is something I would like to know but this is out of scope of the purl spec.
- The purl type indentifies the technology with which I use the component in the build, aka as the package manager technology. I still can download the component from the source and build it on my own, but even then the normal way would be to build the corresponding library and use it as if I would download it directly from the internet: e.g., Build a Jar, deploy it to a maven repo manager (e.g. Nexus) and reference it with the repo manager as provider in my pom.xml.
- If I do not use it as described in 6, e.g., if I download the source and directly build it together with my sources, this still is the same content, but from my point, I would need to use a different purl, e.g., with type github, because the way I use it is different. I do not use the component as Maven component anymore, but as bunch of sources, so different purl.
I think that covers my line of thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically for this PR and how to implement the solution, what is the implication of your line of thinking for this implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We have multiple purls per release
- The purls should be requested as proposed in the ArtifactUtils default sequence, not findAny or findFirst
- Since I assume only one purl per technology, I would follow Max comment from below that the key in the external id should be the type used in the purl as redundant element
- Purls are used in the Release object with PackageURL class not string and only when trnaslated into the JSON the string representation is used.
Waiting for #292 before continuing, since necessary for Enricher |
0abb9e5
to
62a055a
Compare
@JsonIgnore | ||
public Map<String, String> getCoordinates() { | ||
public Set<String> getPurls() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more appropriate to use the PackageURL class instead of String to handle the object and only use the String representation when transfering it to SW360.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the SW360Release
should be as near as possible to the representation which will be transferred and therefore I think that it is perfectly fine to have them as string. Especially since they are stored in the externalId
map.
private static void setPackageUrls(SW360Release sw360Release, Artifact artifact) { | ||
Set<String> packageURLS = artifact.askForAll(ArtifactCoordinates.class).stream() | ||
.map(ArtifactCoordinates::getPurl) | ||
.map(PackageURL::toString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if toString is the right method, the "official" method for getting the String representation of the PURL is canonicalize().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While toString() indeed just calls canonicalize() I'd also prefer to be explicit and directly call canonicalize()
here.
int i = 1; | ||
for (String purl : purls) { | ||
if (purl != null && !purl.isEmpty()) { | ||
externalIds.put(PURL_PREFIX + i, purl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are the keys just the purls with enumeration, we have a better option. In my opinion it would be better to use the old prefix, something like:
'mvn': 'pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1'
This also allows for better searches via the SW360 rest interface.
Or maybe just use the scheme as key. (yes, there can be multiple purls of the same scheme for the same artifact. But I do not think that this is a problem here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blaumeiser-at-bosch what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting point, disadvantage here is that I have to know which kind of component that is so searching is easier on the one hand side, because I do not have to search until the index ends, but it is also harder, because I have to find out which types of purls are there.
In general, we should use the same type as key, not anything else, so in the example case, "maven" instead of "mvn"
With scheme you mean the type, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I probably mean type
.
.withNamespace("test") | ||
.withVersion("1.2.3") | ||
.build() | ||
.toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as https://github.com/eclipse/antenna/pull/291/files#r326196146, use canonicalize()
.
@@ -70,8 +70,8 @@ private void addClearingStateIfAvailable(Artifact artifact, SW360Release release | |||
} | |||
|
|||
private void mapExternalIdsOnSW360Release(SW360Release sw360Release, Artifact artifact) { | |||
mapCoordinates(sw360Release) | |||
.forEach(artifact::addFact); | |||
mapCoordinatesFromPurls(sw360Release). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually we place the dot in the next line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small thing
private static final String PURL_PREFIX = "purl_"; | ||
private static final Set<String> PURL_TYPE = Stream.of( | ||
"bitbucket", "composer", "deb", "docker", | ||
"gem", "generic", "github", "golang", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have type p2 as well as custom type for p2 repository links
e04db78
to
eb7f7b7
Compare
return externalIds.entrySet().stream() | ||
.filter(e -> SW360CoordinateKeysToArtifactCoordinates.getValues().contains(e.getKey())) | ||
.filter(e ->PURL_TYPE.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a space behind the arrow
return externalIds.entrySet().stream() | ||
.filter(e -> SW360CoordinateKeysToArtifactCoordinates.getValues().contains(e.getKey())) | ||
.filter(e ->PURL_TYPE.stream() | ||
.anyMatch(purlType -> e.getKey().equals(purlType))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong indentation, that is part of the above filter, perhaps do not do it in a new line or do it as PURL_TYPE.contains(e.getKey())
private void dropAllPurls() { | ||
externalIds.keySet().stream() | ||
.filter(e -> PURL_TYPE.stream() | ||
.anyMatch(e::startsWith)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why startsWith? Isn't that redundant code to my comment above, should then be a method, I think
public SW360Release setPurls(Map<String, String> purls) { | ||
dropAllPurls(); | ||
|
||
purls.forEach(externalIds::put); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not putAll?
@@ -34,7 +37,7 @@ public static SW360Release prepareRelease(SW360Component component, Set<String> | |||
release.setComponentId(componentId); | |||
release.setMainLicenseIds(sw360LicenseIds); | |||
|
|||
SW360ReleaseAdapterUtils.setCoordinates(release, artifact); | |||
SW360ReleaseAdapterUtils.setPackageUrls(release,artifact); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space missing after ,
eb7f7b7
to
c18c7ad
Compare
private static final Set<String> PURL_TYPE = Stream.of( | ||
"bitbucket", "composer", "deb", "docker", | ||
"gem", "generic", "github", "golang", "maven", | ||
"npm", "nuget", "p2", "pypi", "rpm").collect(Collectors.toSet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"npm", "nuget", "p2", "pypi", "rpm").collect(Collectors.toSet()); | |
"npm", "nuget", "p2", "pypi", "rpm").collect(Collectors.toSet()); |
return this; | ||
} | ||
|
||
private void dropAllPurls() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this drops the docker
purl or other unsupported purls, without replacing it later? that should not happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think PURLs, which we do not handle, should be preserved in the release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for clarification: you suggest to overwrite the externalIds
entries with the entries we get when setPurls
is called? (which would happen automatically due to it being a map when using putAll
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure, maybe one wants do drop the p2
coordinates (and other supported types), if
- they are explicitly not provided or
- if they should be contained in the antenna artifact, if they were present previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe best solution is: just remove dropAllPurls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, point @maxhbr
Actually, the question is, what setPurls does, Is it resetting the object or simply adding the purls given. Currently it completely resets the object concerning purls. Could be the right thing to do. How often will the case happen, that there are already purls inside the object and we add more?
When I read this, I was also unsure, if the deletion should be done, but after rethinking it now, I would propose to not delete the purls and check the concrete use cases first, before we lose data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently setPurls()
is only called in Antenna when creating a new release from artifact data.
Hence, currently there would never be any purls already safed in the release. The mergeWith
function of the release merges all externalIds overall.
Hence, currently the dropAllPurls
function is redundant.
c18c7ad
to
778219e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some unclear points, since there is not a 1:1 mapping between PURLs and ArtifactCoordinates
@@ -56,9 +56,8 @@ public void artifactIsMappedToSw360ReleaseCorrectly() { | |||
assertThat(false).isTrue(); | |||
} | |||
|
|||
assertThat(release.getCoordinates()).containsKeys("mvn"); | |||
assertThat(release.getCoordinates()).hasSize(1); | |||
assertThat(release.getCoordinates()).containsValue("org.group.id:artifactId(test1):1.2.3"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't there a replacement for this unit test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the test in the updater test
...sw360/src/main/java/org/eclipse/sw360/antenna/sw360/rest/resource/releases/SW360Release.java
Show resolved
Hide resolved
...360/src/main/java/org/eclipse/sw360/antenna/sw360/workflow/processors/SW360EnricherImpl.java
Show resolved
Hide resolved
...sw360/src/main/java/org/eclipse/sw360/antenna/sw360/rest/resource/releases/SW360Release.java
Show resolved
Hide resolved
This open up a different problem, if we allow to have multiple purls with the same type, we need also a change in the storage model, because currently we can only hold one purl per type in SW360. So we cannot handle the case of multiple purls of the same type anyhow. This is a restriction for now. We would end up with _x again to handle this, but then we run into trouble within the Coordinates as well. This becomes too big for this change. If we can identify all purls, we should return it, the SW360 module does not have to mind the model gaps from the internal Antenna model, if we can easily find out any purl stored, we can filter at another place, e.g., when we create the coordinates objects. |
On hold: @maxhbr implements a change in ArtifactCoordinates reducing the different types to one type which influences the way this PR needs to be implemented |
d4c6585
to
afcba26
Compare
@blaumeiser-at-bosch && @maxhbr Hello, |
...sw360/src/main/java/org/eclipse/sw360/antenna/sw360/rest/resource/releases/SW360Release.java
Show resolved
Hide resolved
...360/src/main/java/org/eclipse/sw360/antenna/sw360/workflow/processors/SW360EnricherImpl.java
Show resolved
Hide resolved
@neubs-bsi Please rebase |
afcba26
to
c36fda7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Coordinates have been removed from the sw360 data model and purls have been added * removed coordinates to coordinates key mapping * added purls in sw360 release * represented by a map of string, string * keys are the antenna supported types of purl * test have been updated * documentation has been updated Signed-off-by: Stephanie Neubauer <Stephanie.Neubauer@bosch-si.com>
c36fda7
to
dc842ee
Compare
Added purl to sw360 external Ids and removed coordinates