Skip to content
This repository has been archived by the owner on Mar 17, 2021. It is now read-only.

sw360(purl): replacing coordinates with purl in externalId #291

Merged

Conversation

neubs-bsi
Copy link
Contributor

Added purl to sw360 external Ids and removed coordinates

@neubs-bsi
Copy link
Contributor Author

@blaumeiser-at-bosch is that how you imagined it with your issue?

Copy link
Contributor

@blaumeiser-at-bosch blaumeiser-at-bosch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, what I planned, but see the only remark in the code

release.setCoordinates(getMapOfCoordinates(artifact));
}
private static void setPackageUrl(SW360Release sw360Release, Artifact artifact) {
Optional<PackageURL> packageURL = artifact.askForAll(ArtifactCoordinates.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I think this findAny is not the adequate solution. In ArtifactUtils there is a default preference for coordinates which returns the coordinates according to priority, this seems to be the right thing to do.

If I remember right, we discussed that we want to store all known purls for the object. In the same way as the hash codes, i.e., with an _ suffix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a call yesterday with @sschuberth. In our discussion we came to the conclusion that per artifact there should only be one purl. Did we get that wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the principle discussion whether we have one data entry for a component with different established distributions or several. E.g. a repackaged maven component, which is provided in a p2 repository has two different coordinates with the same content.

One can argue that these are two components and therefore two instances with different purls, but effectively the metadata for the component are the same, as the sources from which the component has been build are the same.

So, my preferred solution is to have all known instances of a component managed in one data object. That means multiple purls pointing to the same data object

A similar case is in nuget, where we have the original source in nuget.org, but the same component, differently packaged from Microsoft, or with alternative package managers in other languages. In SW360, we should be able to manage multiple instances, whereas I think the way we did it in Antenna with multiple coordinates might be wrong because we look at a given set of instances, not the overall data of a component.

Adding @sschuberth into the discussion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the metadata for the component are the same, as the sources from which the component has been build are the same.

IMO we can and should use PURL to make exactly that distinction: To differentiate between different packagings of the same component or source code (e.g. .tar.gz vs. .zip), and to differentiate between different locations from which the same package (e.g. Maven Central vs. JCenter).

Linking some (semi-)related issue discussions here:

"separate type from provider": package-url/purl-spec#33
"Source of repackaged code": clearlydefined/curated-data#25

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the SW360 we do not distinguish between an artifact, its sources, or whether it was pulled from maven or from github. This means that in SW360 all the corresponding PURLs should be added to one release.

I think two artifacts with the same name and version are the same item in SW360.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxhbr what format would you suggest for the case: two purls from the same type?
(like started in this conversation: #291 (comment))

Copy link
Contributor

@blaumeiser-at-bosch blaumeiser-at-bosch Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going over the discussion between you @sschuberth and Jeff, it seems that I have a different understanding of what a purl references. From what I see, a purl references a unique id/coordinates for a component release unifying the different ways of how component coordinates are represented between technologies.

That means, that for one technology (e.g. maven) I would expect exactly one purl that references a version of a component. Having said that, a technology means basically a packaging type or a package manager.

There are several consequences:

  1. A component release available in different package manager technologies has multiple purls, each of which uniquely identifies the component release.
  2. I do not care about the provider or source of the component, if it is the content, it has the corresponding purl of the package manager technology, since both the instances are the same thing, i.e., built from the same source code.
  3. If the content is different, then it is a different release, i.e., it has a different purl
  4. Whether I trust that the release has the content the purl indicates is not a matter of purl but a matter of do I trust the source of this release.
  5. The source of the release is something I would like to know but this is out of scope of the purl spec.
  6. The purl type indentifies the technology with which I use the component in the build, aka as the package manager technology. I still can download the component from the source and build it on my own, but even then the normal way would be to build the corresponding library and use it as if I would download it directly from the internet: e.g., Build a Jar, deploy it to a maven repo manager (e.g. Nexus) and reference it with the repo manager as provider in my pom.xml.
  7. If I do not use it as described in 6, e.g., if I download the source and directly build it together with my sources, this still is the same content, but from my point, I would need to use a different purl, e.g., with type github, because the way I use it is different. I do not use the component as Maven component anymore, but as bunch of sources, so different purl.

I think that covers my line of thinking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically for this PR and how to implement the solution, what is the implication of your line of thinking for this implementation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We have multiple purls per release
  • The purls should be requested as proposed in the ArtifactUtils default sequence, not findAny or findFirst
  • Since I assume only one purl per technology, I would follow Max comment from below that the key in the external id should be the type used in the purl as redundant element
  • Purls are used in the Release object with PackageURL class not string and only when trnaslated into the JSON the string representation is used.

@neubs-bsi
Copy link
Contributor Author

neubs-bsi commented Sep 19, 2019

Waiting for #292 before continuing, since necessary for Enricher

@neubs-bsi neubs-bsi force-pushed the neubs-bsi/sw360/purl/#828 branch from 0abb9e5 to 62a055a Compare September 19, 2019 13:30
@neubs-bsi neubs-bsi changed the title Neubs bsi/sw360/purl/#828 sw360(purl): replacing coordinates with purl in externalId Sep 19, 2019
@neubs-bsi neubs-bsi marked this pull request as ready for review September 19, 2019 13:33
@JsonIgnore
public Map<String, String> getCoordinates() {
public Set<String> getPurls() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more appropriate to use the PackageURL class instead of String to handle the object and only use the String representation when transfering it to SW360.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the SW360Release should be as near as possible to the representation which will be transferred and therefore I think that it is perfectly fine to have them as string. Especially since they are stored in the externalId map.

private static void setPackageUrls(SW360Release sw360Release, Artifact artifact) {
Set<String> packageURLS = artifact.askForAll(ArtifactCoordinates.class).stream()
.map(ArtifactCoordinates::getPurl)
.map(PackageURL::toString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if toString is the right method, the "official" method for getting the String representation of the PURL is canonicalize().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While toString() indeed just calls canonicalize() I'd also prefer to be explicit and directly call canonicalize() here.

int i = 1;
for (String purl : purls) {
if (purl != null && !purl.isEmpty()) {
externalIds.put(PURL_PREFIX + i, purl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are the keys just the purls with enumeration, we have a better option. In my opinion it would be better to use the old prefix, something like:

   'mvn': 'pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1'

This also allows for better searches via the SW360 rest interface.

Or maybe just use the scheme as key. (yes, there can be multiple purls of the same scheme for the same artifact. But I do not think that this is a problem here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blaumeiser-at-bosch what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point, disadvantage here is that I have to know which kind of component that is so searching is easier on the one hand side, because I do not have to search until the index ends, but it is also harder, because I have to find out which types of purls are there.

In general, we should use the same type as key, not anything else, so in the example case, "maven" instead of "mvn"

With scheme you mean the type, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I probably mean type.

.withNamespace("test")
.withVersion("1.2.3")
.build()
.toString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -70,8 +70,8 @@ private void addClearingStateIfAvailable(Artifact artifact, SW360Release release
}

private void mapExternalIdsOnSW360Release(SW360Release sw360Release, Artifact artifact) {
mapCoordinates(sw360Release)
.forEach(artifact::addFact);
mapCoordinatesFromPurls(sw360Release).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually we place the dot in the next line

Copy link
Contributor

@blaumeiser-at-bosch blaumeiser-at-bosch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small thing

private static final String PURL_PREFIX = "purl_";
private static final Set<String> PURL_TYPE = Stream.of(
"bitbucket", "composer", "deb", "docker",
"gem", "generic", "github", "golang",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have type p2 as well as custom type for p2 repository links

return externalIds.entrySet().stream()
.filter(e -> SW360CoordinateKeysToArtifactCoordinates.getValues().contains(e.getKey()))
.filter(e ->PURL_TYPE.stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a space behind the arrow

return externalIds.entrySet().stream()
.filter(e -> SW360CoordinateKeysToArtifactCoordinates.getValues().contains(e.getKey()))
.filter(e ->PURL_TYPE.stream()
.anyMatch(purlType -> e.getKey().equals(purlType)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation, that is part of the above filter, perhaps do not do it in a new line or do it as PURL_TYPE.contains(e.getKey())

private void dropAllPurls() {
externalIds.keySet().stream()
.filter(e -> PURL_TYPE.stream()
.anyMatch(e::startsWith))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why startsWith? Isn't that redundant code to my comment above, should then be a method, I think

public SW360Release setPurls(Map<String, String> purls) {
dropAllPurls();

purls.forEach(externalIds::put);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not putAll?

@@ -34,7 +37,7 @@ public static SW360Release prepareRelease(SW360Component component, Set<String>
release.setComponentId(componentId);
release.setMainLicenseIds(sw360LicenseIds);

SW360ReleaseAdapterUtils.setCoordinates(release, artifact);
SW360ReleaseAdapterUtils.setPackageUrls(release,artifact);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space missing after ,

@neubs-bsi neubs-bsi force-pushed the neubs-bsi/sw360/purl/#828 branch from eb7f7b7 to c18c7ad Compare September 24, 2019 08:50
private static final Set<String> PURL_TYPE = Stream.of(
"bitbucket", "composer", "deb", "docker",
"gem", "generic", "github", "golang", "maven",
"npm", "nuget", "p2", "pypi", "rpm").collect(Collectors.toSet());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"npm", "nuget", "p2", "pypi", "rpm").collect(Collectors.toSet());
"npm", "nuget", "p2", "pypi", "rpm").collect(Collectors.toSet());

return this;
}

private void dropAllPurls() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this drops the docker purl or other unsupported purls, without replacing it later? that should not happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think PURLs, which we do not handle, should be preserved in the release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for clarification: you suggest to overwrite the externalIds entries with the entries we get when setPurls is called? (which would happen automatically due to it being a map when using putAll)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure, maybe one wants do drop the p2 coordinates (and other supported types), if

  • they are explicitly not provided or
  • if they should be contained in the antenna artifact, if they were present previously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe best solution is: just remove dropAllPurls

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, point @maxhbr
Actually, the question is, what setPurls does, Is it resetting the object or simply adding the purls given. Currently it completely resets the object concerning purls. Could be the right thing to do. How often will the case happen, that there are already purls inside the object and we add more?

When I read this, I was also unsure, if the deletion should be done, but after rethinking it now, I would propose to not delete the purls and check the concrete use cases first, before we lose data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently setPurls() is only called in Antenna when creating a new release from artifact data.
Hence, currently there would never be any purls already safed in the release. The mergeWith function of the release merges all externalIds overall.
Hence, currently the dropAllPurls function is redundant.

@neubs-bsi neubs-bsi force-pushed the neubs-bsi/sw360/purl/#828 branch from c18c7ad to 778219e Compare September 24, 2019 09:36
maxhbr
maxhbr previously requested changes Sep 24, 2019
Copy link
Contributor

@maxhbr maxhbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some unclear points, since there is not a 1:1 mapping between PURLs and ArtifactCoordinates

@@ -56,9 +56,8 @@ public void artifactIsMappedToSw360ReleaseCorrectly() {
assertThat(false).isTrue();
}

assertThat(release.getCoordinates()).containsKeys("mvn");
assertThat(release.getCoordinates()).hasSize(1);
assertThat(release.getCoordinates()).containsValue("org.group.id:artifactId(test1):1.2.3");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't there a replacement for this unit test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the test in the updater test

@blaumeiser-at-bosch
Copy link
Contributor

That is the problem with the key as the type of the corresponding coordinates. Perhaps the better solution is to check for the value of the Entry to be a purl, i.e., being a url with protocol "pkg"? This way we get every purl even those we do not know as of now?

If we just get every purl, we should first start supporting all features of purls and all types. Until then it is a good idea to restrict on keys which we control. This is a kind of namespacing and we should not try to understand everything every other tool does.

And there is also the problem that, if there are multiple maven purls, we end up in a nondeterministic state, where we do not know which purl is taken.

This open up a different problem, if we allow to have multiple purls with the same type, we need also a change in the storage model, because currently we can only hold one purl per type in SW360. So we cannot handle the case of multiple purls of the same type anyhow. This is a restriction for now. We would end up with _x again to handle this, but then we run into trouble within the Coordinates as well. This becomes too big for this change.

If we can identify all purls, we should return it, the SW360 module does not have to mind the model gaps from the internal Antenna model, if we can easily find out any purl stored, we can filter at another place, e.g., when we create the coordinates objects.

@blaumeiser-at-bosch
Copy link
Contributor

On hold: @maxhbr implements a change in ArtifactCoordinates reducing the different types to one type which influences the way this PR needs to be implemented

@neubs-bsi neubs-bsi added wontfix This will not be worked on and removed depends on something wontfix This will not be worked on labels Oct 2, 2019
@neubs-bsi neubs-bsi force-pushed the neubs-bsi/sw360/purl/#828 branch 2 times, most recently from d4c6585 to afcba26 Compare October 2, 2019 12:13
@neubs-bsi
Copy link
Contributor Author

@blaumeiser-at-bosch && @maxhbr

Hello,
after our planning today, I started cleaning up here.
The commit integrates most cleared points now.
Comments that are not addressed yet are the following:

@blaumeiser-at-bosch
Copy link
Contributor

@neubs-bsi Please rebase

@neubs-bsi neubs-bsi force-pushed the neubs-bsi/sw360/purl/#828 branch from afcba26 to c36fda7 Compare October 7, 2019 06:42
@neubs-bsi neubs-bsi requested a review from maxhbr October 7, 2019 06:44
@maxhbr maxhbr dismissed their stale review October 7, 2019 09:24

outdated and partially moved to other PR

Copy link
Contributor

@maxhbr maxhbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Coordinates have been removed from the sw360 data model
and purls have been added
* removed coordinates to coordinates key mapping
* added purls in sw360 release
    * represented by a map of string, string
    * keys are the antenna supported types of purl
* test have been updated
* documentation has been updated

Signed-off-by: Stephanie Neubauer <Stephanie.Neubauer@bosch-si.com>
@neubs-bsi neubs-bsi force-pushed the neubs-bsi/sw360/purl/#828 branch from c36fda7 to dc842ee Compare October 7, 2019 14:36
@neubs-bsi neubs-bsi merged commit f7dc570 into eclipse-archived:master Oct 8, 2019
@neubs-bsi neubs-bsi deleted the neubs-bsi/sw360/purl/#828 branch October 8, 2019 06:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants