Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Native Image SBOM Generation #623

Merged
merged 1 commit into from
Oct 11, 2024

Conversation

rudsberg
Copy link
Contributor

@rudsberg rudsberg commented Sep 13, 2024

Overview

This PR extends the native-maven-plugin to improve the accuracy of the Software Bill of Material (SBOM) that is generated as part of Native Image builds (only available in Oracle GraalVM).

The high-level approach is the following:

  • The native-maven-plugin invokes cyclonedx-maven-plugin to create a base SBOM.
  • The components of the base SBOM are augmented with new fields, foremost packageNames which lists the package names of that components.
  • The SBOM is stored at a known location and is handed over to Native Image.
  • Only in Native Image: Components and dependencies are pruned whenever possible as determined by the Native Image static analysis. Native Image relies on the augmented fields to map an identified type to its component.
  • Only in Native Image: The final SBOM is saved according to the arguments passed to --enable-sbom.

By using a COTS SBOM generator to get a base SBOM and passing extra information to Native Image which prunes it to make it more accurate, we at worst get an SBOM that conforms to industry standards and at best an SBOM that is significantly more accurate.

Future Work

Future work includes:

  • Adding similar support for the native-gradle-plugin.
  • Attempting to resolve the packageNames for shaded dependencies similar as is currently done for the main artifact. Currently, shaded dependencies are marked as prunable=false which instructs Native Image to not prune them or any of their transitive dependencies.

Copy link

Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
The following contributors of this PR have not signed the OCA:

To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application.

When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated.

If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. label Sep 13, 2024
@oracle-contributor-agreement oracle-contributor-agreement bot added OCA Verified All contributors have signed the Oracle Contributor Agreement. and removed OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. labels Sep 16, 2024
@fniephaus
Copy link
Member

@melix do you think this can be reasonably well ported to Gradle, or do you see any issues?

@rudsberg rudsberg force-pushed the feature/maven-plugin-improve-sbom branch from f55a1f8 to d0581ed Compare September 25, 2024 09:32
@melix
Copy link
Collaborator

melix commented Sep 26, 2024

So I think the biggest issue here is that we are bounding ourselves to an external plugin (cyclonedx). This could be a problem is a user is already using it, or if they are using another SBOM generator. While I'm not sure if we can work around for Maven, for Gradle this would certainly be an issue. An option would be to react on the application of a known plugin like cyclonedx (e.g, if you apply it, then we do more work), but not apply it transparently.

@rudsberg rudsberg force-pushed the feature/maven-plugin-improve-sbom branch from 6c5fdf3 to 8ef49a5 Compare September 27, 2024 07:21
@rudsberg
Copy link
Contributor Author

rudsberg commented Sep 27, 2024

Hey @melix, thank you for your comment. I am happy to discuss this further. I will do my best to answer this and give the different pros/cons of the possible approaches.

[...] or if they are using another SBOM generator

I don't see the problem with users using other SBOM generators in conjunction with our augmented Native Image based SBOM generator. Would you mind to elaborate how this would pose an issue?

To give some more context, this augmented SBOM feature is only used if the user:

  • Explicitly use the --enable-sbom build argument for Native Image, i.e. opting in to using the Native Image derived SBOM. In these cases, we automatically use this augmented approach.
  • Explicitly set the configuration parameter augmentedSBOM of the plugin to true (then we automatically add --enable-sbom to the build argument if not already used).

The user can always opt-out from this feature by setting augmentedSBOM to false.


For maven it shouldn't pose any issues even if the user application uses a different version of CycloneDX since maven allows plugins to have a different version from other plugins or the application itself. The user could use an older version of CycloneDX while we transparently use 2.8.1, for example.

Gradle, however, applies version resolution globally, so with the current approach, that would automatically force the latest version of CycloneDX to be used, thus potentially changing the version of the user's CycloneDX version. This could be an issue if CycloneDX is not backwards compatible. If it is not, and a user application depends on a specific behavior of an earlier version of CycloneDX that is not present in a later version, then that could break the user application.

An option would be to react on the application of a known plugin like cyclonedx (e.g, if you apply it, then we do more work), but not apply it transparently.

That's an alternative approach. I see the following benefits with that approach compared to the current:

  • No versioning issues in Gradle. If a user uses Gradle and already uses CycloneDX but with an earlier version, we avoid forcing them to use our newer version of CycloneDX.
  • More flexibility. Users can choose what baseline CycloneDX SBOM generator they want to use. For example, using quarkus-cyclonedx-generator instead of cyclonedx-maven-plugin.
  • More transparency. We don't transparently introduce a dependency on cyclonedx-maven-plugin and instead let the users decide if they want to have such a dependency.

However, there are downsides:

  • Harder to adopt this feature. This augmented mode would not work out-of-the-box for people already generating SBOMs with Native Image, instead users must explicitly opt-in by adding a dependency on a supported SBOM generator. We would, for example, give a hint to users like "add the cyclonedx-maven-plugin to your pom.xml to enable augmented SBOMs" (possibly also requiring them to use a specific file name so we can locate it).
  • Possibly harder to maintain. The increased flexibility of allowing users to choose their CycloneDX generator and version might be harder to maintain if there are differences between generators or if they use different CycloneDX versions. The Native Image feature assumes the CycloneDX 1.5 version and that certain fields are used. Of course, one could specify that only a specific version(s) are supported.

Here's my take on this. I am worried that adoption would suffer if we require users to themselves add a dependency on some CycloneDX generator. Instead, I see great benefits of transparently improving the SBOMs for all existing users that use the --enable-sbom build argument and making it very easy for users that start adopting SBOMs. To me it seems the only reason not to use my suggested approach would be the automatic upgrade of CycloneDX format for Gradle users using an older version of the CycloneDX plugin. However, I don't think this is a problem. This is taken from the CycloneDX website:

All CycloneDX releases are production quality standards that do not change. When drafting revisions, backwards compatibility with prior versions is of upmost priority.

I believe it's fair to assume the same guiding principle holds for their CycloneDX Gradle plugin. Even in the case for users on the new Native Image build plugin version with this augmented SBOM feature that use an older version of the Gradle CycloneDX plugin, and that gets automatically updated which possibly breaks their application, resolving such incompatibility between CycloneDX versions should be straightforward. Thus, I don't think this should hinder us from moving forward with this approach.

To increase transparency, it should be clear from the documentation that we are using the cyclonedx-{maven,gradle}-plugin under the hood for this feature. Lastly, if for some reason that this approach introduces issues, we can promptly revisit this and go for the other approach.

Disclaimer: I am far from being a senior plugin developer (this is my first plugin-development work) and I'm quite new in the GraalVM team. I look forward hearing your thoughts on this!

CC: @matneu @fniephaus.

import java.util.Set;
import java.util.function.Consumer;

final class FileWalkerUtility {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add javadoc

matneu
matneu previously approved these changes Sep 30, 2024
Copy link
Collaborator

@melix melix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rather involved PR and great work already. I think I understand the goal now, which should be clearer. In particular, in the beginning, I thought the goal was to provide information to native image so that it generates a better SBOM, but it's actually the other way around. The idea is to generate a native image, which, in turn, may help getting a better SBOM. I think we miss integration tests though, and I'd like to see this working on a real world project (e.g a Micronaut one) to make sure that it doesn't break anything.

I'm not sure that we should augment existing SBOMs, in the sense of overwriting. Maybe it's my Gradle's mindset here, but in general, we should avoid overwriting things and better generate alongside. Also, this feature is very costly (requires building native image, then scanning packages), so this should probably be documented as as such. Which also means that maybe this shouldn't be part of the default lifecycle.

Regarding a Gradle implementation: the Gradle version should make sure to isolate the cyclonedx plugin, by resolving it at runtime using the worker API. This will avoid conflicts with existing configuration. In addition, this raises the question of how SBOMs are generated as artifacts to be published. It's unclear to me where they are going to be published (Maven Repository) and probably we don't want to make publishing of regular Java artifacts dependent on something as slow as building a native image. Or maybe not, this needs discussion. In any case, I'd be in general worried that this feature makes builds slower because of implicit dependencies (e.g publishing a jar would systematically require building a native image). Maybe what we need is a design document which clarifies both the intent and process. Note that for scanning packages, since it's fairly expensive, I have implemented something similar for Gradle on the layered images branch as a shared library between Maven and Gradle, which then Gradle uses in an artifact transform which makes it much faster because cacheable.

gradle/libs.versions.toml Show resolved Hide resolved
native-maven-plugin/build.gradle.kts Outdated Show resolved Hide resolved
native-maven-plugin/build.gradle.kts Outdated Show resolved Hide resolved
}

protected boolean isOracleGraalVM() throws MojoExecutionException {
return getVersionInformation().contains(ORACLE_GRAALVM_IDENTIFIER);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: This is related to a very long time discussion we have on identifying GraalVM properly, and more specifically editions. This has changed several times in the past, this is likely not accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fniephaus Maybe you can chime in on this. We need a (hopefully reliable) way to know if the user is using the EE version of native image. Currently, the isOracleGraalVM function does this by calling native-image --version (that's the getVersionInformation() function) and checking the presence of the string "Oracle GraalVM".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has changed several times in the past, this is likely not accurate.

I guess @rudsberg only cares about recent and future versions of GraalVM, thus looking for the presence "Oracle GraalVM" in native-image --version should be fine. To know if he runs EE and thus wants this special SBOM handling is only relevant for latest GraalVM anyway since older EE versions would not have the relevant the SBOM support code in native-image anyways.

int detectedJDKVersion = NativeImageUtils.getMajorJDKVersion(getVersionInformation());
String sbomNativeImageFlag = "--enable-sbom";
boolean sbomEnabledForNativeImage = getBuildArgs().stream().anyMatch(v -> v.contains(sbomNativeImageFlag));
if (optionWasSet) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Can you clarify the intent here? Why would we have to set both --enable-sbom explicitly and augmented sbom flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting both --enable-sbom and the augmented sbom flag is not required. We check optionWasSet since the behavior is dependent on if the user explicitly set the augmented flag option. I tried to make the intent clear via the javadoc, variable names, and comments. What exactly was unclear?

@rudsberg
Copy link
Contributor Author

rudsberg commented Oct 4, 2024

@melix Thank you for your review and comments.

I think I understand the goal now, which should be clearer. In particular, in the beginning, I thought the goal was to provide information to native image so that it generates a better SBOM, but it's actually the other way around. The idea is to generate a native image, which, in turn, may help getting a better SBOM.

Yes, the idea is to provide native image with more information so that it can generate a better SBOM. This feature simply hooks into the the regular native-maven-plugin build process that creates native images by creating an SBOM that contains more information before the native image builder is invoked. This is reflected in NativeCompileNoForkMojo.execute where the call to generateAugmentedSBOMIfNeeded is made before the call to buildImage.

The bullet list of the high-level approach in the PR description sequentially describes the steps.

I think we miss integration tests though, and I'd like to see this working on a real world project (e.g a Micronaut one) to make sure that it doesn't break anything.

I can definitely add integration tests. We have integration tests already on the Native Image side, but adding it here too would be beneficial.

I'm not sure that we should augment existing SBOMs, in the sense of overwriting. Maybe it's my Gradle's mindset here, but in general, we should avoid overwriting things and better generate alongside.

The changes that are made to the baseline SBOM (the one generated by cyclonedx-maven-plugin) is

  • the pruning of components and dependencies that are not part of the image, as deduced from the static analysis.
  • the inclusion of native-image specific components that is not found by the baseline SBOM, such as svm and nativeimage.

Thus, we remove unreachable components/dependencies and augment it with components to make the SBOM reflect the true bill-of-materials as seen from the native image. In other words, it's more augmenting and deleting of unnecessary data than it is overwriting. If you see any other approach that would be better, please elaborate.

Also, this feature is very costly (requires building native image, then scanning packages), so this should probably be documented as as such. Which also means that maybe this shouldn't be part of the default lifecycle.

This should be covered by my earlier comments. The overhead of this feature is only the call to generateAugmentedSBOMIfNeeded. Executing ./mvnw package -Dpackaging=native-image (creating an image and an augmented SBOM by using --enable-sbom=export,embed as a buildArg) in the micronaut-hello-rest-maven takes around 28s on my machine (M3 Max), out of which the call to generateAugmentedSBOMIfNeeded takes between 550ms and 1300ms. This overhead is not very costly.

Regarding a Gradle implementation: the Gradle version should make sure to isolate the cyclonedx plugin, by resolving it at runtime using the worker API. This will avoid conflicts with existing configuration.

Great to hear we can isolate the cyclonedx plugin on Gradle using the worker API.

In addition, this raises the question of how SBOMs are generated as artifacts to be published. It's unclear to me where they are going to be published (Maven Repository) and probably we don't want to make publishing of regular Java artifacts dependent on something as slow as building a native image. Or maybe not, this needs discussion. In any case, I'd be in general worried that this feature makes builds slower because of implicit dependencies (e.g publishing a jar would systematically require building a native image).

I'm afraid I'm not following here. SBOMs are either embedded in the image, exported as a JSON in the build directory, or added as a resource to the classpath. Thereafter, users likely run it through some vulnerability scanner.

Maybe I'm misunderstanding your point, but I'm afraid I don't understand what you mean by publishing regular Java artifacts being dependent on building native images and how this feature causes such issues. To me that's two independent processes, and only when user opts in to building images are images built.

Note that for scanning packages, since it's fairly expensive, I have implemented something similar for Gradle on the layered images branch as a shared library between Maven and Gradle, which then Gradle uses in an artifact transform which makes it much faster because cacheable.

OK, great. I'll take a look at that.

@rudsberg
Copy link
Contributor Author

Integration tests have now been added. These tests focus on

  • error handling (using CE or non-supported EE versions), and
  • that the basic behavior is observed using the 'java-application'.

melix
melix previously approved these changes Oct 10, 2024
@rudsberg rudsberg force-pushed the feature/maven-plugin-improve-sbom branch from 7f7c2e5 to 0b57feb Compare October 11, 2024 07:14
@rudsberg rudsberg requested a review from melix October 11, 2024 08:55
@rudsberg rudsberg merged commit 4758dd2 into master Oct 11, 2024
227 checks passed
@rudsberg rudsberg deleted the feature/maven-plugin-improve-sbom branch October 11, 2024 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants