Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow for SBOMs with a large number of files + relationships #790

Open
jonjohnsonjr opened this issue Jan 25, 2024 · 1 comment · May be fixed by #792
Open

Slow for SBOMs with a large number of files + relationships #790

jonjohnsonjr opened this issue Jan 25, 2024 · 1 comment · May be fixed by #792

Comments

@jonjohnsonjr
Copy link

jonjohnsonjr commented Jan 25, 2024

This check iterates over the list existing_relationships up to 2 times:

def check_if_relationship_exists(
self, relationship: Relationship, existing_relationships: List[Relationship]
) -> bool:
# assume existing relationships are stripped of comments
if relationship in existing_relationships:
return True
relationship_inverted: Relationship = self.invert_relationship(relationship)
if relationship_inverted in existing_relationships:
return True
return False

And it's called for every file:

for file_spdx_id in contained_files:
try:
contains_relationship = Relationship(
spdx_element_id=package_spdx_id,
relationship_type=RelationshipType.CONTAINS,
related_spdx_element_id=file_spdx_id,
)
except ConstructorTypeErrors as err:
logger.append(err.get_messages())
continue
if not self.check_if_relationship_exists(
relationship=contains_relationship, existing_relationships=existing_relationships
):
contains_relationships.append(contains_relationship)

So if F is the number of files and R is the number of relationships, we are doing O(F * R) comparisons of relationships (which seem not cheap individually).

And given that this seems to expect to find a relationship for every file, we know that R is >= F, so this is at least O(F^2).

Looks quadratic to me.

With the caveat that I'm not a python expert, this being a list seems to be the issue:

existing_relationships_without_comments: List[Relationship] = self.get_all_relationships_without_comments(
relationships
)

Is it possible to turn that into a set so these membership tests are O(1) instead of O(N)?

@jonjohnsonjr
Copy link
Author

As an example:

$ curl -L https://packages.wolfi.dev/os/aarch64/renovate-37.151.0-r0.apk | tar -Oxz var/lib/db/sbom/renovate-37.151.0-r0.spdx.json > sbom.spdx.json

$ wc -c sbom.spdx.json
 36350622 sbom.spdx.json

$ cat sbom.spdx.json | jq '.files[]' -c | wc -l
   30193

$ cat sbom.spdx.json | jq '.relationships[]' -c | wc -l
   30193

# 30193 * 30193 = 911617249, which is a pretty big number for python

$ time ntia-checker -v --file sbom.spdx.json

Is this SBOM NTIA minimum element conformant? True

Individual elements                            | Status
-------------------------------------------------------
All component names provided?                  | True
All component versions provided?               | True
All component identifiers provided?            | True
All component suppliers provided?              | True
SBOM author name provided?                     | True
SBOM creation timestamp provided?              | True
Dependency relationships provided?             | True

No components with missing information.

real	8m57.222s
user	8m57.118s
sys	0m0.085s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant