Reduce database round-trips during BOM processing #1006

Merged 1 commit on Dec 23, 2024

nscuro (Member) commented Dec 21, 2024

Description

Reduces database round-trips during BOM processing.

In the previous implementation, a SELECT query was issued for every single component and service in a BOM, in order to find existing components that match their identity.

In retrospect, this caused many unnecessary database round-trips and put the database under needless stress, particularly for new projects where no components or services exist yet.

Now, we query all existing components and services of the project once in bulk.

This approach can perform worse when a BOM is uploaded to an existing project and the content differs wildly between BOM and project. We would then load many components into memory, only to delete them shortly after. However, this scenario should be less common: usually, projects are either empty or overlap significantly with the uploaded BOM.
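To illustrate the idea, here is a minimal sketch of matching BOM components against a project's existing components after a single bulk load. It is not the code from this PR: the `Identity`, `Component`, and `MatchResult` types and the `match` method are hypothetical, the identity is reduced to three fields, and the BOM identities are assumed to be de-duplicated already.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BulkIdentityMatchSketch {

    // Hypothetical, simplified identity key (assumption: the real identity has more fields).
    record Identity(String group, String name, String version) {}

    // Hypothetical stand-in for a persisted component row.
    record Component(long id, Identity identity) {}

    record MatchResult(List<Component> matched, List<Identity> toCreate, List<Component> toDelete) {}

    // existingInProject represents the result of ONE bulk query for the whole project.
    static MatchResult match(List<Component> existingInProject, List<Identity> bomIdentities) {
        // Index the bulk result by identity so each lookup is an in-memory map access.
        Map<Identity, Component> byIdentity = new HashMap<>();
        for (Component component : existingInProject) {
            byIdentity.putIfAbsent(component.identity(), component);
        }

        Set<Long> matchedIds = new HashSet<>();
        List<Component> matched = new ArrayList<>();
        List<Identity> toCreate = new ArrayList<>();
        for (Identity identity : bomIdentities) {
            Component existing = byIdentity.get(identity);
            if (existing == null) {
                toCreate.add(identity); // not in the project yet
            } else if (matchedIds.add(existing.id())) {
                matched.add(existing); // already in the project, keep it
            }
        }

        // Anything that was loaded but never matched is no longer part of the BOM.
        List<Component> toDelete = new ArrayList<>();
        for (Component component : existingInProject) {
            if (!matchedIds.contains(component.id())) {
                toDelete.add(component);
            }
        }
        return new MatchResult(matched, toCreate, toDelete);
    }

    public static void main(String[] args) {
        List<Component> existing = List.of(
                new Component(1, new Identity("g", "a", "1.0")),
                new Component(2, new Identity("g", "b", "1.0")));
        List<Identity> bom = List.of(
                new Identity("g", "a", "1.0"),
                new Identity("g", "c", "2.0"));
        System.out.println(match(existing, bom));
    }
}
```

For an empty project the bulk load returns nothing, so every BOM component lands in `toCreate` without any per-component lookup. In the less favorable case described above, most loaded components end up in `toDelete`.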

Addressed Issue

N/A

Additional Details

Profiling the bloated BOM test shows that we previously spent a large chunk of CPU time waiting for Postgres to respond to identity matching queries:

image

In fact, performing these queries was more expensive than flushing new changes, despite the project initially being completely empty.

This overhead is now entirely gone:

image

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have updated the migration changelog accordingly
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

nscuro added the enhancement label Dec 21, 2024
nscuro added this to the 5.6.0 milestone Dec 21, 2024
codacy-production bot commented Dec 21, 2024

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
|---|---|
| +0.14% (target: -1.00%) | 80.65% (target: 70.00%) |

Coverage variation details

| Commit | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (bc8683a) | 22292 | 18436 | 82.70% |
| Head commit (a78e9e6) | 22249 (-43) | 18432 (-4) | 82.84% (+0.14%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| Scope | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#1006) | 62 | 50 | 80.65% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
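As a quick sanity check, the reported percentages can be reproduced from the line counts in this comment. The sketch below only applies the two formulas quoted above; the class and method names are hypothetical and not part of Codacy.

```java
import java.util.Locale;

final class CoverageArithmeticSketch {

    // Percentage of covered lines out of coverable lines.
    static double percent(long covered, long coverable) {
        return 100.0 * covered / coverable;
    }

    // Two-decimal formatting, matching the precision used in the report.
    static String format(double pct) {
        return String.format(Locale.ROOT, "%.2f%%", pct);
    }

    public static void main(String[] args) {
        double ancestor = percent(18436, 22292); // common ancestor commit
        double head = percent(18432, 22249);     // head commit
        double diff = percent(50, 62);           // lines added or modified by the PR

        System.out.println("ancestor:  " + format(ancestor));
        System.out.println("head:      " + format(head));
        System.out.println("variation: " + format(head - ancestor));
        System.out.println("diff:      " + format(diff));
    }
}
```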

nscuro added a commit to nscuro/dependency-track that referenced this pull request Dec 21, 2024
Backports DependencyTrack/hyades-apiserver#1006

Signed-off-by: nscuro <nscuro@protonmail.com>
@nscuro nscuro force-pushed the bom-processing-db-rountrips branch from e78985d to 57f02fa Compare December 21, 2024 15:31
Signed-off-by: nscuro <nscuro@protonmail.com>
@nscuro nscuro force-pushed the bom-processing-db-rountrips branch from 57f02fa to a78e9e6 Compare December 21, 2024 17:45
nscuro added a commit to nscuro/dependency-track that referenced this pull request Dec 21, 2024
Backports DependencyTrack/hyades-apiserver#1006

Signed-off-by: nscuro <nscuro@protonmail.com>
@nscuro nscuro merged commit e12e9b1 into main Dec 23, 2024
9 checks passed
@nscuro nscuro deleted the bom-processing-db-rountrips branch December 23, 2024 17:34