Externalize BOM ingestion pipeline #794

nscuro · 2024-07-22T21:28:27Z

Description

This PR moves the processing of uploaded BOMs from the internal, in-memory eventing system to Kafka.

This allows BOM processing to be distributed across multiple instances of the API server. Before this PR, the instance that received an upload request was also the one that processed the BOM, which could lead to very uneven load distribution.

To allow for processing to be shared, uploaded BOMs need to be stored in a location that all instances can access. At the moment, BOMs are written to the /tmp directory, which is local to each instance and therefore doesn't work for the goal at hand.

This PR provides three new storage extensions for this purpose:

Local filesystem (could also be NAS)
Database (new BOM_UPLOAD table)
S3

Database is the default, since it requires no additional setup. It is not expected to be a good fit for large deployments with very frequent uploads.
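As a rough illustration of what the three storage extensions have in common, the sketch below shows a minimal storage contract keyed by correlation token, plus an in-memory stand-in. The names `BomUploadStorage` and `InMemoryBomUploadStorage` and the method signatures are assumptions made for this example; they are not taken from the PR.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract; names and signatures are illustrative, not the PR's API.
interface BomUploadStorage {
    // Stores an (already compressed) BOM under its correlation token.
    void store(UUID token, byte[] compressedBom);

    // Returns the stored BOM, or empty if nothing exists for the token.
    Optional<byte[]> get(UUID token);

    // Deletes the BOM; returns true if something was actually removed.
    boolean delete(UUID token);
}

// In-memory stand-in for illustration only. It is not shared across instances,
// so it could not back a real multi-instance deployment.
final class InMemoryBomUploadStorage implements BomUploadStorage {
    private final Map<UUID, byte[]> boms = new ConcurrentHashMap<>();

    @Override
    public void store(UUID token, byte[] compressedBom) {
        // Defensive copy so later changes to the caller's array don't leak in.
        boms.put(token, compressedBom.clone());
    }

    @Override
    public Optional<byte[]> get(UUID token) {
        byte[] bom = boms.get(token);
        return bom == null ? Optional.empty() : Optional.of(bom.clone());
    }

    @Override
    public boolean delete(UUID token) {
        return boms.remove(token) != null;
    }
}
```

A filesystem, database, or S3 variant would implement the same three operations against a location that every API server instance can reach.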

Note

With the work done in PR #805, the idea is to empower users to plug in their own, potentially proprietary, storage solutions.

To reduce the volume of data being transmitted to and from storage, as well as reduce the storage size requirements, we compress BOMs using zstd. Currently, the compression level is hardcoded to 3 (22 is the maximum), but the plan is to make this configurable.
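The compress-before-store step can be sketched as a simple round trip. Note that this example deliberately swaps in DEFLATE from `java.util.zip`, because zstd is not part of the Java standard library and would need a third-party binding. DEFLATE levels range from 0 to 9, so the level passed here is not equivalent to zstd level 3. The class name `BomCompression` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Stand-in sketch: uses DEFLATE instead of zstd to stay within the JDK.
final class BomCompression {
    // Compresses the input at the given DEFLATE level (0 to 9).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Restores the original bytes; rejects truncated or malformed input.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Malformed compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

BOMs are text documents with a lot of repetition, which is why compressing them before upload tends to reduce both transfer volume and storage size.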

BOMs are stored after successful validation, and deleted again after successful processing. The storage being used is thus only temporary and will not replace the eventual adoption of the CycloneDX Transparency Exchange API.

By default, each instance processes at most `alpine.kafka.processor.bom.upload.max.concurrency` BOMs in parallel. The default value is -1, meaning the concurrency matches the number of topic partitions. Because the concurrency is key-based, it can be increased beyond the number of partitions. The maximum parallelism is bounded by the number of unique projects that BOMs are uploaded to.
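The relationship between the configured concurrency, the partition count, and the number of distinct projects can be sketched as follows. This is a simplified model, not the consumer implementation: it assumes events with the same key (the project UUID) are processed one at a time, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of key-based concurrency; names are illustrative.
final class KeyConcurrency {
    // Resolves the configured value: -1 means "match the number of topic partitions".
    static int resolveMaxConcurrency(int configured, int partitions) {
        if (configured == -1) {
            return partitions;
        }
        if (configured < 1) {
            throw new IllegalArgumentException("Concurrency must be -1 or >= 1");
        }
        return configured;
    }

    // Assumption: events sharing a key are handled sequentially, so at most
    // one worker can be busy per distinct key at any time.
    static int effectiveParallelism(int maxConcurrency, List<String> pendingEventKeys) {
        Set<String> distinctKeys = new HashSet<>(pendingEventKeys);
        return Math.min(maxConcurrency, distinctKeys.size());
    }
}
```

For example, with a concurrency of 12 on a topic with 3 partitions, three pending uploads for two projects can keep at most two workers busy, since two of the uploads share a key.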

Addressed Issue

Closes DependencyTrack/hyades#633

Additional Details

sequenceDiagram
    Client->>+API Server: Upload BOM
    API Server->>API Server: Validate BOM
    API Server->>API Server: Generate correlation token (UUID)
    API Server->>API Server: Compress BOM (zstd)
    API Server->>Storage: Upload compressed BOM
    Note over API Server, Storage: Keyed by correlation token
    API Server->>Kafka: Publish event to dtrack.event.bom-uploaded topic
    Note over API Server, Kafka: Key=Project UUID<br/>Value=org.dependencytrack.event.v1alpha1.BomUploadedEvent proto
    API Server->>Client: Return correlation token
    loop continuously
        API Server->>Kafka: Consume from dtrack.event.bom-uploaded topic
        loop for each event
            API Server->>Storage: Get compressed BOM by correlation token
            API Server->>API Server: Decompress BOM
            API Server->>API Server: Process BOM
            alt processing failed
                API Server->>API Server: Update status of upload in DB to "failed"
                API Server->>Kafka: Publish event to "BOM Processing failed" topic
            else processing succeeded
                API Server->>Storage: Delete BOM by correlation token
                API Server->>API Server: Update status of upload in DB to "successful"
                API Server->>Kafka: Publish event to "BOM Processed" topic
                API Server->>API Server: Trigger vuln analysis etc.
            end
        end
    end
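The consumer side of the diagram above can be sketched in plain Java. This is a minimal model under stated assumptions: Kafka, validation, compression, and the database are replaced by in-memory maps and a callback, and the `BomUploadProcessor` class, its methods, and the `PENDING` status are hypothetical names introduced for illustration.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Minimal in-memory model of the upload and processing flow; not the PR's code.
final class BomUploadProcessor {
    // PENDING is an assumption; the diagram only names "failed" and "successful".
    enum Status { PENDING, SUCCESSFUL, FAILED }

    // Hypothetical stand-ins for the shared storage and the upload status in the DB.
    final Map<UUID, byte[]> storage = new ConcurrentHashMap<>();
    final Map<UUID, Status> statuses = new ConcurrentHashMap<>();
    private final Consumer<byte[]> bomHandler;

    BomUploadProcessor(Consumer<byte[]> bomHandler) {
        this.bomHandler = bomHandler;
    }

    // Upload side: store the BOM under a fresh correlation token and return the token.
    UUID upload(byte[] bom) {
        UUID token = UUID.randomUUID();
        storage.put(token, bom);
        statuses.put(token, Status.PENDING);
        return token;
    }

    // Consumer side: invoked once per "BOM uploaded" event.
    void onBomUploaded(UUID token) {
        try {
            byte[] bom = storage.get(token);
            if (bom == null) {
                throw new IllegalStateException("No BOM stored for token " + token);
            }
            bomHandler.accept(bom);
        } catch (RuntimeException e) {
            // Failure branch: mark as failed; as in the diagram, nothing is deleted here.
            statuses.put(token, Status.FAILED);
            return;
        }
        // Success branch: delete the BOM first, then record the outcome.
        storage.remove(token);
        statuses.put(token, Status.SUCCESSFUL);
    }
}
```

In this model a failed upload stays in storage, which mirrors the diagram, where deletion only appears in the success branch.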

Checklist

I have read and understand the contributing guidelines
~~This PR fixes a defect, and I have provided tests to verify that the fix is effective~~
This PR implements an enhancement, and I have provided tests to verify that it works as intended
This PR introduces changes to the database model, and I have updated the migration changelog accordingly
~~This PR introduces new or alters existing behavior, and I have updated the documentation accordingly~~

codacy-production · 2024-07-24T20:56:58Z

Coverage summary from Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| ✅ +0.00% (target: -1.00%) | ✅ 84.93% (target: 70.00%) |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (`8b43f77`) | 21700 | 17882 | 82.41% |
| Head commit (`caef8f5`) | 21895 (+195) | 18042 (+160) | 82.40% (+0.00%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#794) | 272 | 231 | 84.93% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

Codacy stopped sending the deprecated coverage status on June 5th, 2024.

hoggmania · 2024-08-05T12:11:58Z

Looks good to go to me.

Signed-off-by: nscuro <nscuro@protonmail.com>

nscuro added the enhancement New feature or request label Jul 22, 2024

nscuro force-pushed the issue-633 branch 6 times, most recently from 1bfd177 to 81206ad Compare July 24, 2024 20:28

nscuro mentioned this pull request Jul 24, 2024

Add config options for BOM upload storage DependencyTrack/hyades-frontend#101

Closed

1 task

nscuro added this to the 5.6.0 milestone Jul 24, 2024

nscuro force-pushed the issue-633 branch 10 times, most recently from 2577957 to 111bb27 Compare July 27, 2024 19:13

nscuro mentioned this pull request Jul 27, 2024

Introduce plugin system to deal with provider config and lifecycle #805

Merged

2 tasks

nscuro force-pushed the issue-633 branch 9 times, most recently from 14b0484 to c65b55d Compare July 29, 2024 22:04

nscuro force-pushed the issue-633 branch 3 times, most recently from 0b7c377 to 5babe60 Compare August 5, 2024 11:01

This was referenced Aug 5, 2024

Add e2e test for BOM upload storage DependencyTrack/hyades#1432

Open

Merge cleanup tasks into a single Housekeeping task #829

Closed

Introduce maintenance tasks; Unify cron and lock duration configs for tasks #840

Merged

nscuro modified the milestones: 5.6.0, 5.7.0 Aug 22, 2024

nscuro force-pushed the issue-633 branch 3 times, most recently from 11d073d to 04d9efc Compare September 10, 2024 17:11

nscuro force-pushed the issue-633 branch 3 times, most recently from 4566ad2 to caef8f5 Compare September 20, 2024 12:43

nscuro added 5 commits September 24, 2024 18:38

Externalize BOM ingestion pipeline

94e1a82

Signed-off-by: nscuro <nscuro@protonmail.com>

Implement BOM upload storage plugin

06eaa6b

Signed-off-by: nscuro <nscuro@protonmail.com>

Make BOM upload storage compression level configurable

ec94e83

Signed-off-by: nscuro <nscuro@protonmail.com>

Add maintenance task to enforce BOM upload retention

83b65f8

Signed-off-by: nscuro <nscuro@protonmail.com>

Fix merge conflict artifacts

e52a115

Signed-off-by: nscuro <nscuro@protonmail.com>

nscuro force-pushed the issue-633 branch from caef8f5 to e52a115 Compare September 24, 2024 16:38
