Implement the hackage and nixpkgs meta analyzers #3549
Conversation
Signed-off-by: Magnus Viernickel <magnus.viernickel@wire.com>
src/main/java/org/dependencytrack/tasks/repositories/NixpkgsMetaAnalyzer.java
what's missing:
different than the older http client

Signed-off-by: Magnus Viernickel <magnus.viernickel@wire.com>

Force-pushed from bf2c25b to c5096b2
Signed-off-by: Magnus Viernickel <magnus.viernickel@wire.com>

Force-pushed from e96397d to 965fbc3
so if my theory is true, then this is some truly cursed shit:
Signed-off-by: Magnus Viernickel <magnus.viernickel@wire.com>

Force-pushed from a5813d9 to 93573e0
Hi, I would like some input on where to best invoke the update of the nixpkgs meta analyzer singleton. Thanks in advance!
@nscuro I don't want to pester you and understand you have a lot on your mind. We'd be grateful if you could provide some guidance at your earliest convenience to make this PR a success and land this feature in DT. Thank you!
@MangoIV @emil-wire Hey, will have a look at this ASAP and provide feedback. Thank you for your contributions! :)
@MangoIV Instead of using a singleton, consider a Caffeine cache with expiry. So the first time a task tries to access nix versions, it will fetch and parse the package index; subsequent accesses within the expiry window are served from the cache. The benefit of this is that you won't have to deal with concurrency controls such as locking on your own. Also, refreshing happens implicitly and as such doesn't require a dedicated scheduled task. Because we'd want to refresh the entire package index at once, the entire index would be a single cache entry:

```java
class NixPkgMetaAnalyzer {

    private static final Cache<String, Map<String, String>> CACHE = Caffeine.newBuilder()
            .expireAfterWrite(60, TimeUnit.MINUTES)
            .maximumSize(1)
            .build();

    MetaModel analyze(Component component) {
        Map<String, String> latestVersions = CACHE.get("nixpkgs", cacheKey -> {
            final var versions = new HashMap<String, String>();
            // TODO: Download and parse package index
            return versions;
        });

        // TODO: Do things with latestVersions
    }
}
```

More details here: https://github.com/ben-manes/caffeine/wiki/Population#manual
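For illustration, here is a stdlib-only sketch of the behavior Caffeine provides in this setup: a single, lazily loaded entry that expires after a fixed window, with loading synchronized so concurrent callers don't trigger duplicate fetches. The class and names (`ExpiringEntry`, `loader`) are made up for this example; this is neither Dependency-Track nor Caffeine code.

```java
import java.util.function.Supplier;

// Hypothetical miniature of a single-entry expiring cache.
// Caffeine implements this (and much more) for you.
class ExpiringEntry<V> {
    private final Supplier<V> loader;   // e.g. fetches and parses the package index
    private final long ttlMillis;       // expiry window
    private V value;
    private long loadedAt;

    ExpiringEntry(Supplier<V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    // Synchronized so concurrent callers never trigger duplicate loads,
    // mirroring Caffeine's per-key loading guarantee. The caller passes
    // the clock so the behavior is easy to test deterministically.
    synchronized V get(long nowMillis) {
        if (value == null || nowMillis - loadedAt >= ttlMillis) {
            value = loader.get();
            loadedAt = nowMillis;
        }
        return value;
    }
}
```

With a 60-minute TTL, the first `get` triggers a load and later calls inside the window return the cached index; the first call after expiry reloads it.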
Thank you for the answer, that looks very nice! It looks like a fairly large hammer for the tiny nail I'm trying to get into the wall, but if it's already used across the codebase, this looks like a feasible option; I'll adjust accordingly.
…xpkgs json Signed-off-by: Magnus Viernickel <magnus.viernickel@wire.com>
package-url/packageurl-java@73b2f51 got merged btw, i.e. theoretically I could do this minor cleanup; however, upstream hasn't done a release yet, so I don't know if it's worth the effort.
new tests pass locally, let's see if CI is merciful today 🥺
@nscuro is this how you'd imagine it? what can I do to make this PR progress?
Thanks @MangoIV!
As a future enhancement, we'd want to improve the parsing of the nixpkgs catalogue. At the moment, the entire thing is deserialized at once, which causes quite a large number of memory allocations.
A more efficient way would be to leverage a streaming API such as the one provided by Jackson. That allows us to only deserialize fields we actually care about. We make use of that when parsing CVE data from the NVD already:
dependency-track/src/main/java/org/dependencytrack/parser/nvd/NvdParser.java
Lines 82 to 107 in 3e8f93f:

```java
try (final InputStream in = Files.newInputStream(file.toPath());
     final JsonParser jsonParser = objectMapper.createParser(in)) {
    jsonParser.nextToken(); // Position cursor at first token

    // Due to JSON feeds being rather large, do not parse them completely,
    // but "stream" through them. Parsing individual CVE items
    // one-by-one allows for garbage collection to kick in sooner,
    // keeping the overall memory footprint low.
    JsonToken currentToken;
    while (jsonParser.nextToken() != JsonToken.END_OBJECT) {
        final String fieldName = jsonParser.getCurrentName();
        currentToken = jsonParser.nextToken();
        if ("CVE_Items".equals(fieldName)) {
            if (currentToken == JsonToken.START_ARRAY) {
                while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
                    final ObjectNode cveItem = jsonParser.readValueAsTree();
                    parseCveItem(cveItem);
                }
            } else {
                jsonParser.skipChildren();
            }
        } else {
            jsonParser.skipChildren();
        }
    }
} catch (Exception e) {
```
It’s a tradeoff. I would be reluctant to make a guess on whether it’s wise to use streaming here without thorough benchmarks. |
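In miniature, the tradeoff is between materializing every item before reducing and folding over items as they arrive; only the latter lets earlier items become garbage while later ones are still being read. A stdlib-only sketch (the class name and integer item type are made up for illustration, standing in for parsed package entries):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class StreamVsBatch {
    // Batch: collect everything into a list first, then reduce.
    // Peak memory is proportional to the total number of items.
    static int maxBatch(Iterator<Integer> items) {
        List<Integer> all = new ArrayList<>();
        items.forEachRemaining(all::add);
        return all.stream().mapToInt(Integer::intValue).max().orElse(Integer.MIN_VALUE);
    }

    // Streaming: fold over items one at a time; each item becomes
    // collectable as soon as it has been folded into the accumulator.
    static int maxStreaming(Iterator<Integer> items) {
        int max = Integer.MIN_VALUE;
        while (items.hasNext()) {
            max = Math.max(max, items.next());
        }
        return max;
    }
}
```

Both compute the same result; whether the reduced allocations outweigh the extra parsing complexity is exactly what a benchmark would have to show.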
This was missed in DependencyTrack#3549 Signed-off-by: nscuro <nscuro@protonmail.com>
Description
This solves part 2 of #3545
Addressed Issue
#3545
Additional Details
this depends on #3546
The Apache HTTP client utils version 4 don't implement Brotli decompression. I redid some of the common HTTP client code to keep this change scoped; a future refactoring could remove the extraneous code.
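For context on the decompression point: the JDK ships a GZIP codec in `java.util.zip` but no Brotli one, which is why a `Content-Encoding: br` response needs extra handling in the HTTP client code while gzip does not. A stdlib-only round trip of the gzip case (class name and sample body are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    // Compress a body the way a server responding with
    // Content-Encoding: gzip would.
    static byte[] gzip(String body) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Decompress on the client side. For Brotli there is no JDK
    // equivalent, so a third-party decoder would be required.
    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```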
Checklist