Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat cpe configurations #300

Merged
merged 29 commits into from
Mar 10, 2023
Merged

Feat cpe configurations #300

merged 29 commits into from
Mar 10, 2023

Conversation

GeorgeFI
Copy link
Collaborator

@GeorgeFI GeorgeFI commented Dec 16, 2022

Closes #252

@GeorgeFI GeorgeFI self-assigned this Dec 16, 2022
@codecov
Copy link

codecov bot commented Dec 16, 2022

Codecov Report

Patch coverage: 55.68% and project coverage change: -1.13 ⚠️

Comparison is base (c0084cf) 74.23% compared to head (21b88b0) 73.09%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #300      +/-   ##
==========================================
- Coverage   74.23%   73.09%   -1.13%     
==========================================
  Files          45       45              
  Lines        5606     5629      +23     
==========================================
- Hits         4161     4114      -47     
- Misses       1445     1515      +70     
Impacted Files Coverage Δ
src/sec_certs/utils/pandas.py 0.00% <0.00%> (ø)
src/sec_certs/sample/cve.py 51.79% <39.69%> (+2.75%) ⬆️
src/sec_certs/sample/cpe.py 91.14% <85.72%> (-1.28%) ⬇️
src/sec_certs/dataset/cve.py 61.44% <87.50%> (-31.61%) ⬇️
src/sec_certs/dataset/dataset.py 61.42% <100.00%> (-2.93%) ⬇️
src/sec_certs/sample/__init__.py 100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@adamjanovsky adamjanovsky marked this pull request as ready for review December 18, 2022 10:36
Comment on lines 174 to 178
cves = self._get_cves_from_exactly_matched_cpes(cpe_matches)
cves_matched_by_configurations = self._get_cves_from_cpe_configurations(cpe_matches)
cves.update(cves_matched_by_configurations)

return cves
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return {**self._get_cves_from_exactly_matched_cpes(cpe_matches), **self._get_cves_from_cpe_configurations(cpe_matches)

Comment on lines 50 to +51
vulnerable_cpes: list[CPE]
vulnerable_cpe_configurations: list[CPEConfiguration]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed this. Is there a reason that we have this as a list? Could be set as well, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no any specific reason for storing CPEConfiguration as a list. I just wanted to keep the same data structure as the CPE records have.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes sense. Maybe there's a reason why CPEs are a list, but I cannot recall any. Could you pls try to refactor both to sets and see what happens?

Copy link
Collaborator Author

@GeorgeFI GeorgeFI Dec 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamjanovsky Refactoring seems to be okay - at least the pipeline of downloading CVEs, processing them and building the CVEDataset is passing. Due to the usage of CVEs in pandas_columns I will also investigate the Jupyter notebooks to check if the change does not break the analysis code (e.g. usage of indices on sets etc).

Comment on lines 217 to 222
vulnerable_cpes = list(itertools.chain.from_iterable(map(lambda x: x[0], cpes_and_cpe_configurations)))
vulnerable_cpe_configurations = list(
itertools.chain.from_iterable(map(lambda x: x[1], cpes_and_cpe_configurations))
)

return vulnerable_cpes, vulnerable_cpe_configurations
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return [list(t) for t in zip(*cpes_and_cpe_configurations)]

Copy link
Collaborator

@adamjanovsky adamjanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey,

thanks for your work. I'm happy that we're now on par with the main branch and matching the complex CPE configurations. Looking at the code, there are still some minor chunks to work on:

  • We could have a test (similar to test_find_related_cves() in test_cc_analysis.py that would add some artifical CVE (feel free to make up your CVE, name of certificate and cert's CPEs) and check that those complex CVEs are actually properly matched. Same test could be invented for FIPSDataset
  • We're now fairly considerative when it comes to performance. Could you please measure how much memory does the old CPE/CVE matching takes and how much does the new one takes? Same for runtimes?
  • I see that some optimizations could still be made. For instance when building look-up dicts for CVEDataset, shouldn't they only be built on CVEs that have no CPEConfiguration records? Also, self.cves_with_cpe_configurations essentially stores some CVEs that are already stored in self.cves, right? Shouldn't we delete them from self.CVEs then?
    • But we need to be careful about serialization.
  • Basically, I'm worried that we store some CVEs twice, and that we also run matching on all CVEs, and then on CVEs with CPEConfiguration records, which deteriorates the performance. This speaks into favour of having all CPEs stored in CPEConfiguration when they are part of CVEDataset.
  • Could you please doublecheck statistics in vulnerabilities.ipynb and check that they didn't change much? At least that's what I'd expected. Number of detected certs with CVEs should rise IMO, and those would be the certs that we newly match.

I know that this PR takes time and that I always require some changes. It's that I'm not happy with the result just yet. If you'd prefer, I can finish the work so that you can move to something potentiallly more interesting. Let me know in any case. Thanks!

@adamjanovsky
Copy link
Collaborator

@GeorgeFI finalized my checks and we're good for merge 🎉 , thank you for your effort 👍.

Few thinks I've adjusted:

  • I simplified parsing of nist dictionary when loading CVE dataset
  • I simplified some fixtures and tests
  • I reverted to using lists of CPEs instead of sets (complex reasons, I can describe in person if needed)
  • I reverted to using CPEs instead of CPE uris in CPEConfiguration objects
  • I discarded some auxillary CVE filtering (that you didn't introduce, but I've figured it's not helping with runtime or memory consumption).

@adamjanovsky adamjanovsky merged commit 983bc3c into main Mar 10, 2023
@adamjanovsky adamjanovsky deleted the feat-cpe-configurations branch March 10, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CVE classifier: search for matches in ANDed vulnerable configurations
2 participants