Feat cpe configurations #300
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #300 +/- ##
==========================================
- Coverage 74.23% 73.09% -1.13%
==========================================
Files 45 45
Lines 5606 5629 +23
==========================================
- Hits 4161 4114 -47
- Misses 1445 1515 +70
... and 5 files with indirect coverage changes
src/sec_certs/dataset/cve.py
Outdated
cves = self._get_cves_from_exactly_matched_cpes(cpe_matches)
cves_matched_by_configurations = self._get_cves_from_cpe_configurations(cpe_matches)
cves.update(cves_matched_by_configurations)

return cves
Just return {**self._get_cves_from_exactly_matched_cpes(cpe_matches), **self._get_cves_from_cpe_configurations(cpe_matches)}
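The one-liner suggested here relies on dict unpacking, which only works when both helpers return mappings. A minimal sketch with made-up stand-in values (the real return types of the two helpers are not shown in this thread, so both variants are assumptions) illustrates the dict form and the set equivalent:

```python
# Hypothetical stand-ins for the values returned by the two helper methods.
exact = {"CVE-2020-0001": "a", "CVE-2020-0002": "b"}
by_configuration = {"CVE-2020-0002": "b2", "CVE-2020-0003": "c"}

# Dict unpacking merges mappings; on duplicate keys the right-hand value wins,
# which gives the same result as calling exact.update(by_configuration).
merged_dict = {**exact, **by_configuration}

# If the helpers returned sets instead, ** unpacking would raise TypeError.
# The union operator is then the one-line equivalent of set.update().
merged_set = set(exact) | set(by_configuration)
```

Unlike `update()`, both forms build a new object and leave the inputs untouched.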
vulnerable_cpes: list[CPE]
vulnerable_cpe_configurations: list[CPEConfiguration]
Just noticed this. Is there a reason that we have this as a list? Could be set as well, right?
There is no specific reason for storing CPEConfiguration as a list. I just wanted to keep the same data structure that the CPE records have.
Sure, that makes sense. Maybe there's a reason why CPEs are a list, but I cannot recall any. Could you pls try to refactor both to sets and see what happens?
@adamjanovsky Refactoring seems to be okay - at least the pipeline of downloading CVEs, processing them and building the CVEDataset is passing. Because CVEs are used in pandas_columns, I will also go through the Jupyter notebooks to check that the change does not break the analysis code (e.g. indexing into sets).
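The two things that typically break when a list field becomes a set are hashability of the elements and positional indexing. A minimal sketch, using a hypothetical `Config` class as a stand-in for CPEConfiguration (its fields here are assumptions, not the project's definition):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for CPEConfiguration; frozen=True makes it hashable."""

    platform: str
    cpes: frozenset[str]


a = Config("cpe:2.3:o:vendor:os:1.0:*:*:*:*:*:*:*", frozenset({"cpe:2.3:a:vendor:app:2.0:*:*:*:*:*:*:*"}))
b = Config("cpe:2.3:o:vendor:os:1.0:*:*:*:*:*:*:*", frozenset({"cpe:2.3:a:vendor:app:2.0:*:*:*:*:*:*:*"}))

as_list = [a, b]
as_set = set(as_list)  # equal, hashable records collapse into one element

first_from_list = as_list[0]  # lists support positional indexing
try:
    as_set[0]  # sets do not, so notebook code like cves[0] would fail
    indexing_works = True
except TypeError:
    indexing_works = False

# Code that needs a stable order can sort explicitly instead of indexing.
ordered = sorted(as_set, key=lambda c: c.platform)
```

A side effect worth checking in the notebooks is that duplicates silently disappear, so any count taken over the field may change.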
src/sec_certs/sample/cve.py
Outdated
vulnerable_cpes = list(itertools.chain.from_iterable(map(lambda x: x[0], cpes_and_cpe_configurations)))
vulnerable_cpe_configurations = list(
    itertools.chain.from_iterable(map(lambda x: x[1], cpes_and_cpe_configurations))
)

return vulnerable_cpes, vulnerable_cpe_configurations
Just return [list(t) for t in zip(*cpes_and_cpe_configurations)]
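One caveat with the `zip(*...)` form: it transposes the pairs but does not flatten them, whereas the original code flattens with `chain.from_iterable`. A small sketch, assuming each element of `cpes_and_cpe_configurations` is a pair of lists (as the `x[0]` / `x[1]` accesses suggest) and using made-up string values:

```python
import itertools

# Hypothetical input: one (cpes, configurations) pair per parsed element.
cpes_and_cpe_configurations = [
    (["cpe-a", "cpe-b"], ["conf-1"]),
    (["cpe-c"], ["conf-2", "conf-3"]),
]

# zip(*pairs) transposes: the first column holds the CPE lists,
# the second column holds the configuration lists. Nothing is flattened.
transposed = [list(t) for t in zip(*cpes_and_cpe_configurations)]

# Flattening each transposed column reproduces the chain.from_iterable result.
vulnerable_cpes, vulnerable_cpe_configurations = (
    list(itertools.chain.from_iterable(column)) for column in zip(*cpes_and_cpe_configurations)
)
```

With an empty input, `zip(*[])` yields no columns, so the two-name unpacking would raise `ValueError`; the original two-statement version returns two empty lists in that case.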
Hey,
thanks for your work. I'm happy that we're now on par with the main branch and matching the complex CPE configurations. Looking at the code, there are still some minor chunks to work on:
- We could have a test (similar to test_find_related_cves() in test_cc_analysis.py) that would add some artificial CVE (feel free to make up your CVE, name of certificate and cert's CPEs) and check that those complex CVEs are actually properly matched. The same test could be invented for FIPSDataset.
- We're now fairly careful when it comes to performance. Could you please measure how much memory the old CPE/CVE matching takes and how much the new one takes? Same for runtimes?
- I see that some optimizations could still be made. For instance, when building look-up dicts for CVEDataset, shouldn't they only be built on CVEs that have no CPEConfiguration records? Also, self.cves_with_cpe_configurations essentially stores some CVEs that are already stored in self.cves, right? Shouldn't we delete them from self.cves then?
  - But we need to be careful about serialization.
  - Basically, I'm worried that we store some CVEs twice, and that we also run matching on all CVEs, and then on CVEs with CPEConfiguration records, which deteriorates the performance. This speaks in favour of having all CPEs stored in CPEConfiguration when they are part of CVEDataset.
- Could you please double-check the statistics in vulnerabilities.ipynb and check that they didn't change much? At least that's what I'd expect. The number of detected certs with CVEs should rise IMO, and those would be the certs that we newly match.
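A test of the kind requested in the first bullet could look roughly like the sketch below. Everything in it is hypothetical: `ToyConfiguration`, `find_related_cves`, the CVE identifiers and the CPE strings are invented for illustration and are not the sec-certs API. The assumed semantics are that a configuration matches only when its platform CPE and at least one of its vulnerable CPEs are both among the certificate's CPEs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfiguration:
    """Hypothetical stand-in: matches when `platform` AND any of `cpes` are present."""

    platform: str
    cpes: frozenset[str]

    def matches(self, cert_cpes: set[str]) -> bool:
        return self.platform in cert_cpes and not self.cpes.isdisjoint(cert_cpes)


def find_related_cves(cert_cpes, exact, configurations):
    """Return CVE ids matched by a single CPE or by a whole configuration."""
    by_cpe = {cve for cve, cpes in exact.items() if cpes & cert_cpes}
    by_conf = {
        cve
        for cve, confs in configurations.items()
        if any(conf.matches(cert_cpes) for conf in confs)
    }
    return by_cpe | by_conf


def test_complex_configuration_is_matched():
    # Made-up CVEs: one matched by a plain CPE, one only by a configuration.
    exact = {"CVE-1234-0001": {"cpe:app:1.0"}}
    configurations = {
        "CVE-1234-0002": [ToyConfiguration("cpe:os:5", frozenset({"cpe:app:1.0", "cpe:app:1.1"}))]
    }

    # Certificate running the app on the affected platform: both CVEs apply.
    assert find_related_cves({"cpe:app:1.0", "cpe:os:5"}, exact, configurations) == {
        "CVE-1234-0001",
        "CVE-1234-0002",
    }
    # Same app on another platform: the configuration must not match.
    assert find_related_cves({"cpe:app:1.0", "cpe:os:6"}, exact, configurations) == {"CVE-1234-0001"}


test_complex_configuration_is_matched()
```

The negative case matters as much as the positive one, since matching on the vulnerable CPE alone would reintroduce false positives that configurations are meant to exclude.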
I know that this PR takes time and that I always require some changes. It's that I'm not happy with the result just yet. If you'd prefer, I can finish the work so that you can move to something potentially more interesting. Let me know in any case. Thanks!
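For the memory and runtime comparison asked for in the list above, one possible approach is the standard-library `tracemalloc` and `time.perf_counter`. This is only a sketch: `measure` and `build_lookup` are hypothetical names, and `build_lookup` stands in for whichever matching routine (old or new) is being profiled.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Return (result, elapsed_seconds, peak_bytes) for a single call of func."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def build_lookup(n):
    """Hypothetical stand-in for the routine under measurement."""
    return {i: str(i) for i in range(n)}


result, elapsed, peak = measure(build_lookup, 10_000)
print(f"elapsed: {elapsed:.4f} s, peak traced memory: {peak / 1024:.1f} KiB")
```

Tracing allocations slows execution noticeably, so runtimes are better taken from a separate run without `tracemalloc`; the peak figure also covers only Python-level allocations.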
@GeorgeFI I finalized my checks and we're good to merge 🎉, thank you for your effort 👍. A few things I've adjusted:
Closes #252