CVE Auditing with OVAL #80
Conversation
Abid asked me to review; it looks good to me. Note that we also express "not affected" via a version = 0 relation, but this should not matter. (And we also have a livepatching logic trick you will probably see in SUSE and Red Hat OVAL.) The only big concern from my side is the question of scale. I think with intelligent parsing and data storage it might work.
Hi @msmeissn, thank you for the review. When you say the livepatching trick, are you referring to this comment in OVAL files?

<!-- For kernel default affectedness we evaluate affectedness for the lowest kernel without a livepatch for the issue, and for each livepatched kernel version we check that no livepatch with the right version or higher is installed:
OR(
  AND(affected kernel1, NONE SATISFY(livepatch for kernel1 >= fixed livepatch1 version installed)),
  AND(affected kernel2, NONE SATISFY(livepatch for kernel2 >= fixed livepatch2 version installed)),
  kernel <= version before first livepatch provided
) -->

Currently, I'm ignoring it when collecting vulnerable packages because it breaks the pattern where you have one Criteria node whose first child holds the vulnerable platforms and whose second child holds the vulnerable packages. I think Uyuni does the same. Below is a code snippet from Uyuni, `CVEAuditManager#listSystemsByPatchStatus`.
Exactly, if we store elements as soon as they are parsed, it shouldn't be a problem.
Hey Houssem Nasri, Thank you for the great RFC.
It looks good but I have some notes that can make it more focused and informative.
The Uyuni Backend section mentions 3 modules, which I think are too detailed in some parts yet don't cover all the steps we need to achieve the CVE analysis goal (more at the end of this comment).
Also, we should just list the API calls without specifying whether they are JSON or XMLRPC, since both will be available.
I would not put all the DAO methods in this RFC; I would consider that an implementation detail.
For the database, I think we need to decompose the metadata into tables that support the needed features, like searching for a CVE number or listing CVEs for a platform. A database diagram would help.
I'm not sure I fully agree with the new names for the "PatchStatus".
I would propose something like:
- Add "AFFECTED_PATCH_UNAVAILABLE" as you did.
- Keep "AFFECTED_PATCH_INAPPLICABLE", meaning a full or partial patch is available in a different channel. I think keeping the name will help existing users (but I don't have a strong opinion on it).
- "AFFECTED_FULL_PATCH_APPLICABLE": similar, but with a name closer to what we have now.
- "AFFECTED_PARTIAL_PATCH_APPLICABLE", meaning the patch is applicable but will not solve all the issues.
Proposed Modules
OVAL data processing module
Responsible for downloading OVAL files, parsing them, and inserting the structured data into the database.
For this module we should see the proposed schema and how the fields will map.
The goal is to transform the data from XML into the PostgreSQL database so it can be consumed by the other modules.
For OVAL data you can also have a look at the NeuVector documentation and where they get their OVAL files from: https://open-docs.neuvector.com/scanning/cve_sources#cve-database-sources
We can then discuss if we should have a schedule to periodically update this data.
System CVE analysis
Runs a CVE analysis for a system (or a set of systems).
It will use the information we have about the system and the OVAL data inserted in the database. The result is which CVEs affect the system, and the status of each CVE.
This result should be saved in the database to be consumed later. We expect this to be an expensive task, so we need to avoid running it too often and save the results in the database.
For this you can use the diagram you presented for the verification process and CVE status in the system.
I would also like to see some more details on how we plan to run the criteria scan on a system.
API access
This would expose API methods (JSON and XMLRPC) to access data and perform actions.
Some examples can be:
- Get system CVEs
- Schedule a CVE scan for system(s)
- Get CVEs for a platform (OS)
- etc
Thank you Houssem for your effort so far. It's really looking good already. One thing I'd like a bit more detail on is the schema part. It's really crucial that we design it with performance in mind. Would you mind explaining a bit more why you designed it the way you did, so we can better understand your thought process when creating it?
@parlt91 @rjmateus Thank you for the review. Below is a list of tasks to be done on the RFC based on your comments, and on comments from @admd that were shared in meetings. Please let me know if anything is missing.

To be done
Once the OVAL file is parsed, the next step is to store it in a database. A relational database, such as PostgreSQL, seems like an obvious choice given that it is the default database used by Uyuni.
In Uyuni, multiple database-access patterns are in use. Skimming over the codebase, it appears that the newer parts of the code use the JPA Criteria API and annotation-based named queries. For that reason, the same patterns will be used to query OVAL data.
About how many objects stored in the DB are we talking? I assume this is a lot.
Hibernate performs badly with big tables holding a lot of data, and with combining them.
We have made that mistake too often already in other tables.
So it may be a good idea to use DTO objects and plain SQL instead of Hibernate objects.
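The DTO-plus-SQL approach suggested here could look like the sketch below: a read-only record populated straight from a JDBC result set, with no managed entities involved. The table and column names (`suseOVALVulnerablePackage`, `fix_version`) are assumptions for illustration, not the RFC's final schema.

```java
// Hedged sketch: reading OVAL rows into a lightweight DTO with plain JDBC
// instead of managed Hibernate entities. Table/column names are assumed.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class OvalDtoQuery {
    /** Read-only DTO: no proxy, no dirty checking, cheap to hold in bulk. */
    public record VulnerablePackageDto(String name, String fixVersion) {}

    public static List<VulnerablePackageDto> findByCve(Connection conn, String cve)
            throws SQLException {
        String sql = "SELECT name, fix_version FROM suseOVALVulnerablePackage "
                   + "WHERE cve = ?";
        List<VulnerablePackageDto> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, cve);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(new VulnerablePackageDto(
                            rs.getString("name"), rs.getString("fix_version")));
                }
            }
        }
        return result;
    }
}
```

Because the DTO is never attached to a persistence context, Hibernate's flush and dirty-check machinery is bypassed entirely on the read path.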
The writing of the data below took around 10 minutes with Hibernate:
- Definitions: 13660 (The writing of definitions will result in the writing of affected products, references, and mentioned cves)
- Objects: 9657
- States: 19036
- Tests: 67547
Weirdly enough, using native queries took even longer. I think it's because of JPA's auto-flushing strategy, which prevents running native queries in bulk. Right now I'm out of ideas to optimize it further. I'll try to stop relying on Hibernate and use plain SQL as you suggested. You can look at https://github.com/HoussemNasri/uyuni/blob/oval-db/java/code/src/com/suse/oval/OVALCachingFactory.java for more details about the implementation.
Tip: switching between Hibernate and native queries is expensive. On every switch, Hibernate needs to check the cache to see whether a flush is required; this check alone costs most of the time.
I think writing is only one aspect. It will happen overnight at some point; if it takes a few minutes more, it does not matter.
More important is reading: how long do the queries take to display data? This is where people click and wait for the result.
Reading affected products from the database is one aspect of the algorithm and the fastest step I would say. There is also the evaluation of the criteria tree, the extraction of vulnerable packages from the criteria tree, and the reading of patch information from the database/channels.
Right now, I have an early draft implementation of the algorithm running, and it takes between 0.2 and 14 seconds to evaluate a single CVE. It's not optimized, though; with some caching we can push those numbers way down. Unfortunately, showing all systems with all affected CVE numbers might not be feasible (it would take too much time) even with optimization.
Update: It turns out that the whole process of auditing one system against one CVE takes less than a second, but the time per CVE grows when checking multiple CVEs in a row (probably Hibernate dirty checking), hence the 14 seconds.
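The growth described above is the classic symptom of an ever-larger persistence context: each flush re-scans every managed entity. The usual mitigation is to flush and clear the context every N iterations. The sketch below illustrates the shape of that fix; a plain map stands in for the Hibernate session, and the threshold of 50 is an arbitrary assumption.

```java
// Illustrative sketch: when auditing many CVEs in one pass, periodically
// clear the persistence context (em.flush(); em.clear(); in real JPA code)
// so dirty checking never scans an ever-growing set of managed entities.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchAuditSketch {
    static final int CLEAR_EVERY = 50; // arbitrary; tune against real data

    public static int auditAll(List<String> cves) {
        // Stand-in for the Hibernate session cache of managed entities.
        Map<String, Object> sessionCache = new HashMap<>();
        int audited = 0;
        for (String cve : cves) {
            // Entities loaded while evaluating one CVE accumulate here.
            sessionCache.put(cve, new Object());
            audited++;
            if (audited % CLEAR_EVERY == 0) {
                sessionCache.clear(); // em.flush(); em.clear(); in JPA
            }
        }
        return audited;
    }
}
```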
The OVAL Downloader is responsible for finding OVAL documents online, downloading them, and caching them for easy access. However, since the relevant OVAL data is saved in PostgreSQL, it is possible for the downloader to skip caching the OVAL files, or to remove them once they are stored in the database. This approach would reduce the amount of storage needed.
OVAL data comes from several sources – OvalRepo, the openSUSE OVAL repository, and other Linux distributions' repositories – which means that the OVAL Downloader needs to be robust in the presence of errors. It is also essential that the component is designed with comprehensive error messaging, to provide instant alerts in case an OVAL source is moved or its server is down.
People want things to happen automatically. We need to run the downloader recurrently, at least once per day, and update all the information in the database.
We also need a place/way to configure which OVAL files to download. It would be best to attach this to the products:
when a customer chooses a product to mirror, we automatically download the OVAL.
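A once-per-day recurring refresh could be sketched as below. In Uyuni this would more realistically be a Taskomatic job; the `ScheduledExecutorService` version and the `downloadAndStore` runnable are illustrative assumptions only.

```java
// Sketch of a recurring OVAL refresh: run once immediately, then every
// 24 hours. The actual download-and-store step is passed in as a Runnable.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OvalRefreshScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(Runnable downloadAndStore) {
        // Initial delay 0: refresh immediately on startup, then daily.
        scheduler.scheduleAtFixedRate(downloadAndStore, 0, 24, TimeUnit.HOURS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

Attaching the schedule to products would then only require the caller to build `downloadAndStore` from the set of currently mirrored products.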
- How do we distinguish QA from released patches in non-openSUSE distributions?
- Can we depend on OVALRepo to provide us with OVAL definitions?
I wonder if we can get the querying of the data fast enough. First for the existing case: searching for the affected systems of one CVE number.
A second case has been requested for a long time already: showing all systems with all affected CVE numbers.
Currently this data set is too big to query and present to the user. Maybe it becomes feasible with this approach.
During implementation I would try out the performance of the algorithms as early as possible.
- Also updated the unresolved questions section to have more details
From my side this looks fine now. I just left one remark (see below). Thank you for your effort so far :)
and `rhnPackageEVR`. One possible optimization would be to create foreign keys for `name` and `fix_version` inside `suseOVALVulnerablePackage` instead of hard-coding their values. While the optimization results in reduced redundancy and increased storage efficiency, it does introduce potential performance implications. An experiment should be made to decide whether the optimization is worth it.
Did you do any testing of the performance impact we could expect? Personally, I'd prefer not to have redundant data and to link to the rhnPackage table, provided the impact is not too big.
No, not yet. Right now it's not clear how I should go about testing the performance impact. I could run the CVE auditing query before and after the optimization and measure the difference in seconds, but I only have 4 clients registered. The impact might be much bigger with hundreds of registered clients.
You can easily add thousands of systems by copying one of yours with this script: https://github.com/uyuni-project/uyuni/blob/master/susemanager-utils/performance/copy-system.py. To test the performance impact, I think you would also need many channels and packages in your DB. Again, a small script adding the entries to the DB would be enough... but we don't have that one yet 😉
Thanks, that seems like what I've been looking for.
Looks good to me. Just one small question that should not block the RFC from being merged.
Use OVAL data in addition to channels to conduct more accurate CVE audits.
See the rendered version.