
CVE Auditing with OVAL #80

Merged (45 commits, Jan 8, 2024)
Conversation

@HoussemNasri (Contributor) commented Jun 13, 2023

Use OVAL data in addition to channels to conduct more accurate CVE audits.

See the rendered version.

@HoussemNasri marked this pull request as ready for review June 15, 2023 21:22

@msmeissn commented:
Abid asked me to review.

To me it looks good.

Note that we also express "not affectedness" via a version = 0 relation, but this should not matter. (We also have a livepatching logic trick that you will probably see in SUSE and Red Hat OVAL.)

The only big concern from my side is the question of scale.
With "openscap" we see the regular SUSE OVAL file taking multiple gigabytes of memory to process.

I think with intelligent parsing and data storage it might work.

@HoussemNasri (Contributor, Author) replied:

Hi @msmeissn,

Thank you for the review.

When you say the livepatching trick, are you referring to this comment in OVAL files?

```xml
<!-- For kernel default affectedness we evaluate affectedness for the lowest kernel without livepatch for the issue, and for each livepatched kernel version we check if no livepatch is installed with the right version or higher
OR(
	AND(affected kernel1, NONE SATISFY(livepatch for kernel1 >= fixed livepatch1 version installed)),
	AND(affected kernel2, NONE SATISFY(livepatch for kernel2 >= fixed livepatch2 version installed)),
	kernel <= version before first livepatch provided
) -->
```

Currently, I'm ignoring it when trying to collect vulnerable packages because it breaks the pattern where you have one Criteria node whose first child holds the vulnerable platforms and whose second child holds the vulnerable packages. I think Uyuni does the same. Below is a code snippet from Uyuni, CVEAuditManager#listSystemsByPatchStatus.

[Screenshot: excerpt from Uyuni's CVEAuditManager#listSystemsByPatchStatus showing the criteria handling.]
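For illustration, a minimal sketch of that two-child pattern (all types and names here are hypothetical stand-ins, not the actual Uyuni code):

```java
import java.util.List;
import java.util.Optional;

// Sketch of the pattern described above: a criteria node whose first child
// tests the vulnerable platforms and whose second child tests the vulnerable
// packages. CriteriaNode and the heuristics are hypothetical, not Uyuni code.
public class CriteriaSketch {

    /** A bare-bones criteria tree node: a leaf criterion or a container. */
    record CriteriaNode(String comment, List<CriteriaNode> children) { }

    /**
     * Expects a Criteria node with exactly two children: the first testing
     * vulnerable platforms, the second testing vulnerable packages. Nodes
     * that deviate from this shape (e.g. the livepatching constructs in
     * SUSE/Red Hat OVAL) are skipped by returning an empty Optional.
     */
    static Optional<CriteriaNode> vulnerablePackagesBranch(CriteriaNode criteria) {
        if (criteria.children().size() != 2) {
            return Optional.empty();
        }
        CriteriaNode first = criteria.children().get(0);
        if (!first.comment().toLowerCase().contains("platform")) {
            return Optional.empty();  // first child must test the platforms
        }
        return Optional.of(criteria.children().get(1));
    }
}
```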

> The only big concern from my side is the question of scale.
> With "openscap" we see the regular SUSE OVAL file taking multiple gigabytes of memory to process.
> I think with intelligent parsing and data storage it might work.

Exactly. If we store elements as soon as they are parsed, it shouldn't be a problem.
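To make the streaming idea concrete, a minimal sketch using StAX that persists each definition as soon as it is parsed, instead of building the whole document in memory; storeDefinition() is a hypothetical persistence hook:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Streaming-parse sketch: read the (potentially multi-gigabyte) OVAL file one
// event at a time and hand each <definition> off for storage as soon as it is
// seen, so memory use stays flat regardless of file size.
public class StreamingOvalParser {

    public static void parse(Path ovalFile) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (InputStream in = Files.newInputStream(ovalFile)) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "definition".equals(reader.getLocalName())) {
                    String id = reader.getAttributeValue(null, "id");
                    storeDefinition(id);  // persist immediately, then discard
                }
            }
            reader.close();
        }
    }

    private static void storeDefinition(String definitionId) {
        // Hypothetical hook: the real implementation would write to PostgreSQL.
        System.out.println("stored " + definitionId);
    }
}
```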

@rjmateus (Member) left a comment:

Hey Houssem Nasri, thank you for the great RFC.
It looks good, but I have some notes that could make it more focused and informative.

The UYUNI Backend section mentions 3 modules, which I think are too detailed in some parts but don't cover all the steps we need to achieve the CVE analysis goal (more at the end of this comment).
Also, we should just put the API call without specifying if it is JSON or XMLRPC, since both will be available.

I would not put all the methods for the DAO in this RFC. I would consider that an implementation detail.

For the database, I think we need to decompose the metadata into database tables to help with the needed features, like searching by CVE number, or listing CVEs for a platform. A database diagram would help.

I'm not sure I fully agree with the new names for the "PatchStatus".
I would put something like:

  • Add "AFFECTED_PATCH_UNAVAILABLE" as you proposed.
  • Keep "AFFECTED_PATCH_INAPPLICABLE"; this means a full or partial patch is available in a different channel. I think keeping the name will help existing users (but I don't have a strong opinion on it).
  • "AFFECTED_FULL_PATCH_APPLICABLE": similar, but the name is closer to what we have now.
  • "AFFECTED_PARTIAL_PATCH_APPLICABLE", meaning the patch is applicable but will not solve all the issues.

Proposed Modules

OVAL data processing module

Responsible for downloading OVAL files, parsing them, and inserting the structured data into the database.
For this module we should see the proposed schema and how the fields will map.
The goal is to transform the data from XML into the PostgreSQL database, to be consumed by the other modules.

For OVAL data you can also have a look at the NeuVector information, and where they are getting the OVAL files from: https://open-docs.neuvector.com/scanning/cve_sources#cve-database-sources

We can then discuss if we should have a schedule to periodically update this data.
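As a rough sketch of that module boundary (the interface and method names are assumptions, not the RFC's proposed API):

```java
import java.io.IOException;
import java.nio.file.Path;

// Illustrative sketch of the module boundary described above; all names are
// assumptions for discussion, not the RFC's proposed API.
public interface OvalDataProcessor {

    /** Download the OVAL file for the given distribution and version. */
    Path download(String distribution, String version) throws IOException;

    /** Stream-parse the file and persist the structured data into PostgreSQL. */
    void parseAndStore(Path ovalFile);
}
```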

System CVE analysis

Run a CVE analysis for a system (or set of systems).
It will use the information we have for the system and the OVAL data inserted in the database. This will result in a list of which CVEs are affecting the system, and the status of each CVE.
This result should be saved in the database to be consumed later. We expect this to be an expensive task, so we need to avoid running it too often and should save the results in the database.

For this you can use the diagram you presented for the verification process and CVE status in the system.

I would also like to see some more details on how we plan to run the criteria scan on a system.

API access

This would expose API methods (JSON and XMLRPC) to access data and perform actions.
Some examples can be (see the sketch after this list):

  • Get System CVE
  • Schedule CVE scan for system(s)
  • Get CVEs for platform (OS)
  • etc
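A rough sketch of what that API surface could look like (the method names and signatures are assumptions):

```java
import java.util.List;

// Illustration of the API surface suggested above; names and signatures are
// assumptions for discussion, not the actual Uyuni handler API.
public interface CveAuditApi {

    /** CVE identifiers currently known to affect the given system. */
    List<String> getSystemCves(int systemId);

    /** Schedule an (expensive) CVE scan for a set of systems. */
    void scheduleCveScan(List<Integer> systemIds);

    /** CVE identifiers affecting a platform/OS, independent of any system. */
    List<String> getPlatformCves(String platform);
}
```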

@parlt91 (Contributor) left a comment:

Thank you Houssem for your effort so far. It's really looking good already. One thing I'd like a bit more detail on is the schema part. It's really crucial that we design it with performance in mind. Would you mind explaining a bit more why you designed it the way you did, so we can understand a bit better what your thought process was when creating it?

@HoussemNasri (Contributor, Author) commented Jul 1, 2023

@parlt91 @rjmateus Thank you for the review. Below is a list of tasks to be done on the RFC based on your comments and on comments from @admd that were shared in meetings. Please let me know if it's missing something.

To be done

  • Show only important methods in DAO
  • Update PatchStatus values
  • Decompose the RFC into higher-level modules
  • Propose a schedule to periodically update OVAL data
  • Add a CVE system analysis section
  • Add an API access section
  • Explain in more detail the evaluation of criteria trees
  • Explain in more detail the database schema design decisions
  • State performance metrics
  • Explain the vulnerable package extraction process and why it's not feasible to cache the result to the database
  • Add links to implementation in RFC
  • Document the unselected alternative CVE auditing algorithm that depends on vulnerability and patch definitions
  • Explain the integration strategy and the rationale behind it
  • Expand the section on the OVAL downloader
  • Update the OVAL DAO section
  • Update the database schema diagram (a light theme would be good)
  • Update the database storage section


> Once the OVAL file is parsed, the next step is to store it in a database. A relational database, such as PostgreSQL, seems like an obvious choice given that it is the default database used by Uyuni.
>
> In Uyuni, multiple database-access patterns are in use. Skimming over the codebase, it appears that the newer parts of the code use the JPA Criteria API and annotation-based named queries. For that reason, the same patterns will be used to query OVAL data.
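As an illustration of the annotation-based named query pattern mentioned above (the entity, table, and query names are assumptions, not the RFC's schema):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.NamedQuery;
import javax.persistence.Table;

// Hypothetical example of an annotation-based named query on an OVAL
// definition entity; entity, table, and query names are assumptions.
@Entity
@Table(name = "suseOVALDefinition")
@NamedQuery(name = "OvalDefinition.findByCve",
            query = "SELECT d FROM OvalDefinition d WHERE d.cve = :cve")
public class OvalDefinition {

    @Id
    private Long id;

    private String cve;

    public Long getId() {
        return id;
    }

    public String getCve() {
        return cve;
    }
}
```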
A contributor commented:

About how many objects stored in the DB are we talking? I assume this is a lot.
Hibernate is bad when it comes to big tables with a lot of data and combining them.
We have made that mistake too often already in the other tables.
So it may be a good idea to use DTO objects and SQL instead of Hibernate objects.
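A minimal sketch of the suggested DTO-plus-plain-SQL approach: no managed entities and no dirty checking, just a read-only value object filled from a query (the query, table, and column names are assumptions):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the DTO-plus-SQL approach suggested above. The table and column
// names are assumptions for illustration.
public final class VulnerablePackageDto {

    private final String name;
    private final String fixVersion;

    public VulnerablePackageDto(String name, String fixVersion) {
        this.name = name;
        this.fixVersion = fixVersion;
    }

    public static List<VulnerablePackageDto> findByCve(Connection conn, String cve)
            throws SQLException {
        String sql = "SELECT name, fix_version FROM suseOVALVulnerablePacakge WHERE cve = ?";
        List<VulnerablePackageDto> result = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, cve);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    result.add(new VulnerablePackageDto(
                            rs.getString("name"), rs.getString("fix_version")));
                }
            }
        }
        return result;
    }
}
```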

@HoussemNasri (author) replied:

The writing of the data below took around 10 minutes with Hibernate:

  • Definitions: 13660 (the writing of definitions will result in the writing of affected products, references, and mentioned CVEs)
  • Objects: 9657
  • States: 19036
  • Tests: 67547

Weirdly enough, using native queries took even longer. I think it's because of JPA's auto-flushing strategy, which prevents running native queries in bulk. Right now, I'm out of ideas to optimize it further. I'll try not to rely on Hibernate anymore and use plain SQL as you suggested. You can look at https://github.com/HoussemNasri/uyuni/blob/oval-db/java/code/src/com/suse/oval/OVALCachingFactory.java for more details about the implementation.

A contributor replied:

Tip: switching between Hibernate and native queries is expensive. On every switch, Hibernate needs to check the cache to see whether a flush is required. This check alone costs most of the time.
I think writing is only one aspect. It will happen during the night at some point; if it takes a few minutes more, that does not matter.
More important is reading. How long do the queries take to display data? This is where people "click" and wait for the result.
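One way to avoid that per-query flush check is to set the flush mode explicitly; a minimal sketch, assuming a JPA EntityManager and an illustrative table:

```java
import javax.persistence.EntityManager;
import javax.persistence.FlushModeType;
import javax.persistence.Query;

// Sketch: avoid the auto-flush check described above by flushing only at
// commit time for a bulk-insert native query. The table and column are
// illustrative assumptions.
public class BulkInsert {

    public static void insertState(EntityManager em, String stateXml) {
        Query query = em.createNativeQuery(
                "INSERT INTO suseOVALState (state_xml) VALUES (?)");
        query.setFlushMode(FlushModeType.COMMIT);  // skip the dirty-check flush per query
        query.setParameter(1, stateXml);
        query.executeUpdate();
    }
}
```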

@HoussemNasri (author) replied:

Reading affected products from the database is one aspect of the algorithm, and the fastest step I would say. There is also the evaluation of the criteria tree, the extraction of vulnerable packages from the criteria tree, and the reading of patch information from the database/channels.

Right now, I have an early draft implementation of the algorithm running, and it takes between 0.2 and 14 seconds to evaluate a single CVE. It's not optimized, though; with some caching, we can get those numbers way down. Unfortunately, showing all systems with all affected CVE numbers might not be feasible (it would take too much time) even with optimization.

@HoussemNasri (author) added:

Update: It turns out that the whole process of auditing one system against one CVE takes less than a second, but it grows over time (probably Hibernate dirty checking) when checking multiple CVEs, hence the 14 seconds.
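If the growth really is dirty checking, the usual remedy is to clear the persistence context between CVEs; a minimal sketch, assuming a JPA EntityManager and a hypothetical audit method:

```java
import java.util.List;
import javax.persistence.EntityManager;

// Sketch of the usual remedy for the growth described above: detach all
// managed entities between audits so Hibernate's dirty-checking scan does
// not grow with every CVE processed. auditSystemAgainstCve is hypothetical.
public class AuditLoop {

    public static void auditAll(EntityManager em, long systemId, List<String> cves) {
        for (String cve : cves) {
            auditSystemAgainstCve(em, systemId, cve);
            em.flush();  // push any pending writes first
            em.clear();  // then drop the persistence context
        }
    }

    private static void auditSystemAgainstCve(EntityManager em, long systemId, String cve) {
        // Placeholder for the actual audit of one system against one CVE.
    }
}
```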


> The OVAL Downloader is responsible for finding OVAL documents online, downloading them, and caching them for easy access. However, since the relevant OVAL data is saved in PostgreSQL, the downloader could skip caching the OVAL files or remove them once they are stored in the database. This approach would reduce the amount of storage needed.
>
> OVAL data comes from several sources – OVALRepo, the openSUSE OVAL repository, and other Linux distributions' repositories – which means that the OVAL Downloader needs to be robust in the presence of errors. It is also essential that the component is designed with comprehensive error messaging, to provide instant alerts in case an OVAL source is moved or a server is down.
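A minimal sketch of that retry-with-descriptive-errors behaviour, using the JDK HTTP client (the retry policy, error wording, and paths are illustrative):

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Sketch of the robustness requirement above: retry transient failures and
// raise a descriptive error when an OVAL source is moved or down. The retry
// count and messages are illustrative assumptions.
public class OvalDownloader {

    private static final int MAX_ATTEMPTS = 3;

    public static Path download(URI source, Path target)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(source).GET().build();
        IOException lastError = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                HttpResponse<Path> response =
                        client.send(request, HttpResponse.BodyHandlers.ofFile(target));
                if (response.statusCode() == 200) {
                    return response.body();
                }
                lastError = new IOException("OVAL source " + source
                        + " answered HTTP " + response.statusCode()
                        + " (moved or removed?)");
            } catch (IOException e) {
                lastError = e;  // network error: retry
            }
        }
        throw new IOException("Failed to download " + source
                + " after " + MAX_ATTEMPTS + " attempts", lastError);
    }
}
```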
A contributor commented:

People want things to run automatically. We need to run the downloader on a recurring schedule, at least once per day, and update all the information in the database.
We also need a place/way to configure which OVAL files to download. Best would be to attach this to the products:
when a customer chooses a product to mirror, we automatically download the OVAL data.


> - How to distinguish QA from released patches in non-openSUSE distributions?
> - Can we depend on OVALRepo to provide us with OVAL definitions?

A contributor commented:

I wonder if we can get the querying of the data fast enough. First for the existing case: searching for the affected systems of one CVE number.
A second case has been requested for a long time already: show all systems with all affected CVE numbers.
Currently this data set is too big to query and present to the user. Maybe this can be done with this style.

During implementation I would try out the performance of the algorithms as early as possible.

@parlt91 (Contributor) left a comment:

From my side this looks fine now. I just left one remark (see below). Thank you for your effort so far :)

> and `rhnPackageEVR`. One possible optimization would be to create foreign keys for `name` and `fix_version`
> inside `suseOVALVulnerablePacakge` instead of hard-coding their values. While the optimization results in
> reduced redundancy and increased storage efficiency, it does introduce potential performance implications.
> An experiment should be made to decide whether the optimization is worth it or not.
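For illustration, the normalized variant could look roughly like this in JPA, assuming Uyuni's existing PackageName and PackageEvr entities (the join-column names are assumptions):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

// Assumed to be Uyuni's existing package entities; paths may differ.
import com.redhat.rhn.domain.rhnpackage.PackageEvr;
import com.redhat.rhn.domain.rhnpackage.PackageName;

// Sketch of the normalized variant discussed above: instead of storing the
// package name and fixed version as text, reference the existing rows via
// foreign keys. Column names here are assumptions for illustration.
@Entity
@Table(name = "suseOVALVulnerablePacakge")
public class OvalVulnerablePackage {

    @Id
    private Long id;

    @ManyToOne
    @JoinColumn(name = "name_id")         // FK into rhnPackageName
    private PackageName name;

    @ManyToOne
    @JoinColumn(name = "fix_version_id")  // FK into rhnPackageEVR
    private PackageEvr fixVersion;
}
```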
A contributor commented:

Did you do any testing around what performance impact we could expect? Personally, I'd prefer not to have redundant data and to link to the rhnPackage table, provided it's not too big of an impact.

@HoussemNasri (author) replied:

No, not yet. Right now, it's not clear how I should go about testing the performance impact. I could run the CVE auditing query before and after the optimization and measure the impact in seconds, but I only have 4 clients registered. The impact might be exponentially bigger if hundreds of clients were registered.

A commenter replied:

You can easily add thousands of systems by copying one of yours with this script: https://github.com/uyuni-project/uyuni/blob/master/susemanager-utils/performance/copy-system.py. To test the performance impact, I think you would also need to have many channels and packages in your DB. Again, a small script adding the entries in the DB would be enough... but we don't have that one yet 😉

@HoussemNasri (author) replied:

Thanks, that seems like what I've been looking for.

@rjmateus (Member) left a comment:

Looks good to me. Just one small question that should not block the RFC to be merged.

(Resolved comment on accepted/0000-cve-auditing-with-oval.md)
@parlt91 parlt91 merged commit a062252 into uyuni-project:master Jan 8, 2024