CRAVEX: Vulnerability Lookup and base app #94

Closed
4 tasks done
pombredanne opened this issue May 8, 2024 · 9 comments
Labels: design needed (Design details needed to complete the issue), HighPriority (High Priority), integration (Integration with other applications), major (Significant level-of-effort), vulnerabilities (Vulnerability Management)

Comments

@pombredanne
Member

pombredanne commented May 8, 2024

We should create a base vulnerability management application in DejaCode with these features:

  • CRAVEX: Create a scheduler for vulnerability lookups that will look up vulnerabilities in VCIO
  • CRAVEX: Store vulnerability lookups in a set of database models.

Also these related VCIO issues:

@DennisClark
Member

@pombredanne I would like to assign this one to Ziad but cannot see him on the Assignees list. Any suggestions please?

@DennisClark DennisClark self-assigned this May 14, 2024
@DennisClark
Member

See https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/models.py and design the appropriate mapping to DejaCode.

@DennisClark
Member

DennisClark commented Jun 19, 2024

A "scheduler" is a fairly new concept/feature for DejaCode. We need to determine if there is a usable Django library to facilitate creating such a feature. As a working start, let's consider a new section of the DejaCode admin Dashboard, right under the "Imports" section, called "Scheduler" (or similar), that has an initial option for "Refresh Vulnerabilities" (or similar) where the admin user can define the frequency and scope of the vulnerability refresh process to be run on an automatic basis.

Assumption: the basic scope of the vulnerability lookup is to find vulnerabilities associated with Packages and Components currently defined in the relevant DejaCode dataspace. This could be further refined to include only those that are assigned to a Product in that dataspace.

The scheduler should also include a task to update Components defined in the relevant dataspace with CPE values as those become available.
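To make the "frequency and scope" idea concrete, here is a minimal standard-library sketch. It is not DejaCode code: `RefreshSchedule`, `is_due`, `select_packages`, and the scope values `"all"` and `"in_products"` are hypothetical names used only for illustration, and a real implementation would more likely build on an existing Django-compatible job scheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RefreshSchedule:
    """Hypothetical admin-defined settings for a "Refresh Vulnerabilities" task."""

    frequency: timedelta                 # how often the refresh should run
    scope: str = "all"                   # "all" or "in_products" (assumed values)
    last_run: Optional[datetime] = None  # None means the task has never run

    def is_due(self, now: datetime) -> bool:
        # A task that has never run is always due.
        if self.last_run is None:
            return True
        return now - self.last_run >= self.frequency


def select_packages(packages, scope):
    """Apply the scope: every package, or only those assigned to a product."""
    if scope == "in_products":
        return [p for p in packages if p["products"]]
    return list(packages)
```

A periodic worker could call `is_due()` on each tick and, when it returns true, run the lookup on `select_packages(...)` for the dataspace.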

@DennisClark DennisClark added vulnerabilities Vulnerability Management design needed Design details needed to complete the issue integration Integration with other applications major Significant level-of-effort HighPriority High Priority labels Jun 19, 2024
@DennisClark
Member

DennisClark commented Jun 19, 2024

The proposed vulnerability model in DejaCode should be designed to support queries such as:

  • a filter-enabled list of all the Versions of a Component Name currently defined in the relevant dataspace, showing which ones have known vulnerabilities.
  • a filter-enabled list of all the Versions of a Package currently defined in the relevant dataspace, showing which ones have known vulnerabilities.
  • a filter-enabled list of all the Versions of a Product currently defined in the relevant dataspace, showing which ones have known vulnerabilities.
  • for a Vulnerability (using the same ID as VulnerableCode), provide a filter-enabled list of all packages or components in the relevant dataspace that are associated with it
  • for a Vulnerability (using the same ID as VulnerableCode), provide a filter-enabled list of all products in the relevant dataspace that are impacted by it
  • and others to be identified of course
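The first and fourth queries above can be sketched against a simple many-to-many schema. This is only an illustration using Python's built-in `sqlite3` module, not the proposed DejaCode model: the table names, column names, sample packages, and the identifier `VCID-example-0001` are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE vulnerability (id INTEGER PRIMARY KEY, vulnerability_id TEXT UNIQUE);
-- Join table carrying the many-to-many relationship.
CREATE TABLE package_vulnerability (
    package_id INTEGER REFERENCES package(id),
    vulnerability_id INTEGER REFERENCES vulnerability(id),
    PRIMARY KEY (package_id, vulnerability_id)
);
""")
conn.executemany(
    "INSERT INTO package VALUES (?, ?, ?)",
    [(1, "libfoo", "1.0"), (2, "libfoo", "1.1"), (3, "libbar", "2.0")],
)
conn.execute("INSERT INTO vulnerability VALUES (1, 'VCID-example-0001')")
conn.executemany("INSERT INTO package_vulnerability VALUES (?, ?)", [(1, 1), (3, 1)])


def versions_with_status(name):
    """List every version of a package name with a 'has vulnerabilities' flag."""
    rows = conn.execute(
        """
        SELECT p.version, COUNT(pv.vulnerability_id) > 0
        FROM package p
        LEFT JOIN package_vulnerability pv ON pv.package_id = p.id
        WHERE p.name = ?
        GROUP BY p.id
        ORDER BY p.version
        """,
        (name,),
    ).fetchall()
    return [(version, bool(flag)) for version, flag in rows]


def packages_affected_by(vulnerability_id):
    """List the (name, version) of every package linked to a vulnerability."""
    return conn.execute(
        """
        SELECT p.name, p.version
        FROM package p
        JOIN package_vulnerability pv ON pv.package_id = p.id
        JOIN vulnerability v ON v.id = pv.vulnerability_id
        WHERE v.vulnerability_id = ?
        ORDER BY p.name, p.version
        """,
        (vulnerability_id,),
    ).fetchall()
```

The same join table answers both directions of the question, which is why a many-to-many relationship between packages (or components) and vulnerabilities fits these queries.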

tdruez added a commit that referenced this issue Jul 4, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Jul 4, 2024
This change is required to call integration in the context of a scheduler where only Dataspace instances are available, not a user.

Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Jul 4, 2024
* Refactor BaseService to take a dataspace in place of user #94

This change is required to call integration in the context of a scheduler where only Dataspace instances are available, not a user.

Signed-off-by: tdruez <tdruez@nexb.com>

* Fix an issue with inject_scan_data on Component instances

Signed-off-by: tdruez <tdruez@nexb.com>

---------

Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Jul 12, 2024
tdruez added a commit that referenced this issue Jul 12, 2024
tdruez added a commit that referenced this issue Jul 15, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Aug 12, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Aug 12, 2024
tdruez added a commit that referenced this issue Aug 12, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Aug 12, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Aug 13, 2024
tdruez added a commit that referenced this issue Aug 13, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Aug 13, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
@tdruez
Contributor

tdruez commented Aug 20, 2024

@pombredanne @tushar Goel
Could you tell me the PURL types from the list that are not supported (no data available) by VCIO? Excluding those will reduce the number of "useless" requests to the API.
['gem', 'autotools', 'sourceforge', 'bitbucket', 'rpm', 'gitlab', 'cran', 'windows-program', 'docker', 'bower', 'nuget', 'generic', 'cargo', 'npm', 'deb', 'golang', 'maven', 'composer', 'pypi', 'hackage', 'unknown', 'rubygems', 'about', 'github']

For example, we have roughly 300,000 sourceforge PURLs in the nexB Dataspace; looking those up is a total waste of time and resources.

More context: for roughly 133,000 packages in the nexB Dataspace, a full run currently takes about 1 hour and makes 2,674 HTTP requests to the VCIO API.

The result is only 1,235 vulnerabilities fetched and created.
Seems like there's a lot of wasted time and resources with our current approach.
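As a rough sketch of the pre-filtering idea, the snippet below drops PURLs whose type is not in an allow-list and then groups the rest into batches. The `SUPPORTED_TYPES` set is a placeholder (which types VCIO actually covers is exactly the open question here), and `purl_type`, `filter_supported`, `batched`, and the batch size are hypothetical names and values chosen for illustration.

```python
from itertools import islice

# Placeholder allow-list: the real set of types covered by VCIO is not known here.
SUPPORTED_TYPES = {"pypi", "npm", "maven"}


def purl_type(purl):
    """Return the lowercased type of a 'pkg:type/...' string, or None if malformed."""
    if not purl.startswith("pkg:"):
        return None
    remainder = purl[len("pkg:"):].lstrip("/")
    type_, sep, _ = remainder.partition("/")
    return type_.lower() if sep and type_ else None


def filter_supported(purls, supported=SUPPORTED_TYPES):
    """Keep only the PURLs whose type is in the allow-list."""
    return [p for p in purls if purl_type(p) in supported]


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```

For example, `filter_supported(["pkg:pypi/django@5.0", "pkg:sourceforge/foo@1.0"])` keeps only the first entry with this placeholder allow-list. The string handling is deliberately naive; a dedicated parser such as the packageurl-python library would be more robust.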

tdruez added a commit that referenced this issue Aug 20, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Aug 20, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Aug 20, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
@pombredanne
Member Author

pombredanne commented Aug 20, 2024

@tdruez re: #94 (comment)

I suggest these progressive steps:

  • use a hardcoded list of distinct existing PURL types in VCIO
  • expose this list of existing PURL types as an endpoint
  • expose a new special endpoint that would provide a highly-compressed data structure to download quickly from VCIO and that you can query to know if a PURL may exist in VCIO
    • this could be an automaton (ahocorasick or FST) leveraging the fact that many PURLs share a common prefix, or a bloom filter.
    • it would be best cached for a few hours and should come with client code that uses it to filter a (long) list of PURLs, removing those that surely do not exist in VCIO
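To illustrate the bloom filter option, here is a minimal sketch of the client-side check. It is not an existing VCIO endpoint or client: the class, its parameters (`size_bits`, `num_hashes`), and `may_exist` are assumptions made for the example. A bloom filter can return false positives but never false negatives, so any PURL it rejects can safely be skipped.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter; sizes are illustrative, not tuned for real data."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive one bit position per salted SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "may exist"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def may_exist(purls, known):
    """Keep only the PURLs that the filter cannot rule out."""
    return [p for p in purls if p in known]
```

In this scheme the server would build the filter from its known PURLs and serve the bit array, and the client would run `may_exist()` before issuing any lookup requests.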

This is tracked in this issue:

tdruez added a commit that referenced this issue Aug 20, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Aug 20, 2024
@tdruez
Contributor

tdruez commented Aug 20, 2024

@pombredanne Thanks, this sounds like it will require some work to make this happen.

In the short term, could VCIO expose a new "action" on the package endpoint to get this list of supported types? (Should be a very small and fast query)
On the DejaCode side, the process could start by fetching the available types, limit the QuerySet to those, and drastically reduce the number of queries.

tdruez added a commit that referenced this issue Aug 21, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
@tdruez
Contributor

tdruez commented Aug 21, 2024

#148 merged, full implementation details available in the PR.

@DennisClark
Member

DennisClark commented Aug 22, 2024

PR #148 provides the following CRAVEX-related functionality:

  • It introduces a new Vulnerability model and all the code logic to fetch and create Vulnerability records and assign those to Package/Component through ManyToMany relationships.
  • A new fetchvulnerabilities management command is available to fetch all the relevant data from VulnerableCode for a given Dataspace.
  • A scheduler was added to run the vulnerability data update daily (we can discuss and adjust this to the most suitable value, depending on how often VCIO is updated for example).
  • The latest vulnerability data refresh date is displayed in the Admin dashboard in a new "Data updates" section in the bottom right corner.
  • The Package/Component views that display vulnerability information (icon or tab) now use the data from the Vulnerability model instead of calling the VulnerableCode API on each request. This results in much better performance, as rendering a DejaCode view no longer depends on the responsiveness of VulnerableCode. It also makes Vulnerability data available in the Reporting system.
  • A filter is available next to the "Identifier" column header in the Package list view, and Product tabs.
  • The vulnerability icon is displayed next to the Package/Component identifier in the Product views: "Inventory", "Hierarchy", "Dependencies" tabs.
  • The vulnerability data is available in Reporting either through the is_vulnerable property on Package/Component column template or going through the full affected_by_vulnerabilities m2m field. This is available in both Query and ColumnTemplate. Query example: Package > affected_by_vulnerabilities > IS_NULL = False
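The relationship between `is_vulnerable` and `affected_by_vulnerabilities` mentioned in the last item can be pictured with a plain-Python stand-in. This is a simplified sketch, not the Django models from PR #148: the classes are ordinary dataclasses, `vulnerability_id` and `vulnerable_only` are hypothetical names, and the sample identifier is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Vulnerability:
    vulnerability_id: str  # hypothetical field name


@dataclass
class Package:
    identifier: str
    # Stand-in for the many-to-many field named in the PR summary.
    affected_by_vulnerabilities: list = field(default_factory=list)

    @property
    def is_vulnerable(self):
        # A package is vulnerable when at least one vulnerability is linked to it.
        return bool(self.affected_by_vulnerabilities)


def vulnerable_only(packages):
    # Same intent as: Package > affected_by_vulnerabilities > IS_NULL = False
    return [p for p in packages if p.is_vulnerable]
```

Both Reporting paths express the same fact: the boolean is a shortcut, while the many-to-many field gives access to the individual vulnerability records.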

Scheduler:


TODO:

Projects
Status: Validated

3 participants