Add V2_importer to collect advisories from EUVD #2046
base: main
Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
267ed3b to
19d4d45
Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
ziadhany
left a comment
@Samk1710 Great start! Just a few small tweaks
| advisory = self.parse_advisory(raw_data) | ||
| if advisory: | ||
| yield advisory | ||
| except Exception as e: |
Please avoid using general exceptions.
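One way to act on this is to catch only the exceptions the parsing step is expected to raise. A minimal sketch, assuming malformed records fail with `KeyError`, `TypeError`, or `ValueError`; `parse_advisory`, `iter_advisories`, and the `id`/`baseScore` field names are simplified stand-ins for illustration, not the PR's code.

```python
import logging

logger = logging.getLogger(__name__)


def parse_advisory(raw_data):
    """Hypothetical parser: return a small dict, or raise on malformed input."""
    return {
        "advisory_id": raw_data["id"],          # KeyError if "id" is missing
        "score": float(raw_data["baseScore"]),  # ValueError/TypeError if not numeric
    }


def iter_advisories(raw_items):
    """Yield parsed advisories, skipping only records that fail in known ways."""
    for raw_data in raw_items:
        try:
            advisory = parse_advisory(raw_data)
        except (KeyError, TypeError, ValueError) as e:
            # Narrow except: unexpected errors still propagate and fail loudly.
            logger.error("Skipping malformed advisory %r: %r", raw_data, e)
            continue
        yield advisory
```

With a narrow `except`, a programming error such as an `AttributeError` is not silently swallowed as if it were bad input.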
| first = advisories[0] | ||
| assert isinstance(first, AdvisoryData) | ||
| assert first.advisory_id == "EUVD-2025-197757" | ||
| assert "EUVD-2025-197757" in first.aliases | ||
| assert "CVE-2025-13284" in first.aliases | ||
| assert first.summary == "ThinPLUS vulnerability that allows remote code execution" | ||
| assert first.date_published is not None | ||
| assert len(first.severities) == 1 | ||
| assert first.severities[0].system.identifier == "cvssv3.1" | ||
| assert first.severities[0].value == "9.8" |
I think it would be easier if you test using util_tests.check_results_against_json(result, expected_file) and with an expected file.
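A sketch of the expected-file pattern being suggested. The `check_results_against_json` defined below is a simplified stand-in written for illustration; the real helper lives in VulnerableCode's `util_tests` and its exact behavior may differ. The file name `euvd-expected.json` is also an assumption.

```python
import json
import tempfile
from pathlib import Path


def check_results_against_json(results, expected_file):
    """Stand-in helper: compare results with the JSON stored in expected_file."""
    with open(expected_file) as f:
        expected = json.load(f)
    assert results == expected


# Serialized importer output; the values mirror the field-by-field assertions
# in the test under review.
results = [
    {
        "advisory_id": "EUVD-2025-197757",
        "aliases": ["EUVD-2025-197757", "CVE-2025-13284"],
        "summary": "ThinPLUS vulnerability that allows remote code execution",
    }
]

# In a real test the expected file is checked into the test data directory;
# a temporary file keeps this sketch self-contained.
with tempfile.TemporaryDirectory() as tmp:
    expected_file = Path(tmp) / "euvd-expected.json"
    expected_file.write_text(json.dumps(results, indent=2))
    check_results_against_json(results, expected_file)
    matched = True
```

A single comparison against a stored file replaces the long list of per-field assertions, and a parser change shows up as a diff in the expected file.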
| if self._cached_data is not None: | ||
| logger.info(f"Using cached data: {len(self._cached_data)} items") | ||
| return self._cached_data |
Why do we have _cached_data? Is it because the API returns repeated data?
_cached_data prevents a second full API fetch.
The base importer calls fetch_data() once to count advisories and again to iterate through them.
Caching ensures both steps use the same dataset snapshot while avoiding duplicated network requests and API load.
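The caching behavior described here can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `CachedFetcher` and the `fetch_pages` callable are hypothetical names, and only `_cached_data`, `fetch_data`, and `collect_advisories` come from the discussion.

```python
class CachedFetcher:
    """Hypothetical importer fragment: fetch once, reuse for count and iteration."""

    def __init__(self, fetch_pages):
        self._fetch_pages = fetch_pages  # callable doing the real network work
        self._cached_data = None

    def fetch_data(self):
        # Second and later calls return the snapshot from the first call.
        if self._cached_data is not None:
            return self._cached_data
        self._cached_data = list(self._fetch_pages())
        return self._cached_data

    def advisories_count(self):
        return len(self.fetch_data())

    def collect_advisories(self):
        yield from self.fetch_data()
```

The trade-off is that the whole dataset stays in memory between the counting and iteration steps.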
|
|
||
| logger.info(f"Fetching data from EUVD API: {self.url}") | ||
|
|
||
| while True: |
We should avoid loops without a terminating condition. Looping over the reported total (452584 advisories) might be a better idea.
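A bounded alternative to `while True` could look like the sketch below. It assumes the API reports a total count alongside each page; `fetch_page` and the `items`/`total` keys are hypothetical and the real response fields may differ.

```python
import math


def fetch_all(fetch_page, page_size=100):
    """Collect every item using a page count derived from the reported total.

    fetch_page(page, size) is a hypothetical callable returning a dict with
    "items" and "total" keys.
    """
    first = fetch_page(0, page_size)
    items = list(first["items"])
    total_pages = math.ceil(first["total"] / page_size)
    # The range gives the loop an explicit upper bound instead of `while True`.
    for page in range(1, total_pages):
        items.extend(fetch_page(page, page_size)["items"])
    return items
```

Because the page count is fixed after the first request, the loop always terminates, even if the server keeps returning non-empty pages.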
| logger.error(f"API returned status {response.status_code} for page {page}") | ||
| retry_count += 1 | ||
| if retry_count < max_retries: | ||
| sleep_time = min(10 * (2 ** min(retry_count - 1, 5)), 60) |
Why this sleep_time? We run the importers multiple times, so if one request fails, a single retry is enough.
(Please avoid complex retry logic.)
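A single-retry helper in the spirit of this comment might look like the following sketch. `get_with_one_retry` and its parameters are hypothetical names; `get` stands for any callable that returns an object with a `status_code` attribute, such as `requests.get`.

```python
import time


def get_with_one_retry(get, url, delay=1.0, sleep=time.sleep):
    """Call get(url); on a non-200 status, wait once and try one more time.

    Returns the response, or None if both attempts fail.
    """
    for attempt in range(2):
        response = get(url)
        if response.status_code == 200:
            return response
        if attempt == 0:
            # Fixed short pause before the only retry; no exponential backoff.
            sleep(delay)
    return None
```

Since the importer is run repeatedly, a page that still fails after one retry can simply be picked up by the next run.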
Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
Hey @ziadhany, I’ve pushed the requested updates. Summary of changes:
Let me know if you’d like any additional modifications. Thanks again for the feedback and guidance!
| """ | ||
|
|
||
| pipeline_id = "euvd_importer_v2" | ||
| spdx_license_expression = "LicenseRef-scancode-other-permissive" |
| spdx_license_expression = "LicenseRef-scancode-other-permissive" | |
| spdx_license_expression = "CC-BY-4.0" |
This is CC-by-4.0 as per:
Thanks a lot Ayan
Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
I have updated the License Expression and added sample test data from the EUVD API as suggested in today's call.
Hey @pombredanne @ziadhany, the API response includes "total": 452844. This would mean the importer would take 5-6 hours and fetch the data only once, in collect_advisories.
@Samk1710 Yes, we can use the total field. Since we know the total number of advisories, we can iterate over the endpoint using either the date or the advisory count, if that’s available. The time isn’t a big issue to get all the available data in under a couple of hours.
Yes, the date field is available in the API response, and if the time (5-6 hours) isn't an issue, this would be the simplest approach: it avoids double fetching, and caching will not be required. Shall I move forward with this approach?
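If the date-based iteration is chosen, the date range can be split into fixed windows and each window requested in turn. A minimal sketch of the windowing step only; `date_windows` is a hypothetical helper, and how the windows map onto the API's date parameters is left open here.

```python
from datetime import date, timedelta


def date_windows(start, end, days=30):
    """Yield (window_start, window_end) pairs covering start..end inclusive."""
    current = start
    # The loop ends as soon as the window start passes the end date.
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)
```

Windows do not overlap, so each advisory date falls into exactly one request and no caching of earlier results is needed.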
Signed-off-by: Sampurna Pyne <sampurnapyne1710@gmail.com>
7e4b12c to
506d03f
@Samk1710 Here is the repo for the EUVD mirror: https://github.com/aboutcode-org/aboutcode-mirror-euvd. For the data collection script, take a look at the pipeline here: https://github.com/aboutcode-org/aboutcode-mirror-nuget-catalog/blob/main/sync_catalog.py. We want to do something similar for EUVD. The script will be used in a workflow that will be almost identical to what we have here: https://github.com/aboutcode-org/aboutcode-mirror-nuget-catalog/blob/main/.github/workflows/sync.yml
Thanks a lot @keshav-space. Will look into it.
EUVD Importer
Overview
This pull request introduces a new importer for the EU Vulnerability Database (EUVD) provided by ENISA. The importer retrieves vulnerability advisories via the EUVD JSON API and integrates them into VulnerableCode.
Data Source
https://euvdservices.enisa.europa.eu/api/search
Test Run Results
Unit & pipeline tests
Importer Run (Full EUVD Dataset)
Importer log (451,638 advisories)
The full EUVD pipeline completed successfully in 6.4 hours locally. Relevant log excerpt: