Some UI package queries return duplicate copies of the same Package URL #1278
Comments
@pombredanne @TG1999 I think I've identified at least one reason we have some duplicate Package records in the VCIO DB. While creating a series of test output modifications and test API queries today for issue 1287 (combining the And BTW, this duplication also seems to mean that when these Packages are affected by a vulnerability, the DB contents are such that there are also completely different fixed by packages for each of the duplicate affected packages. An example. 1 of 2:
2 of 2:
And speaking of id values, an initial review of several sets of queries suggests that these duplicates (there are others) were created at different times -- the id values look like they comprise separate groups.
@pombredanne @TG1999 Last week Philippe suggested I learn how to use the Django shell and search the DB for PURLs groupedby I started with a count of packages in my local DB instance -- 596,745, a subset of the public DB. I initially ran the following to generate the result Philippe suggested.
This gave me a large output (36,428 records in the queryset) that began like this:
Despite careful study I saw no noticeable patterns. Let me know if you'd like me to upload a copy of the output here. (I also reran this with EOLs, making it easier to read individual records.)
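For anyone who wants to try this kind of grouping outside the Django shell, here is a minimal Python sketch. It is not the query that was run above: the sample records are made up, and the choice of grouping fields (the standard Package URL components `type`, `namespace`, `name`, `version`) is an assumption for illustration.

```python
from collections import Counter


def find_duplicate_groups(packages, fields):
    """Return {key_tuple: count} for every field combination seen more than once."""
    counts = Counter(tuple(pkg[f] for f in fields) for pkg in packages)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample rows standing in for Package records.
packages = [
    {"id": 10, "type": "deb", "namespace": "debian",
     "name": "jackson-databind", "version": "2.12.1-1"},
    {"id": 900010, "type": "deb", "namespace": "debian",
     "name": "jackson-databind", "version": "2.12.1-1"},
    {"id": 11, "type": "deb", "namespace": "debian",
     "name": "nginx", "version": "1.18.0-1"},
]

# Assumed grouping key; qualifiers and subpath are left out of this sketch.
PURL_FIELDS = ("type", "namespace", "name", "version")

duplicates = find_duplicate_groups(packages, PURL_FIELDS)
for key, count in duplicates.items():
    print(key, count)
```

In the Django ORM the same idea is usually written with `values()` on the grouping fields followed by `annotate()` with a `Count` and a filter on that count.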
It occurred to me that a more informative result might be to check only for duplicates of the
This gave me a useful output of 41,293 records that began like this:
A modest visual review suggests all have a count of 2, and the few that I examined were duplicates where the
This is even stranger:
I don't know how representative these examples are, but if my shell scripting is correct (and it might not be), we have nearly 42,000 pairs of records that share identical
I'm uploading Generated with
@TG1999 @pombredanne In addition to the issue of duplicate PURLs, I've also noticed that at least some of these pairs of duplicates have different Affected By vulnerabilities and, when they share the same vulnerability, have different Fixed By Packages. Yes, it's true. 🙁 An example is And this is from an API query for that PURL -- 1 Affected By vuln vs. 4 Affected By vulns, and for the one vulnerability they share, 2 sets of 2 different Fixed By Packages. Seriously. 🤯 3 results total, 2 of which are the pair of duplicate PURLs. Here's the 1st of the duplicate PURLs, with 1 vulnerability and 2 fixed-by Packages (unlike the UI, the API calls them 'fixed_packages'):
This is the 2nd of the duplicate PURLs, with 4 vulnerabilities, not 1 like the other duplicate PURL. The pair of duplicate PURLs share 1 of these vulnerabilities -- VCID-6qxq-zyzf-aaar (id=28) (you can see it up above as well) -- and each has 2 fixed-by Packages for that vuln, but they are completely different!
Note: when I describe the 2 sets of Fixed-By Packages as different, they have identical
"qualifiers" is stored as a JSON field: https://github.com/nexB/vulnerablecode/blob/f3d153190fda258ba76e0a08be21b376e67f505c/vulnerabilities/models.py#L531 but it was originally stored as an encoded query string of the form "name=value&name=value". In hindsight, I wonder whether we should switch back to a string in the future, as I cannot think of cases where we need fine-grained queries on this field. There must have been some accident in the past where we imported and saved data using a string and did convert these saved data to a JSONField when the model evolved.
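To illustrate how the two storage forms could produce records that look identical but compare as different, here is a minimal Python sketch that reduces either form to one comparable value. The helper name `normalize_qualifiers` is hypothetical and is not part of VulnerableCode or packageurl-python; the sketch assumes qualifiers are either a "name=value&name=value" string or a plain dict.

```python
from urllib.parse import parse_qsl


def normalize_qualifiers(value):
    """Return qualifiers as a sorted tuple of (key, value) pairs.

    Accepts either an encoded query string or a dict, so both
    storage forms of the same qualifiers compare equal.
    """
    if not value:
        return ()
    if isinstance(value, str):
        pairs = parse_qsl(value)
    elif isinstance(value, dict):
        pairs = value.items()
    else:
        raise TypeError(f"unsupported qualifiers type: {type(value).__name__}")
    # Lowercase keys and drop empty values before sorting.
    return tuple(sorted(
        (str(k).lower(), str(v)) for k, v in pairs if v not in ("", None)
    ))


# The same qualifier stored two ways (illustrative values only).
as_string = "distro=sid"
as_dict = {"distro": "sid"}

print(as_string == as_dict)
print(normalize_qualifiers(as_string) == normalize_qualifiers(as_dict))
```

Comparing the raw values treats the string and the dict as different, while the normalized forms match, which is one way a uniqueness check could miss such a pair.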
https://public.vulnerablecode.io/vulnerabilities/VCID-6qxq-zyzf-aaar is fixed in two different ranges by two different versions, which is typical of nginx.
@johnmhoran please check the latest release of the public instance and feel free to re-open the issue if needed.
Will do @TG1999 -- thank you.
Some package queries using the UI return duplicate copies of the same Package URL. For example, a search for
pkg:deb/debian/jackson-databind@2.12.1-1?distro=sid
returns