Debian images being given Ubuntu distribution in Scan Report #969
Comments
One additional thought that one of my team members pointed out to me this morning - I think even with updating the layer scanner the links in the |
From what I can tell, I believe the Ubuntu distribution scanner may think it's Debian but not the other way around because of the following: The Debian distribution scanner verifies the
The Ubuntu distribution scanner does no such check. I verified this happens by adding a test in
We'll see an equivalent test in I think the solution for this would be to simply verify |
As for potential migration, I'll defer that to someone else, but I believe as long as we update the Ubuntu distribution scanner and bump the version specified in the file, then Clair users should be able to recognize the indexer has been updated, so the index report must be regenerated. Maybe just by redoing the index report there is no need for a migration? @crozzy ? |
I started a PR to attempt to resolve this: #970 |
Yeah, I think iterating the version of the |
@RTann thanks so much for getting the fix in for this so quickly. We've just been testing this in our dev environment. One thing that we have found is that because we have layers in the database that are linked to these incorrect "Ubuntu" entries via a row in the What would you advise is the right solution here? Our thoughts are that we will manually tidy up all of these wrong dist entries and Do you think that our outside-of-clair approach is the correct one or would you rather something went into Clair in addition that picked the right answer when two were found. I'd wondered about making the |
The Problem
We have some Debian test images, and they occasionally report their `IndexReport.Distribution` as Ubuntu, but with some information from the Debian release, such as:

We run Clair in a few different environments, and sometimes this image scans correctly with a distribution `did` of `debian`. Furthermore, looking into the `dist` table, we see lots of Debian releases labelled as Ubuntu. Versions `9`, `10`, `11` and `12` are all Debian releases.

The Cause
Looking at the code, I think this is caused by the fact that both the Debian and Ubuntu `DistributionScanner` just look for an `etc/os-release` file. If that file has a `VERSION_ID` and a `VERSION_CODENAME` field, then they will match and store their dist in the DB linked to this layer. Both hard-code the `did` to either ubuntu or debian when they do so. When the scanners have finished, the distributions they find get written to the database by the layerscanner, which explains why we have lots of `dist` table entries for Ubuntu on Debian images. (Note that we don't have dist entries the other way, for Ubuntu images marked as Debian, because the Debian scanner also tries to convert the version to an int, which fails for Ubuntu images.)

Looking at the `etc/os-release` file for our images (we've seen this with both `debian:stable-slim` and `gcr.io/distroless/base` based images), we can see that it will meet the criteria for both scanners:

As predicted, when this gets to the `dist_scanartifact` table we have two links to the `dist` table, one for Ubuntu and one for Debian:

After the two distributions have been linked to the layer and the layer scanning phase has completed, the `Controller` goes into the `coalesce` phase. Here the two distributions will be loaded using the `DistributionsByLayer` query. The coalescer for `dpkg` will then be asked to coalesce this data. The `dpkg` ecosystem uses the Linux Coalescer to do this. The Linux Coalescer only allows a single dist per layer:

So at this point the two distributions we see in the database will be collapsed into the one that we see in the ScanReport. I think the one that is picked will be semi-random, based on the order returned from the `DistributionsByLayer` query. One thing that confuses me, though, is that we don't see this problem closer to 50% of the time, given that it is an unordered query: we seem to get the right distribution for both Ubuntu and Debian images the majority of the time, even though they have two distributions linked in the `dist_scanartifact` table. I suspect it is purely because the dpkg Ecosystem returns the Debian scanner before the Ubuntu one and PostgreSQL isn't fully random on unordered sets. Whatever the reason for it not being 50%, it clearly isn't fully reliable, as we have seen this error several times.

The solution?
I noticed that the RHEL `DistributionScanner` also checked for the presence of files, but in addition did a regex match on them for `Red Hat Enterprise Linux (?:Server)?\s*(?:release)?\s*(\d+)(?:\.\d)?`. Could this be added to the Ubuntu and Debian scanners to get the right answer?
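
To make the idea concrete, here is a minimal, standalone Go sketch of the difference between the matching described under "The Cause" and a stricter check. It is not claircore code: `parseOSRelease`, `naiveMatch`, `strictMatch` and `debianSample` are hypothetical names made up for illustration, the sample file is a typical Debian 11 `os-release` rather than the one from our images, and comparing the `ID` field is only one possible way to disambiguate (a regex on the file, as the RHEL scanner does, would be another).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// debianSample is an illustrative Debian 11 os-release file
// (an assumption for this sketch, not taken from the affected images).
const debianSample = `PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
`

// parseOSRelease turns os-release content into a key/value map,
// skipping blank lines and comments and stripping optional quotes.
func parseOSRelease(content string) map[string]string {
	kv := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		kv[key] = strings.Trim(val, `"'`)
	}
	return kv
}

// naiveMatch mirrors the behaviour described in this issue: any file with
// both VERSION_ID and VERSION_CODENAME is accepted, whatever the distribution.
func naiveMatch(kv map[string]string) bool {
	return kv["VERSION_ID"] != "" && kv["VERSION_CODENAME"] != ""
}

// strictMatch additionally requires the ID field to name the expected
// distribution (hypothetical fix, assumed for this sketch).
func strictMatch(kv map[string]string, wantID string) bool {
	return naiveMatch(kv) && kv["ID"] == wantID
}

func main() {
	kv := parseOSRelease(debianSample)
	fmt.Println("naive check matches:", naiveMatch(kv))
	fmt.Println("strict ubuntu match:", strictMatch(kv, "ubuntu"))
	fmt.Println("strict debian match:", strictMatch(kv, "debian"))
}
```

With the sample file, the naive check accepts the layer for any scanner that uses it, while the `ID` comparison lets only a Debian scanner claim it, so only one dist row would be linked to the layer.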