Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read relative etc/apk/repositories for alpine version when no OS provided #1615

Merged
merged 3 commits into from
Mar 2, 2023

Conversation

deitch
Copy link
Contributor

@deitch deitch commented Feb 23, 2023

fixes #1572

Sort of

The way apkdb works now, if it cannot find an /etc/os-release (or similar files on a fixed list), and if the ID is not "alpine", it still reports on the package, but no purl is provided.

This has several problems.

  • not everything will have an etc/os-release. This especially runs true when you are looking at a filesystem with multiple containers, or perhaps you have modified the os-release because your OS isn't alpine, but the packages in the embedded container are.
  • there are multiple potential sources for packages; os-release is not necessarily the best place to find out where it came from.
  • packages can come from multiple parallel locations, i.e. /etc/apk/repositories can have alpinelinux.org but also other places
  • the version used for the packages might entirely be different from the version of the OS

This tries to address some of it. It does so very cautiously. If it cannot find an /etc/os-release, i.e. linux.Release is nil, then then it tries to read the etc/apk/repositories relative to where it found lib/apk/db/installed (so if it is embedded deep in the filesystem, it still can find it). If it finds alpinelinux.org in there, it sets it up as if running on alpine, and takes the version from there.

It is an improvement over the current design, but could be improved further.

  • it should prefer etc/apk/repositories over os-release; this uses it only if release still is nil
  • it should read multiple options from repositories, but that will have to wait, since there is no way to correlate those with a particular package in installed

This is a real if imperfect improvement over the current state.

apk team is looking at improving this chain-of-custody, including providing a purl in the installed file. Things will get much easier then. See this issue. Until then, this can help.

@deitch deitch force-pushed the apk-purl branch 3 times, most recently from 53259c1 to bab9c68 Compare February 24, 2023 11:53
Copy link
Contributor

@kzantow kzantow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm good with this approach; I left some implementation detail notes (consolidating into the r == nil section, simpler regex handling). Also, could you add some sort of test for this?

syft/pkg/cataloger/apkdb/parse_apk_db.go Outdated Show resolved Hide resolved
Comment on lines 25 to 32
var (
repoRegex = regexp.MustCompile(`^https://.*\.alpinelinux\.org/alpine/v([^\/]+)/([a-zA-Z0-9_]+)$`)
newlineSplitRegex = regexp.MustCompile(`[\r \s\n]+`)
)

func init() {
repoRegex.Longest()
newlineSplitRegex.Longest()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be simplified to use a multiline modifier ((?m)) in the original regex, e.g:

repoRegex = regexp.MustCompile(`(?m)^https://.*\.alpinelinux\.org/alpine/v([^\/]+)/([a-zA-Z0-9_]+)$`)

... and since it reads the entire line (the ^ and $), it probably doesn't need the .Longest() modifier.

see: https://regex101.com/r/goXXNl/1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good approach. Pity that I deleted the play.golang where I was testing all of it. Oh well.

func parseApkDB(resolver source.FileResolver, env *generic.Environment, reader source.LocationReadCloser) ([]pkg.Package, []artifact.Relationship, error) {
// find the repositories file from the the relative directory of the DB file
var repos []string
if resolver != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whole block can be moved down into the r == nil check to only occur if there isn't a linux release identified. As noted in your assessment of this issue, it might be valid to read the repositories in order to generate a correct PURL, but since there is no way to correlate these entries to the apkdb currently, and we are just looking for alpinelinux.org, we could avoid doing this if we already identified a release for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still holding back on the thought that this is valuable on its own, and at some point, we will prefer this over the OS. If it doesn't hurt, let's leave it up here?

Copy link
Contributor

@kzantow kzantow Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the point that this could be more useful to identify the source than the linux release itself, and agree. My hesitation here is that currently it is only validating this is a known alpine release, which is why for now I would probably move it to the r == nil check below.

The oddity here is that if there is a linux release identified, this isn't used even if it has differing information so there would need to be a different mechanism to set this.

Ok, I dug in a bit more to the code, I think a cleaner solution might be good to do a little bit of refactoring here, starting with packageURL here: https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/apkdb/package.go#L30 (it would also need to replace the hardcoded "alpine" with the namespace arg and remove the check that it's "alpine"

And newPackage here: https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/apkdb/package.go#L12

These currently take the linux.Release, which is why you need to create one, but they only really need the string namespace.

So if we change var repos []linux.Release to var namespace string, then we can default to the linux release and subsequently look up the repositories file, e.g.:

	var namespace string
	if env != nil && env.LinuxRelease != nil {
		namespace = env.LinuxRelease.ID
	}
	// do the /etc/apk/repositories lookup for a namespace
	// and other stuff
	// then later on: newPackage(apk, namespace, reader.Location)

This would pave the way to detect different sources a bit better, I think.

What do you think about this?

Comment on lines 50 to 56
reposDirect := newlineSplitRegex.Split(string(reposB), -1)
for _, repo := range reposDirect {
if repo != "" {
repos = append(repos, repo)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If simplified to the single regex, this would just match and extract the version, like what is happening in the section below

@deitch
Copy link
Contributor Author

deitch commented Feb 24, 2023

Addressed all of your points, take a look please.

Copy link
Contributor

@kzantow kzantow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the lengthy comment, but I think I've outlined how we can make this change a bit cleaner without the need to make "fake" linux.Release objects, WDYT?

syft/pkg/cataloger/apkdb/parse_apk_db.go Outdated Show resolved Hide resolved
syft/pkg/cataloger/apkdb/parse_apk_db.go Outdated Show resolved Hide resolved
syft/pkg/cataloger/apkdb/cataloger_test.go Show resolved Hide resolved
syft/pkg/cataloger/apkdb/parse_apk_db.go Outdated Show resolved Hide resolved
func parseApkDB(resolver source.FileResolver, env *generic.Environment, reader source.LocationReadCloser) ([]pkg.Package, []artifact.Relationship, error) {
// find the repositories file from the the relative directory of the DB file
var repos []string
if resolver != nil {
Copy link
Contributor

@kzantow kzantow Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the point that this could be more useful to identify the source than the linux release itself, and agree. My hesitation here is that currently it is only validating this is a known alpine release, which is why for now I would probably move it to the r == nil check below.

The oddity here is that if there is a linux release identified, this isn't used even if it has differing information so there would need to be a different mechanism to set this.

Ok, I dug in a bit more to the code, I think a cleaner solution might be good to do a little bit of refactoring here, starting with packageURL here: https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/apkdb/package.go#L30 (it would also need to replace the hardcoded "alpine" with the namespace arg and remove the check that it's "alpine"

And newPackage here: https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/apkdb/package.go#L12

These currently take the linux.Release, which is why you need to create one, but they only really need the string namespace.

So if we change var repos []linux.Release to var namespace string, then we can default to the linux release and subsequently look up the repositories file, e.g.:

	var namespace string
	if env != nil && env.LinuxRelease != nil {
		namespace = env.LinuxRelease.ID
	}
	// do the /etc/apk/repositories lookup for a namespace
	// and other stuff
	// then later on: newPackage(apk, namespace, reader.Location)

This would pave the way to detect different sources a bit better, I think.

What do you think about this?

@deitch
Copy link
Contributor Author

deitch commented Feb 25, 2023

Oh, yeah, I like the namespace approach. Going to give it a shot quickly now.

@deitch
Copy link
Contributor Author

deitch commented Feb 25, 2023

OK, I tried. The problem is that apkdb.parseApkDB() calls apkdb.newPackage(), which calls apkdb.packageURL(), which calls syft/pkg.PURLQualifiers(), which is more generically shared, and expects to receive a *linux.Release. It then uses that to receive the ID and check for a version as well. So replacing *linux.Release with namespace not only doesn't work, but it breaks the ability to pass more information out.

@deitch
Copy link
Contributor Author

deitch commented Feb 25, 2023

For now, I left is as var releases []linux.Release, but open to what you have in mind.

If you ask me, the correct thing to do here is to have something entirely different passed to PURLQualifiers():

type PackageRelease interface {
  func ID() string
  func VersionID() string
  func BuildID() string
}

func PURLQualifiers(vars map[string]string, release PackageRelease) (q packageurl.Qualifiers) {

Some packages are tied to a linux release, others not so much.

That requires changing everything that calls it, so it might be a bit much for this. I would sooner have this in, and then we can worry about the right abstraction for PURLQualifiers()

…ided

Signed-off-by: Avi Deitcher <avi@deitcher.net>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
@wagoodman
Copy link
Contributor

wagoodman commented Mar 1, 2023

@deitch solid improvement 💯 .

I pushed a small refactor that decomposed the additions into smaller functions and swapped out multiple nesting of conditions with guard clauses instead, but it should logically the same (with a couple changes). You can treat this commit as a suggestion, if you really don't like it or would prefer a different approach feel free to force-push over it / revert it.

The upside of the decomposition is to allow for easier unit testing of just the apk repository file parsing (now the parseReleasesFromAPKRepository function). Can you add a unit test that exercises parsing the repository file?

@deitch
Copy link
Contributor Author

deitch commented Mar 1, 2023

Yeah, that's a fine breakdown; smaller is easier whenever possible.

Signed-off-by: Avi Deitcher <avi@deitcher.net>
@deitch
Copy link
Contributor Author

deitch commented Mar 1, 2023

Added a test of that. Back at you.

@kzantow kzantow dismissed their stale review March 2, 2023 18:04

Changes are good now

@kzantow kzantow merged commit 01230aa into anchore:main Mar 2, 2023
@kzantow
Copy link
Contributor

kzantow commented Mar 2, 2023

Thanks for this contribution @deitch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

purl for apk packages missing when installed db file is not in root
3 participants