Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Basic Nix Cataloger #1107

Merged

Conversation

juliosueiras
Copy link
Contributor

@juliosueiras juliosueiras commented Jul 19, 2022

Closes #462

Signed-off-by: Julio Tain Sueiras <juliosueiras@gmail.com>
@juliosueiras juliosueiras force-pushed the feature/nix-cataloger branch from 7bd7f31 to e7e50a7 Compare July 19, 2022 21:17
@juliosueiras juliosueiras mentioned this pull request Jul 19, 2022
@spiffcs
Copy link
Contributor

spiffcs commented Jul 21, 2022

Thanks for adding this - I can start going through today/tomorrow and smoothing this out a bit to get it to pass our CI checks and make sure we've got good coverage for the first draft here.

Signed-off-by: Julio Tain Sueiras <juliosueiras@gmail.com>
@flokli
Copy link
Contributor

flokli commented Oct 31, 2022

I gave this PR a try, simply by building what's in here. I can confirm the code successfully picks up different store paths from various layers as packages in a syft packages $containerImage call.

Comment on lines +27 to +29
FilepathPatterns: []*regexp.Regexp{
regexp.MustCompile(`/nix/store/[^-]*-([^-]*)-(?P<version>.*)/$`),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to use the regexes from https://github.com/nix-community/go-nix/blob/master/pkg/nixpath/nixpath.go#L19-L26 instead - they're a bit more strict.

@wagoodman
Copy link
Contributor

wagoodman commented Jan 12, 2023

I think the parser needs a few details still ironed out:

  • Are versions always guaranteed to be the last - delimited field? What if the version is a -? I found some other oddities as well:
bash-5.1# ls -al /nix/store | grep 'libarchive'
dr-xr-xr-x 3 root root     4096 Jan  1  1970 0v13vs0g3mfcxnw57ll8abjccgmkcydv-libarchive-3.6.1-lib

I don't think the version should be 3.6.1-lib (someone should check on this though... maybe it should be this!).

  • The raw filesystem is a good first step, but I think getting the /nix/var/nix/db/db.sqlite would also have some really rich info to put into the package metadata:
$ nix-env -i tree
$ sqlite3  /nix/var/nix/db/db.sqlite
sqlite> select * from ValidPaths where path = '/nix/store/6n0jylajz8vrm3wp5v0hrhws4rcknmpm-tree-1.8.0';
id    path                                                    hash                                                                     registrationTime  deriver                                                     narSize  ultimate  sigs                                                                                                        ca
----  ------------------------------------------------------  -----------------------------------------------------------------------  ----------------  ----------------------------------------------------------  -------  --------  ----------------------------------------------------------------------------------------------------------  --
3126  /nix/store/6n0jylajz8vrm3wp5v0hrhws4rcknmpm-tree-1.8.0  sha256:64c3847184d15149bc8a2aa5b415c3d93c8fdf34e86c16febda905982971cf23  1673544615        /nix/store/9m2l1x5fxm6pl5mhnfi0sva6b0n5w9ir-tree-1.8.0.drv  97056              cache.nixos.org-1:Fh9enR14bSRGwdokBUfx0//oWO/Vzzj3atU4/xFaLewaRK520Mf1+7Y326eNcnHy4YSM8ESkLaVtrI9wVaMDCw==    

This cataloger conceptually is a good first pass, but one question I have is should we consider cataloging the store contents based on ecosystem catalogers? I realize this wont work in all cases, but it would be interesting to see what the gap is here. I suspect that this cataloger is valuable but will have overlap with other language-based package managers (which I think is ok).

edit: nvm!

@juliosueiras would you like to pair on some of this?

@flokli
Copy link
Contributor

flokli commented Jan 12, 2023

The -lib denotes this is the lib output of the derivation. So instead of 3.6.1-lib, 3.6.1 would be more correct - but it's hard to separate this from a single output derivation, and something like 3.6.1-rc1. We could special-case common output names (bin, lib, dev).

I don't think poking around in the Nix sqlite DB is a good idea. That's an internal implementation detail of how Nix does do this currently, and there's no stability guarantees. Also, it might not be available at all, if you're simply trying to scan a container built elsewhere.

@wagoodman
Copy link
Contributor

wagoodman commented Jan 12, 2023

...We could special-case common output names (bin, lib, dev).

That's probably the right way to go. We can adjust as we find exceptions in the future too.

[the sqlite DB is] an internal implementation detail of how Nix does do this currently, and there's no stability guarantees...

Though stability would be ideal, it's not required if we're using it purely for auxiliary info. That is, we should still using the /nix/store directory as the source of truth, but if available and parsable, enriching the package with additional info from the DB would be ideal (and if we can't parse it, that's ok, the basic package info is still there).

That being said, I'm not saying that using the sqlite db is a must... this is definitely in the "it would be nice" category (something I wasn't clear about earlier). I'm also curious to learn more about what the fields in the DB are too (the hash and sigs fields).

@flokli
Copy link
Contributor

flokli commented Jan 13, 2023

...We could special-case common output names (bin, lib, dev).

That's probably the right way to go. We can adjust as we find exceptions in the future too.

Yeah, that'd be a matter of adjusting the regex

[the sqlite DB is] an internal implementation detail of how Nix does do this currently, and there's no stability guarantees...

Though stability would be ideal, it's not required if we're using it purely for auxiliary info. That is, we should still using the /nix/store directory as the source of truth, but if available and parsable, enriching the package with additional info from the DB would be ideal (and if we can't parse it, that's ok, the basic package info is still there).

That being said, I'm not saying that using the sqlite db is a must... this is definitely in the "it would be nice" category (something I wasn't clear about earlier). I'm also curious to learn more about what the fields in the DB are too (the hash and sigs fields).

Again, I think this would clearly be a very different type of cataloger. This requires the build system the container was built with to be around at the time of invoking syft (and not just the container image). I'd really see it out of scope for here.

A better path forward for this would be something like Nix dumping some more version information into container metadata, or a well-known location inside the container.

@juliosueiras do you have time to address the splitting of output names / regex changes in general and can then mark this as ready for review?

@wagoodman
Copy link
Contributor

This requires the build system the container was built with to be around at the time of invoking syft...

That distinction makes sense. Ultimately, I'm alright with not including reading from this source in this PR, so I retract my earlier request 🙏 .

[side discussion, no longer about this PR] I want to highlight that build-time info is a popular use case for SBOMs as well, so it still sounds like this data source could be an interesting add on later, but I'm curious on your thoughts about it needing to be a different cataloger (I was thinking it would be an extension of this one).

@flokli
Copy link
Contributor

flokli commented Jan 13, 2023

[side discussion, no longer about this PR] I want to highlight that build-time info is a popular use case for SBOMs as well, so it still sounds like this data source could be an interesting add on later, but I'm curious on your thoughts about it needing to be a different cataloger (I was thinking it would be an extension of this one).

I think Nix / the container building infrastructure should have a way to emit some standardized BOM format for a given Nix closure), and in the case of container images, attach it alongside, for syft to be able to pick it up. Not sure if there's a suitable metadata field for it (yet), but this is probably too much out of scope for here.

It's also a problem independent of Nix - these BOMs could be created and attached for anything else assembling container images. Let's open an issue and discuss there?

@wagoodman
Copy link
Contributor

wagoodman commented Jan 19, 2023

@juliosueiras I've tried pushing to your PR without luck, I get rejected on the CLI. I think this is some behavior between gh pr checkout... and the branch name having a / (total guess, but has happened before).

I've pushed changes to a local branch to this repo here: https://github.com/anchore/syft/compare/feature/nix-cataloger

These changes include:

  • rebasing onto main
  • refactoring to use resolver.AllLocations
  • moving the set of owned locations to the pkg.NixStoreMetadata and leaving the pkg.Locations to still indicate the locations where the existence of the package is evident (just the nix store path directory, not the contents)
  • refactored the nix store path parsing and added several tests

what's still left todo:

  • fixing/enhancing integration tests
  • testing that post-cataloging package-to-package relationships can be formed when there are overlapping package manager concerns (e.g. a python wheel being installed and managed via nix)

@juliosueiras / @flokli what do you think of these changes?

@juliosueiras can you pull in these changes to your branch (if you approve of the changes)?

@wagoodman wagoodman self-assigned this Jan 19, 2023
@flokli
Copy link
Contributor

flokli commented Jan 23, 2023

@wagoodman I left some comments on your commits, but I feel like this here would be a more appropriate place to discuss.

From your checkout, can you try pushing with git push git@github.com:juliosueiras-nix/syft.git HEAD:refs/heads/feature/nix-cataloger (or add the juliosueiras as a remote and then push to the remote name)?

@wagoodman
Copy link
Contributor

From your checkout, can you try pushing with git push git@github.com:juliosueiras-nix/syft.git HEAD:refs/heads/feature/nix-cataloger (or add the juliosueiras as a remote and then push to the remote name)?

I tried both of these things without luck (permission denied).

@wagoodman
Copy link
Contributor

I think the next best option is to change the branch target from main to a feature branch in this repo. I can merge this as-is to the feature branch, then open a PR for that branch with the additions I have. (mental note for me: if I go that direction I need to remember to update the changelog to attribute @juliosueiras directly. )

@juliosueiras shout out if you're still interested in working on this directly and I'll hold off on that plan.

@wagoodman wagoodman changed the base branch from main to add-nix-cataloger March 24, 2023 19:23
@wagoodman wagoodman added the changelog-ignore Don't include this issue in the release changelog label Mar 24, 2023
@wagoodman wagoodman marked this pull request as ready for review March 24, 2023 19:24
@wagoodman wagoodman merged commit 818cf3f into anchore:add-nix-cataloger Mar 24, 2023
@wagoodman
Copy link
Contributor

FYI, I've merged this into a new in-repo branch add-nix-cataloger. I'll try to get this over the finish line. When this lands in the release notes I'll update it manually to make certain you get attribution @juliosueiras -- thanks for the initial PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog-ignore Don't include this issue in the release changelog
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Add Nix Cataloger
4 participants