-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional catalog indexes for performance #154
Conversation
Benchmark Test ResultsBenchmark results from the latest changes vs base branch
|
527aaf3
to
043b34b
Compare
043b34b
to
afdd50a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial LGTM - I need to go back and double check the test file since it was the largest diff from this change. The tests run correctly locally and no diffs between local env and CI.
Do you want to wait for the syft PR that incorporates these changes before doing the final merge?
I can also test the syft integration if that's up for review
pkg/image/content_helpers.go
Outdated
|
||
// fetchFilesByBasenameGlob is a common helper function for resolving file references for a file basename glob pattern | ||
// catalog relative to the given tree. | ||
func fetchFilesByBasenameGlob(ft *filetree.FileTree, fileCatalog *FileCatalog, basenameGlob string) ([]file.Reference, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we're trying to move indexes away from Glob
matching in this PR - Is this the best name here?
The function is written so that **
and /
are not supported so I'm trying to think through what kinds of globs are matched then which improve the performance. Let me also check out the test for this since that might help in my review. I'l follow up with other comments as I go through this.
Already done! anchore/syft#1510 (all PR checks passing, a lot of tests added) |
I've added a lot of link resolution logic changes which warrants another review
4d50b37
to
9d407a9
Compare
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
…ution) Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
6df691d
to
033d3e4
Compare
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
5b629d8
to
561c286
Compare
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
178fc6f
to
b0fc172
Compare
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
import "sort" | ||
|
||
type IDSet map[ID]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is feeling like we could start to use generics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already did, I ran into a 1.19 specific generics bug that causes it to fail under a specific circumstance. For that reason I reverted and went back to plain-ol'e copy paste. From the PR description:
Adds consistent set implementations for ID, path, and filenodes (which are leveraged more in this PR). This sets up for a generic implementation (which was removed from this PR for scope and go 1.19 generics bugs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is the commit where I implemented the generic approach 033d3e4 ... one of the tests fails in go 1.19 but passes in 1.20
type PathSet map[Path]struct{} | ||
|
||
func NewPathSet() PathSet { | ||
return make(PathSet) | ||
func NewPathSet(is ...Path) PathSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A 3rd set that could benefit by generics, maybe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, also covered in #154 (comment) and hinted at in the code comment below https://github.com/anchore/stereoscope/pull/154/files/386c9452f15073e4f989565ef0b66d107e28b5f9#diff-4e1b6915f312bc282522573c59b7782b79f9c16c912d9ac35c3918da07502e9aR11
} | ||
|
||
// nolint: funlen | ||
func (sc searchContext) _pathsToNode(fn *filenode.FileNode, observedPaths file.PathSet, cache map[cacheRequest]cacheResult) (file.PathSet, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is underscore naming a standard thing in go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably more of a alex-ism, more from python really
|
||
err = img.applyOverrideMetadata() | ||
if err != nil { | ||
t.Fatalf("could not create image: %+v", err) | ||
} | ||
for _, d := range deep.Equal(img, &test.image) { | ||
if d := cmp.Diff(img, &test.image, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could an assert.JSONEq
work here? The output can be easier to read than the cmp.Diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is with struct instances and am not planning in testing the json representation of these I think something like cmp.Diff is the right tool here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass done - I've got about 23 files left I've marked to go back through to figure out how they work.
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* add additional catalog indexes for performance Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * [wip] link resolution Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add leaf link resolution on tree responses (defer ancestor link resolution) Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add filetree search context Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add tests for new search context object Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * remove unused tar header fields from file.Metadata struct Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * use singular file type definitions Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add logging for filetree searches Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add limited support for glob classes and alternatives Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add failing test to show that index shortcircuits correct behavior Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add link resolution via filetree search context Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * allow index symlink resolution to function through cycles Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add tests for filetree.Index Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add search by parent basename and fix requirements filtering Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * sort search results Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * change file.Type to int + fix layer 0 squashed search context Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * more cleanup Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * switch to generic set implementation Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * update linter Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * replace generic set implemetation with plain set (unstable in go1.19) Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * introduce filtree builter and foster usage of reader interfaces Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * rename content helper functions Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * update docs with background Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * fix get_xid for cross compilation Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * upgrade CI validations workflow Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * fix snapshot builds Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add tests for file.Index.GetByFileType Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * rename file.Type and file.Resolution Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * ensure that glob results match search facade Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * replace stringset implementation + move resolution tests Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * add note about podman dependency for testing Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * address PR comments Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * remove extra whitespace Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * constrain OS build support Signed-off-by: Alex Goodman <alex.goodman@anchore.com> * update/remove TODO comments Signed-off-by: Alex Goodman <alex.goodman@anchore.com> --------- Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
This adds additional indexes for:
These can be optionally leveraged for improved performance instead of using the standard search by glob against the filetree.
Preview of change in syft (today's behavior):
With the new indexes being leveraged:
In order for the indexes to work with consideration to link resolution, the
file.Reference
s returned from the file tree need to additionally have link resolution results for the leaf-case only. By leaf case, I mean this only surfaces links on the basenames of paths, not in parent (directory) paths. Surfacing ancestor link resolution can be done but is outside of the scope of this PR.Why is this link resolution needed? With tree-based searches the results from the search are inherently in the tree. This is not the case with indexes --the indexes are across all trees (say all layer or squash trees). This means that the results from an index lookup needs to be filtered based on nodes that are in the tree. It is important during this step that file catalog entries have the same reference ID as what is fetched from the tree, otherwise it would be possible to return a result that was fetched from the catalog but is not in the tree. However, consider a search against an index that returns a symlink which resolves to a real file that is in the tree (so is a valid result)... without considering link resolution results from these cases would be dropped.
Specific changes:
image.FileCatalog
into two objects, where most of the index-specific implementation has been migrated tofiletree.Index
filetree.Filetree
now returnsfile.Resolution
to illustrate basename link resolution with multiplefile.Resolution
objects. These objects show the path that was requested and thefile.Reference
that resolved to.Image
andLayer
methods which names and functions made more sense with pervious implementations (and no longer with this one). This includes anyFileContents*()
andFileByMIMEType*
methods.file.Metadata
and raises thefile.Metadata
object to be used more agnostically (e.g. from syft's directory resolver). Additionally this pivots the existingfile.Type
enumerations away from the standard lib tar package constants, swapping for new, tar-independent constants. A new type (int) is selected to ensure this is a breaking change from a type-system perspective and not a breaking change from a (subtle) value semantics perspecitive.filetree.Searcher
abstraction and concretefiletree.searchContext
implementation for leveraging optional indexes when searching by glob, path, or MIME type.filetree.searchContext
, rewriting globs as needed to best match the available indexes fromfiletree.Index
filetree.Reader
,filetree.Writer
andfiletree.ReadWriter
interfaces to better restrain operations on filetrees fromImage
andLayer
objects.filetree.Builder
to help construct bothfiletree.FileTree
s andfiletree.Index
es in a coordinated fashion.