Check ignore matches before Bucket item downloads #337

hiddeco · 2021-04-12T12:04:16Z

Fixes #333

This PR makes the BucketReconciler more efficient by looking for
exclusions while downloading files, instead of during the archiving of
the downloaded contents.
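
A minimal sketch of the idea, not the reconciler's actual code: each object key is checked against the exclusion patterns first, and only keys that pass are handed to the download function. The names `fetchFunc`, `excluded`, and `fetchIncluded` are hypothetical, and `path.Match` on the base name is a simplified stand-in for the gitignore-style matching that `sourceignore` provides.

```go
package main

import (
	"fmt"
	"path"
)

// fetchFunc is a hypothetical stand-in for downloading one bucket object.
type fetchFunc func(key string) error

// excluded reports whether the base name of key matches any of the patterns.
// NOTE: path.Match globbing on the base name is a simplification; it is not
// the gitignore-style matching that the sourceignore package provides.
func excluded(key string, patterns []string) (bool, error) {
	for _, p := range patterns {
		ok, err := path.Match(p, path.Base(key))
		if err != nil {
			return false, err
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

// fetchIncluded checks every key against the patterns first and only calls
// fetch for keys that are not excluded. It returns the fetched keys.
func fetchIncluded(keys, patterns []string, fetch fetchFunc) ([]string, error) {
	var fetched []string
	for _, key := range keys {
		skip, err := excluded(key, patterns)
		if err != nil {
			return nil, err
		}
		if skip {
			continue // never downloaded, so nothing to filter out later
		}
		if err := fetch(key); err != nil {
			return nil, err
		}
		fetched = append(fetched, key)
	}
	return fetched, nil
}

func main() {
	keys := []string{"manifests/app.yaml", "docs/readme.md", "img/logo.png"}
	fetched, err := fetchIncluded(keys, []string{"*.md", "*.png"}, func(key string) error {
		fmt.Println("downloading", key)
		return nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("fetched:", fetched)
}
```

Because excluded keys never reach `fetch`, the later archiving step has nothing left to filter for them.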

It also makes the filtering applied during the archiving
configurable by introducing an optional ArchiveFileFilter
callback argument and a SourceIgnoreFilter implementation.

SourceIgnoreFilter filters out files matching
sourceignore.VCSPatterns and any of the provided patterns.
If an empty gitignore.Pattern slice is given, the matcher is set to
sourceignore.NewDefaultMatcher.
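
To illustrate how an optional filter callback of this shape can be used, here is a hedged sketch using only the standard library. The callback signature is taken from the snippet quoted in the review below; the assumption that returning `true` excludes a file follows from that snippet returning the matcher's result. `extensionFilter`, `collectFiles`, and `writeTree` are hypothetical helpers, and `extensionFilter` is a much simpler stand-in for `SourceIgnoreFilter`.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// ArchiveFileFilter mirrors the callback signature quoted in the review.
// Assumption: returning true means the file is left out of the archive.
type ArchiveFileFilter func(p string, fi os.FileInfo) bool

// extensionFilter is a hypothetical, simplified stand-in for
// SourceIgnoreFilter: it excludes files by extension only.
func extensionFilter(exts ...string) ArchiveFileFilter {
	skip := make(map[string]bool, len(exts))
	for _, e := range exts {
		skip[e] = true
	}
	return func(p string, _ os.FileInfo) bool {
		return skip[filepath.Ext(p)]
	}
}

// collectFiles walks root and returns the relative paths of all files the
// filter does not exclude. A nil filter keeps every file.
func collectFiles(root string, filter ArchiveFileFilter) ([]string, error) {
	var kept []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil // directories themselves are not archive entries here
		}
		fi, err := d.Info()
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		if filter != nil && filter(rel, fi) {
			return nil
		}
		kept = append(kept, rel)
		return nil
	})
	return kept, err
}

// writeTree creates a temporary directory holding the named empty files and
// returns its path together with a cleanup function.
func writeTree(names ...string) (string, func(), error) {
	root, err := os.MkdirTemp("", "archive-filter")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(root) }
	for _, n := range names {
		if err := os.WriteFile(filepath.Join(root, n), nil, 0o600); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return root, cleanup, nil
}

func main() {
	root, cleanup, err := writeTree("app.yaml", "notes.md")
	if err != nil {
		panic(err)
	}
	defer cleanup()

	all, err := collectFiles(root, nil) // no callback configured
	if err != nil {
		panic(err)
	}
	filtered, err := collectFiles(root, extensionFilter(".md"))
	if err != nil {
		panic(err)
	}
	fmt.Println("without filter:", all)
	fmt.Println("with filter:   ", filtered)
}
```

Making the callback optional lets a caller that has already filtered its input pass `nil` and skip a second round of matching.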

The GitRepositoryReconciler now loads the ignore patterns
before archiving the repository contents by calling
sourceignore.LoadIgnorePatterns and other helpers. This loading
behavior is a breaking change: .sourceignore files in the repository
(including its subdirectories) are now still taken into account when
spec.ignore is defined for a resource. Overriding them remains possible
by adding an overriding rule to the spec.ignore of the resource.

Signed-off-by: Hidde Beydals <hello@hidde.co>

This commit makes the filtering applied during the archiving configurable by introducing an optional `ArchiveFileFilter` callback argument and a `SourceIgnoreFilter` implementation.

`SourceIgnoreFilter` filters out files matching `sourceignore.VCSPatterns` and any of the provided patterns. If an empty `gitignore.Pattern` slice is given, the matcher is set to `sourceignore.NewDefaultMatcher`.

The `GitRepository` now loads the ignore patterns before archiving the repository contents by calling `sourceignore.LoadIgnorePatterns` and other helpers. This loading behavior is **breaking**: `.sourceignore` files in the repository (including its subdirectories) are now still taken into account when `spec.ignore` is defined for a resource. Overriding them remains possible by adding an overriding rule to the `spec.ignore` of the resource.

This change also makes it possible for the `BucketReconciler` to not configure a callback at all, which prevents looking for ignore matches twice.

To finalize the bucket refactor, the reconciler now looks for a `.sourceignore` file in the root of the bucket, providing an additional way of configuring (global) exclusions.

Signed-off-by: Hidde Beydals <hello@hidde.co>

.github/actions/run-tests/Dockerfile

squaremo

Shaping up well I reckon!

squaremo · 2021-04-13T14:23:08Z

controllers/storage.go

+	return func(p string, fi os.FileInfo) bool {
+		// The directory is always false as the archiver does already skip
+		// directories.
+		return matcher.Match(strings.Split(p, string(filepath.Separator)), false)


Would https://pkg.go.dev/path/filepath@go1.16.2#SplitList work here?

I somehow went looking for this method but could not find it :-S. Thanks!

Or no, it was not. I came across this method as well but it splits them by filepath.ListSeparator, resulting in e.g. [/a/b/c, /d/f/g] instead of the [a, b, c] we are after for the domain.

Curses! I have made exactly this mistake three or four times now aaaaa
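
The difference discussed in this thread can be seen in a small sketch. `filepath.SplitList` splits a list of paths on `filepath.ListSeparator` (as used in variables like `PATH`), whereas splitting on `filepath.Separator` yields the individual path elements. `splitPath` and `sep` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// sep is the OS-specific path separator as a string.
var sep = string(filepath.Separator)

// splitPath breaks a single path into its elements, which is the form the
// matcher in the snippet above receives.
func splitPath(p string) []string {
	return strings.Split(p, sep)
}

func main() {
	p := "a" + sep + "b" + sep + "c"

	// Splitting on the path separator yields the three elements a, b, c.
	fmt.Println(splitPath(p))

	// SplitList sees no list separator in p, so it returns p as one entry.
	fmt.Println(filepath.SplitList(p))

	// SplitList is meant for lists of paths joined by the list separator.
	list := "a" + sep + "b" + string(filepath.ListSeparator) + "c" + sep + "d"
	fmt.Println(filepath.SplitList(list))
}
```

Note that an absolute path would produce an empty first element when split this way, so the sketch uses a relative path.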

This commit updates Go to 1.16, a required change because of the use of `os.WriteFile` in one of the tests introduced by commit b5004a9.

Normally _just_ this would not justify the change, but given the introduction of breaking changes (and thereby forcing a MINOR update anyway), and the various file{system, path} improvements introduced in Go 1.16 like [`filepath#WalkDir`](https://golang.org/pkg/path/filepath/#WalkDir), going ahead with this should be fine.

Signed-off-by: Hidde Beydals <hello@hidde.co>

stefanprodan

LGTM

Thanks @hiddeco ☕

hiddeco added 2 commits April 13, 2021 15:34

Check ignore matches before Bucket item downloads

cca2c4a

Signed-off-by: Hidde Beydals <hello@hidde.co>

hiddeco force-pushed the efficient-bucket-download branch from 376f72a to 101814d April 13, 2021 13:35

hiddeco added the area/bucket, area/git, enhancement, and area/storage labels Apr 13, 2021

hiddeco requested review from stefanprodan and squaremo April 13, 2021 13:55

squaremo reviewed Apr 13, 2021

hiddeco marked this pull request as ready for review April 13, 2021 14:14

squaremo reviewed Apr 13, 2021

hiddeco force-pushed the efficient-bucket-download branch from 101814d to d3bcc6a April 13, 2021 14:40

stefanprodan approved these changes Apr 14, 2021

hiddeco merged commit 1494626 into main Apr 14, 2021

hiddeco deleted the efficient-bucket-download branch April 14, 2021 08:17
