Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Bucket Controller and Add Bucket Provider Interface #455

Closed
wants to merge 3 commits into from

Conversation

pa250194
Copy link
Contributor

If applied, this PR will refactor the Bucket Controller and add a Bucket Provider Interface that can be implemented by any bucket provider added for the bucket controller. This refactor is as a result of comment in PR #434 (comment)

@pa250194 pa250194 force-pushed the bucket-provider-interface branch 4 times, most recently from 55d3a48 to 58188dc Compare October 21, 2021 14:42
@pa250194 pa250194 marked this pull request as ready for review October 21, 2021 14:43
@stefanprodan stefanprodan added area/bucket Bucket related issues and pull requests enhancement New feature or request labels Oct 21, 2021
@stefanprodan stefanprodan requested a review from squaremo October 21, 2021 14:49
@pa250194 pa250194 force-pushed the bucket-provider-interface branch from 5fde31b to 4105490 Compare October 21, 2021 14:55
Copy link
Member

@squaremo squaremo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wooo, thanks for the follow-up! Using an interface to abstract from the two client is good, and I'm pleased to see thorough tests.

I think you can go further and have all the download-if-matches code in one place, relying on the clients only to do the listing and downloading. More details on that in the comments.

There's one thing that will need fixing, either way: if a file download fails, then the whole operation should fail. I've outlined how to achieve that, in the comments.

controllers/bucket_controller.go Show resolved Hide resolved
controllers/bucket_controller.go Outdated Show resolved Hide resolved
controllers/bucket_controller.go Outdated Show resolved Hide resolved
controllers/bucket_controller.go Outdated Show resolved Hide resolved
controllers/bucket_controller.go Outdated Show resolved Hide resolved
pkg/gcp/gcp.go Outdated Show resolved Hide resolved
pkg/gcp/gcp.go Outdated Show resolved Hide resolved
pkg/minio/minio.go Outdated Show resolved Hide resolved
pkg/minio/minio.go Outdated Show resolved Hide resolved
pkg/gcp/gcp.go Outdated Show resolved Hide resolved
@squaremo
Copy link
Member

I think you can go further and have all the download-if-matches code in one place, relying on the clients only to do the listing and downloading. More details on that in the comments.

I'm suggesting a fairly big change here, I realise. If you're amenable to the suggestion, perhaps we could collaborate on it @pa250194? Either by me being much more specific in my description, or even making a pull request to your fork.

@pa250194
Copy link
Contributor Author

pa250194 commented Nov 1, 2021

Hey @squaremo I'm almost done making the changes. We can for sure collaborate on this pull request. I will push my most recent changes and fixes to my branch in a few hours. You can for sure make a pull request to my fork.

@pa250194 pa250194 force-pushed the bucket-provider-interface branch 3 times, most recently from 66682e5 to db1e8db Compare November 3, 2021 18:46
@pa250194
Copy link
Contributor Author

pa250194 commented Nov 3, 2021

Hey @squaremo I have just finished the requested changes. Please let me know if there is anything else I need to change and I will be glad to refactor it 🙂.

@squaremo
Copy link
Member

@pa250194 I've made a PR to your PR, which does a bit more shifting code around. See what you think!
In the meantime, there's some tests which fail both in CI and on my laptop, which are worth looking into. (I'm away for the next couple of weeks, just to let you know.)

Comment on lines 86 to 91
BucketExists(context.Context, string) (bool, error)
ObjectExists(context.Context, string, string) (bool, error)
FGetObject(context.Context, string, string, string) error
ListObjects(context.Context, gitignore.Matcher, string, string) error
ObjectIsNotFound(error) bool
VisitObjects(context.Context, string, func(string) error) error
Close(context.Context)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it would be better if this moved to pkg/bucket (or internal/bucket) and have the implementation specific packages move to pkg/bucket/<impl>. This creates better separation between the reconciler and the bucket clients, and aligns with the structural changes recently made in #462 and pending in #485.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it would be better if [the interface] moved to pkg/bucket (or internal/bucket) [...]
This creates better separation between the reconciler and the bucket clients

I don't see how it does that. Then the controller code would be coupled to the package with the interface and the packages with the implementations, since it's the controller that decides which implementation it wants (because it has access to the bucket object). In general it's better to declare requirements in the place they are required.

If all the funcs using a bucket client could be put in a package (e.g., internal/bucket) without creating a cycle, then it might be reasonable to move the interface (or better: more precise interfaces) there, since that's where the requirement would be. But it would be a bit of an empty gesture, when there are no other consumers of those funcs.

@pa250194
Copy link
Contributor Author

Hey @squaremo sorry for the late response. I am currently occupied by a high priority sprint be will look at the PR and the tests that fail soon. I appreciate the help and contribution to my PR 🙂.

@pa250194 pa250194 force-pushed the bucket-provider-interface branch 2 times, most recently from 22a007d to 83431e2 Compare December 16, 2021 04:29
@squaremo
Copy link
Member

squaremo commented Jan 4, 2022

Hi! I'm back from holidays. With respect to the client tests: I think it would be reasonable to verify only that the wrappers respond to their altered API correctly assuming that the underlying client libraries work. That way you don't have to use the real services or mock them -- you can just provide a fake client. I'll work on that for the GCP client.

@squaremo
Copy link
Member

squaremo commented Jan 5, 2022

Although ... the fake service that's there works OK, and it looks like the problem is mainly that the test that fails (https://github.com/fluxcd/source-controller/runs/4543207347?check_suite_focus=true#step:9:952) just needs to be running in GCP or have an environment variable set. The test itself is just checking that the branch at

is taken when there's no secret.

The expedient thing to do may be to come up with a different way to test if the client creation logic works 🤔 ...

@squaremo squaremo force-pushed the bucket-provider-interface branch from 83431e2 to 342e331 Compare January 5, 2022 14:34
@squaremo
Copy link
Member

squaremo commented Jan 5, 2022

(I rebased to remove the merge commits)

@squaremo squaremo force-pushed the bucket-provider-interface branch from 342e331 to 16b3cbb Compare January 6, 2022 16:07
@pa250194
Copy link
Contributor Author

Hey @squaremo just got back from vacation. I will take a look at the test and push a fix for it as I see this is blocking another PR.

@squaremo
Copy link
Member

I will take a look at the test and push a fix for it as I see this is blocking another PR.

To expand on my comment earlier: I had a look at faking the GCP storage client and it is .. complicated. Mainly because there's a couple of layers of methods that return structs with their own methods, all of which need to be represented as interfaces. So it would be possible, but verbose.

If there's a way to test that the construction logic does what it's supposed to (ideally without requiring an external service and credentials), the rest is covered with the fake service. I assume that faking the authentication service is a lot of trouble. One way might be to supply a factory interface to the construction logic, and verify that it's called in the correct way. I'm sure there are other ways.

@pa250194
Copy link
Contributor Author

I will take a look at the test and push a fix for it as I see this is blocking another PR.

To expand on my comment earlier: I had a look at faking the GCP storage client and it is .. complicated. Mainly because there's a couple of layers of methods that return structs with their own methods, all of which need to be represented as interfaces. So it would be possible, but verbose.

If there's a way to test that the construction logic does what it's supposed to (ideally without requiring an external service and credentials), the rest is covered with the fake service. I assume that faking the authentication service is a lot of trouble. One way might be to supply a factory interface to the construction logic, and verify that it's called in the correct way. I'm sure there are other ways.

I agree faking the GCP client is complicated. I was thinking that because all line 65 - 71 does is use the GCP client.


We could probably implement that test in a new PR so we are not blocking the other PR.

In the gcp_test.go line 173 is the test that is failing because it is trying to use the gcp client to authenticate using the GOOGLE_APPLICATION_CREDENTIALS env variable. So I will remove that test and we could implement the test with a fix in another PR.

func TestNewClientWithoutSecretErr(t *testing.T) {

if that sounds good with you, I can make a push with the changes.

@squaremo
Copy link
Member

We could probably implement that test in a new PR so we are not blocking the other PR.

Yes, I think it's reasonable to defer tests for the construction logic for the minute; the code under test is pretty self-evident (if there's a secret, use that, otherwise let the client do whatever it does ...).

@pa250194
Copy link
Contributor Author

Hi @squaremo can I get some guidance on how to get the DCO check to pass. I believe other than that this PR should be ready to go. Thank you for your help 🙂

@squaremo
Copy link
Member

squaremo commented Jan 24, 2022

Hi @squaremo can I get some guidance on how to get the DCO check to pass

You'll need to rewrite the commits that don't have a sign-off (they are listed on the details page for the DCO check). A fairly easy way to do this is with git rebase --interactive, since that will give you the chance to go back through all the commit in one go.

git rebase -i main

will bring up an editor with a list of commits since branching from main. Find those you need to add your sign-off to, and put an edit in the first column. That means git will stop at that commit and let you amend it.

To add your sign-off to a commit, you can do

git commit --amend -s

Then you can do

git rebase --continue

to keep going through the commits in the list. If you lose track of where you are, git status will tell you where you are up to; and, you can abandon the rebase (e.g., to start again) with git rebase --abort, which will take you back to the state before you started.

Once you've finished, check git history to see if it looks right, and git push --force-with-lease to push your rebased branch up to github. It can all be reverted, so do not worry about losing work.

@squaremo
Copy link
Member

squaremo commented Jan 24, 2022

You can fix the merge conflict at the same time, if you pull from the main branch in this repo first then use that as the upstream.

@pa250194
Copy link
Contributor Author

I think I may have messed up the rebase 🤦🏽‍♂️ @squaremo

@squaremo squaremo force-pushed the bucket-provider-interface branch from 7d94fa7 to 5ef220d Compare January 31, 2022 12:19
pa250194 and others added 3 commits January 31, 2022 12:57
 - Added Bucket Provider Interface

Signed-off-by: pa250194 <pa250194@ncr.com>
The algorithm for conditionally downloading object files is the same,
whether you are using GCP storage or an S3/Minio-compatible
bucket. The only thing that differs is how the respective clients
handle enumerating through the objects in the bucket; by implementing
just that in each provider, I can have the select-and-fetch code in
once place.

This deliberately omits the parallelised fetching that the GCP client
had, for the sake of lining the clients up. It can be reintroduced (in
the factored out code) later.

Signed-off-by: Michael Bridgen <michael@weave.works>
This commit reintroduces the use of goroutines for fetching objects,
but in the caller of the client interface rather than in a particular
client implementation.

Signed-off-by: Michael Bridgen <michael@weave.works>
@squaremo squaremo force-pushed the bucket-provider-interface branch from 5ef220d to 53c2a15 Compare January 31, 2022 16:23
// Look for file with ignore rules first.
ignorefile := filepath.Join(tempDir, sourceignore.IgnoreFile)
if err := client.FGetObject(ctxTimeout, bucket.Spec.BucketName, sourceignore.IgnoreFile, ignorefile); err != nil {
if client.ObjectIsNotFound(err) && sourceignore.IgnoreFile != ".sourceignore" { // FIXME?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what is the intent of this condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets the file from the bucket and checks if the error returned is Object not found and ensures the error is not thrown for the .sourceignore file because some users may decide not to have sourceignore files in their buckets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this part of the condition: sourceignore.IgnoreFile != ".sourceignore" -- this tests that the const sourceignore.IgnoreFile is not_equal to a literal value. The literal value happens to be the value of the const, at present, so this part -- and the whole condition -- will never be true (the bucket will never be marked unready if the ignore file is missing). If the const value changes in a later revision of code, then it will be equivalent to the first part of the condition (the bucket will always be marked unready if the ignore file is missing).

From your description, I think the intent is to only mark the bucket as unready if there was an error other than the file not being found.

@squaremo
Copy link
Member

squaremo commented Jan 31, 2022

I think I may have messed up the rebase 🤦🏽‍♂️ @squaremo

Don't worry -- rebasing takes a lot of practice and it's easy to lose your place, or misstep. It often takes me a few attempts to get a rebase right.

I've put things back in order here. This is what I did:

  • I squashed your original commits (refactor to introduce the interface, use error groups, 2 x fix tests) into one commit, without changing the branching point. They are all one change anyway, just with a few fixups. Squashing them first made it easier to rebase onto another branch, because it avoided the situation in which I have to fix a conflict in one commit, only to revisit it when the same lines are changed later. (It also removed some commit objects that needed sign-off, fixing the DCO check)
  • I rebased the newly rewritten branch on origin/main, to fix the "conflicts with main branch detected" message that was above.
  • I did git push --force-with-lease to your PR branch.

@pa250194
Copy link
Contributor Author

I think I may have messed up the rebase 🤦🏽‍♂️ @squaremo

Don't worry -- rebasing takes a lot of practice and it's easy to lose your place, or misstep. It often takes me a few attempts to get a rebase right.

I've put things back in order here. This is what I did:

  • I squashed your original commits (refactor to introduce the interface, use error groups, 2 x fix tests) into one commit, without changing the branching point. They are all one change anyway, just with a few fixups. Squashing them first made it easier to rebase onto another branch, because it avoided the situation in which I have to fix a conflict in one commit, only to revisit it when the same lines are changed later. (It also removed some commit objects that needed sign-off, fixing the DCO check)
  • I rebased the newly rewritten branch on origin/main, to fix the "conflicts with main branch detected" message that was above.
  • I did git push --force-with-lease to your PR branch.

Thank you so much for taking out time to explain the steps to me 🙂

@hiddeco
Copy link
Member

hiddeco commented Feb 23, 2022

Sorry about the wait, I'll be picking this up now to fit it into the recent (rigorous) changes in main.

@hiddeco hiddeco self-assigned this Feb 23, 2022
@hiddeco
Copy link
Member

hiddeco commented Feb 28, 2022

As due to the various conflicts I had to fiddle around with the order and state of commits, which I did not want to get lost (historically) here, this is superseded by #596.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bucket Bucket related issues and pull requests enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants