This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

skip aws api check when regions are passed from config #3124

Closed
wants to merge 1 commit into from

Conversation

evq
Contributor

@evq evq commented Jun 9, 2020

fixes #3015
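
For readers without the diff in front of them, here is a rough sketch of the behaviour the title describes; the names (awsConfig, preflightRegions, lookupRegion) are illustrative, not the actual flux identifiers:

```go
package registry

// Hypothetical sketch only: when regions are already supplied in the
// config, the AWS API / EC2 metadata lookup is skipped entirely.
type awsConfig struct {
	Regions []string
}

// preflightRegions returns the regions to scan for ECR repositories.
// lookupRegion stands in for the EC2 metadata-service call, which is the
// part that fails on Fargate or when IMDS access is blocked.
func preflightRegions(cfg awsConfig, lookupRegion func() (string, error)) ([]string, error) {
	if len(cfg.Regions) > 0 {
		return cfg.Regions, nil // regions passed from config: skip the AWS API check
	}
	region, err := lookupRegion()
	if err != nil {
		return nil, err
	}
	return []string{region}, nil
}
```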

@squaremo
Member

The preflight check is used to initialise the AWS state when that's needed, but it's also used if --registry-require=ecr is given (it's like an assertion, meaning "crash if you can't get creds via AWS"). This change would break the latter use, since it will succeed without checking, if you've supplied regions.

I'm not sure why using ECR from Fargate (as in #3015) is different from using ECR from anywhere else. If you can give an explanation of why this is the right change to fix #3015, I'd appreciate it.

The way forward then would be to separate the preflight check from the region calculation, so the check can still be called as an assertion.
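
To make that concrete, a minimal sketch of the separation (hypothetical names, not the actual flux code): the region calculation prefers configured regions, while the assertion stays available for --registry-require=ecr.

```go
package registry

import "fmt"

// regionsToScan prefers regions passed from config and only falls back to
// the EC2 metadata service when none are configured.
func regionsToScan(configured []string, metadataRegion func() (string, error)) ([]string, error) {
	if len(configured) > 0 {
		return configured, nil
	}
	r, err := metadataRegion()
	if err != nil {
		return nil, err
	}
	return []string{r}, nil
}

// requireECR is the separated assertion: it is only run when
// --registry-require=ecr is given, and fails hard if AWS credentials
// cannot be obtained, regardless of where the regions came from.
func requireECR(checkCreds func() error) error {
	if err := checkCreds(); err != nil {
		return fmt.Errorf("--registry-require=ecr is set but AWS credentials are unavailable: %w", err)
	}
	return nil
}
```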

@evq
Contributor Author

evq commented Jun 16, 2020

I don't think this is a complete fix for ECR on Fargate; I figured I would push it just in case anyone else ran into this issue and needed a quick workaround (assuming they care more about having it work than about the assertion, since this change breaks the assertion).

I'm not actually running this on Fargate at all, but my setup reproduces the same error mentioned in #3015. I'm using IAM Roles for Service Accounts to provide IAM permissions to the pod. The pod is running on an EC2 nodegroup where I have blocked access to the EC2 metadata service in order to prevent pods from accessing the instance credentials, per best practice.

So the issue arises when flux is unable to access the EC2 metadata service. This can be verified by looking at the error pbn4 reported. Currently the preflight check has no bearing on whether you'll actually be able to connect to ECR, only on whether you can connect to the EC2 metadata service, a totally separate and unrelated service.
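
To illustrate the failure mode, here is roughly what a metadata-service-based region lookup looks like with aws-sdk-go v1 (not the exact flux code). Nothing in it touches ECR: if IMDS is unreachable, it fails even though the pod may hold perfectly good ECR credentials via IRSA.

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

func main() {
	sess := session.Must(session.NewSession())
	meta := ec2metadata.New(sess)
	// Available() probes the EC2 metadata service; it returns false when
	// IMDS is blocked (locked-down nodegroup) or absent (Fargate, minikube).
	if !meta.Available() {
		fmt.Println("EC2 metadata service unreachable; a preflight built on it fails here")
		return
	}
	region, err := meta.Region()
	if err != nil {
		fmt.Println("error fetching region from metadata service:", err)
		return
	}
	fmt.Println("region from metadata service:", region)
}
```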

I believe there are two changes that should be made to properly fix this:

  1. The region lookup via metadata service should be separated from the preflight check and made optional.
  2. The preflight check would ideally attempt to retrieve ECR credentials; such a check would be more accurate and could even be enforced on Fargate / locked-down EC2.

Unfortunately I don't think 2 always works, if I understand some of the use cases right. Originally I was wondering if we could simply call fetchAWSCreds(region, []string{accountID}) for each region / account ID permutation in the preflight (using config.AccountIDs). However, AccountIDs is optional, as it seems account IDs can be picked up automatically from the ECR repository URL of an image at runtime. Also, there's no good way to know which of the combinations of {region, accountID} are supposed to work, so we would simply have to check whether ANY of them work. There doesn't really seem to be enough information at startup to do a proper preflight check, so let me know how you would like to proceed.
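
For reference, a sketch of that permutation check; the fetchAWSCreds signature is assumed from the call shown above, and the helper name anyECRCredsWork is made up.

```go
package registry

// anyECRCredsWork reports whether credentials could be fetched for at
// least one {region, accountID} combination. Because it's unknown which
// combinations are *supposed* to work, passing on ANY success is the best
// a startup-time check can do.
func anyECRCredsWork(regions, accountIDs []string,
	fetchAWSCreds func(region string, accountIDs []string) error) bool {
	for _, region := range regions {
		for _, id := range accountIDs {
			if err := fetchAWSCreds(region, []string{id}); err == nil {
				return true
			}
		}
	}
	return false
}
```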

@squaremo
Member

That's a good diagnosis, I understand it much better now.

So paraphrasing a little, there are two problems to be solved, potentially:

  1. the region lookup via the EC2 metadata service is coupled to the preflight check, so the check fails wherever the metadata service is unreachable;
  2. the preflight check doesn't verify ECR access itself, which is what it would ideally assert;

.. and the latter is tricky because it may not be possible to check ECR access before knowing an account or region to check with.

I would be happy for the first problem to be solved, without the second; people that are using --registry-require=ecr will have the same behaviour as they get now, and people that need #3015 fixed will get a working system.

@yannick

yannick commented Jun 30, 2020

this is also a problem when running flux locally, e.g. in minikube, when the containers are in ECR.

i don't really see a reason why you can't have a successful preflight check by just setting AWS_DEFAULT_REGION

@squaremo can you elaborate on the 2nd point? if the ec2 api fails you can simply check whether registry-ecr-include-id and registry-ecr-region give back accessible repos, else fail. why would that lead to different behaviour for people using --registry-require=ecr?
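
A sketch of the kind of fallback being suggested (illustrative names only): use configured regions first, then AWS_DEFAULT_REGION, and only then the EC2 metadata service.

```go
package registry

import "os"

// regionForPreflight avoids the EC2 API whenever a region is already
// known, so a local minikube setup pulling from ECR can pass preflight.
func regionForPreflight(configured []string, metadataRegion func() (string, error)) (string, error) {
	if len(configured) > 0 {
		return configured[0], nil
	}
	if r := os.Getenv("AWS_DEFAULT_REGION"); r != "" {
		return r, nil
	}
	return metadataRegion() // last resort: EC2 metadata service
}
```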

@kingdonb kingdonb self-assigned this Mar 11, 2021
@kingdonb kingdonb added the bug label Mar 11, 2021
@kingdonb
Member

Greetings,

If you can pull --rebase the master branch and squash your commits to amend with --sign-off per the DCO bot requirement, I will review them and include them for merge whenever possible! Please let me know if you are still interested in merging this.

I cannot merge or adopt your commits without the DCO; please rebase and squash, and sign off.

Thank you for your contribution!

Member

@kingdonb kingdonb left a comment

Did you mean to make any changes in .dockerignore, .github/workflows, or Dockerfile?

When you rebase, could you remove these changes first?

@kingdonb kingdonb added this to the 1.22.0 milestone Mar 11, 2021
@evq
Contributor Author

evq commented Mar 11, 2021

Happy to do so tomorrow. That most recent commit was just for an internal build; I'll prune it so we can get this merged.

@kingdonb
Member

I'm not especially comfortable reviewing this change by myself since I don't have an environment that I can test it in, to prove to my satisfaction that it has the desired effects.

But it seems like the discussion has already been talked through, and with a maintainer satisfied that the change is OK, I would be happy to merge and include it in the release. The DCO check is unfortunately still blocked.

But... I can apparently also bypass it now. Either someone with higher permissions than +v is listening and has found a way to grant me the ability to mark the DCO check as passing, or... 👻

If you're having trouble rebasing, please let me know, I can help; there is still time to merge in for the next release!

Thanks again.

@kingdonb
Member

It's no problem, we can have a 1.22.1 release out soonish if this is still needed.

Whenever you get around to it is fine.

@kingdonb kingdonb modified the milestones: 1.22.0, 1.22.1 Mar 17, 2021
Signed-off-by: eV <ev@7pr.xyz>
@evq
Contributor Author

evq commented Mar 18, 2021

sorry for the delay - i've removed the last commit, rebased on origin/master and amended with --signoff. it appears the DCO is passing now - although there is a FOSSA failure related to some rate limit

@kingdonb kingdonb modified the milestones: 1.22.1, 1.22.2 Mar 31, 2021
@kingdonb
Member

kingdonb commented Apr 15, 2021

I was going to rebase this and retry it against the latest commit in master but I noticed the PR does not have "maintainers can edit this PR" set. Is it possible to edit the PR and check that box, so I don't have to come back and ask you to rebase again?

Alternatively, if you can git pull --rebase origin master and git push --force, that should induce the PR checks to run again. I think the failure was most likely a transient one that will not come back the next time.

I can't merge or take any further action until one of these actions is taken. Thanks again for your contribution.

@kingdonb kingdonb removed this from the 1.22.2 milestone Apr 16, 2021
@kingdonb
Member

kingdonb commented May 1, 2021

We're working our way down to zero backlogged PRs. This is one that I think we wanted to merge.

If you can set the "maintainers with write access can edit this branch" flag on the PR, I can try to move it forward.

Thanks for using Flux!

@kingdonb
Member

I think the DCO sign-off is what is supposed to empower me to reopen this as a new PR in my name without asking for permission, so I can run the tests again and make any changes needed, since the submitter left off the "maintainers can edit this PR" flag.

At this time I don't have any changes, only a clean rebase onto the v1.22.2 release of Flux, and am pushing a new PR with head revision eebe032 so that the FOSSA check, which failed with rate-limiting errors, can be re-run.

I'm not stating any opinion or approval by reopening this PR, just trying to move things along (and also get it out of the queue if possible, since it seems to be stuck here.)

@kingdonb
Member

Closing in favor of #3485, which is expected to be passing in CI

Successfully merging this pull request may close these issues.

Unable to fetch tags from ECR