Support for ALBs in multiple AWS accounts #3634

Open
marcosdiez opened this issue Apr 3, 2024 · 17 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@marcosdiez
Contributor

marcosdiez commented Apr 3, 2024

We have multiple EKS clusters, in multiple VPCs. That being said, we plan to have a centralized VPC just for ingress, following this pattern: https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/using-network-firewall-for-centralized-ingress.html

Since less is more, we want to keep the number of hops as low as possible, so we plan to have the following traffic setup:


Browser  -> PUBLIC_ALB_IN_THE_INGRESS_ACCOUNT -> IP_OF_THE_EKS_POD_IN_THE_APPLICATION_ACCOUNT

That means we have at least two AWS accounts: INGRESS_ACCOUNT and APPLICATION_ACCOUNT

Our current setup is created with Terraform, including the EKS cluster, the ALB, the DNS entry, the ALB rule and the ALB target group.
This means the only thing the aws-load-balancer-controller has to do is sync the IPs of the k8s service with the target group.
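
To illustrate what "syncing" means here: it amounts to a set difference between the pod IPs currently backing the service and the IPs already registered in the target group. The following is a minimal Python sketch of that idea only. The function name and the sample IPs are hypothetical, and this is not the controller's actual code.

```python
def plan_target_sync(pod_ips, registered_ips):
    """Return (to_register, to_deregister) as sorted lists of IPs.

    pod_ips: IPs of the pods currently backing the k8s service.
    registered_ips: IPs currently registered in the target group.
    """
    desired = set(pod_ips)
    current = set(registered_ips)
    to_register = sorted(desired - current)    # pods missing from the target group
    to_deregister = sorted(current - desired)  # targets whose pods no longer exist
    return to_register, to_deregister


# Hypothetical example values
to_register, to_deregister = plan_target_sync(
    pod_ips=["10.0.1.12", "10.0.2.7"],
    registered_ips=["10.0.1.12", "10.0.3.9"],
)
print("register:", to_register)
print("deregister:", to_deregister)
```

In a real reconciliation the two lists would feed the RegisterTargets and DeregisterTargets calls.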

For that, we use TargetGroupBindings ( https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/targetgroupbinding/targetgroupbinding/ )

Just for reference, this is what a TargetGroupBinding looks like:

apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: my-tgb
spec:
  serviceRef:
    name: awesome-service # route traffic to the awesome-service
    port: 80
  targetGroupARN: <arn-to-targetGroup>

So what we need is for the aws-load-balancer-controller to be able to interact with the target groups in different AWS accounts.

More specifically, we need these IAM permissions in the other AWS accounts:

                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeTargetHealth",
                "elasticloadbalancing:ModifyTargetGroup",
                "elasticloadbalancing:ModifyTargetGroupAttributes",
                "elasticloadbalancing:RegisterTargets",
                "elasticloadbalancing:DeregisterTargets"
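
For reference, the six actions above could be wrapped into a complete IAM policy document roughly as follows. This is a sketch, not a reviewed policy: the function name is hypothetical, and `"Resource": "*"` is an assumption made for brevity that a real deployment would likely want to narrow where the actions allow it.

```python
import json

# The six actions listed in this issue.
TARGET_GROUP_ACTIONS = [
    "elasticloadbalancing:DescribeTargetGroups",
    "elasticloadbalancing:DescribeTargetHealth",
    "elasticloadbalancing:ModifyTargetGroup",
    "elasticloadbalancing:ModifyTargetGroupAttributes",
    "elasticloadbalancing:RegisterTargets",
    "elasticloadbalancing:DeregisterTargets",
]


def build_target_group_policy(resource="*"):
    """Build an IAM policy document allowing the target group actions.

    resource defaults to "*" purely as an assumption for this sketch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(TARGET_GROUP_ACTIONS),
                "Resource": resource,
            }
        ],
    }


print(json.dumps(build_target_group_policy(), indent=2))
```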

Ideally, we should be able to do that with AWS RAM, but AWS RAM can only share whole VPCs, which is not something my security team is happy about.

So what I suggest we do in order to solve that problem is to patch the aws-load-balancer-controller to allow it to impersonate arbitrary IAM roles per TargetGroupBinding. This way we could solve not only my specific problem but more complex problems from others as well.

We could start by adding an alb.ingress.kubernetes.io/IamRoleArnToAssume annotation to the TargetGroupBinding. If that annotation is present, the aws-load-balancer-controller would attempt to assume that role before interacting (using the above IAM permissions) with that specific target group.
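
As a rough illustration of the proposal, the sketch below models an annotated TargetGroupBinding as a Python dict and looks up the role to assume. The annotation key is the one proposed in this issue and is not an existing controller feature at the time of writing. The role ARN, the account ID, and the helper name are hypothetical, and the target group ARN is left as a placeholder.

```python
# Annotation key as proposed in this issue (a proposal, not a released feature).
ROLE_ANNOTATION = "alb.ingress.kubernetes.io/IamRoleArnToAssume"


def role_arn_to_assume(manifest):
    """Return the role ARN from the proposed annotation, or None if absent."""
    annotations = manifest.get("metadata", {}).get("annotations") or {}
    return annotations.get(ROLE_ANNOTATION)


# Hypothetical TargetGroupBinding carrying the proposed annotation.
tgb = {
    "apiVersion": "elbv2.k8s.aws/v1beta1",
    "kind": "TargetGroupBinding",
    "metadata": {
        "name": "my-tgb",
        "annotations": {
            ROLE_ANNOTATION: "arn:aws:iam::111111111111:role/hypothetical-tgb-role",
        },
    },
    "spec": {
        "serviceRef": {"name": "awesome-service", "port": 80},
        "targetGroupARN": "<arn-to-targetGroup>",
    },
}

role = role_arn_to_assume(tgb)
if role is None:
    print("no annotation: use the controller's own credentials")
else:
    print("assume role before touching the target group:", role)
```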

Such a setup would be flexible enough to allow one aws-load-balancer-controller to manage all the target groups in the world, as long as the right IAM roles exist for it to assume.

I am speaking with my employer to see if they would consider me (or somebody from my team) writing a pull request for that.
That being said, we are not in the aws-load-balancer-controller business hence it would be great to know if such a pull request would eventually be merged, so we don't have to maintain a fork in the long term.

Thank you

@shraddhabang
Collaborator

@marcosdiez Thank you for reaching out to us and sending this detailed information about your architecture and proposed solution.
We will need to discuss this design proposal with our internal security team to figure out the security concerns. We will start that process.

@shraddhabang shraddhabang added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 4, 2024
@shraddhabang
Collaborator

@marcosdiez We reviewed this internally with our security engineer. He is fine with the feature and the proposed solution. He had one concern: the cross-account IAM permissions may result in the confused deputy problem. You can prevent it by adding an "ExternalId" condition to the trust policy of the IAM role, so that the permissions are scoped down to specific accounts only. Please consider this while implementing.

Please note that we will have a formal appsec review as well once the implementation is complete. As long as it passes the security review, we will merge your PR. Looking forward to it.
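
For readers unfamiliar with the ExternalId safeguard mentioned above, a trust policy using it might look roughly like the one built below. This is a minimal sketch: the function name, the controller role ARN, and the external ID value are hypothetical, and only the `sts:ExternalId` condition key and the `sts:AssumeRole` action are standard IAM names.

```python
import json


def build_trust_policy(controller_role_arn, external_id):
    """Trust policy for the role in the account that owns the target group.

    Only the given principal may assume the role, and only when it
    supplies the agreed ExternalId in its AssumeRole call.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": controller_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Hypothetical values for illustration only.
trust_policy = build_trust_policy(
    controller_role_arn="arn:aws:iam::222222222222:role/hypothetical-lb-controller",
    external_id="hypothetical-external-id",
)
print(json.dumps(trust_policy, indent=2))
```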

@marcosdiez
Contributor Author

Thank you @shraddhabang . I will discuss with my team and hopefully send you some code in a week or two.

@imaharu

imaharu commented May 7, 2024

Hi @marcosdiez
Thank you for the great suggestion.

How is this issue progressing on your side?
We have the same multi-account problem, so we are looking forward to your PR.

Thanks.

@marcosdiez
Contributor Author

Hi @imaharu . I actually only got to start it part time yesterday. Ask me again in a few weeks :)

@marcosdiez
Contributor Author

@imaharu I published a container with this feature working. You can check it at #3691

@imaharu

imaharu commented May 22, 2024

@marcosdiez
Thank you for creating a great pull request.
It looks good.
Your code is easy to read. Our team learned a lot from it.

This motivates me, so I want to try an OSS contribution next time if I have a chance!

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 20, 2024
@marcosdiez
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 20, 2024
@matchan26

@shraddhabang
Hello, is there any progress on this issue?
We are eagerly awaiting the implementation of this feature, as our product also has the issues mentioned in this issue.
If you don't mind, it would be very helpful if you could review and merge the relevant PRs. Thank you!

@listellm

We are also super keen to use this feature. We heavily utilise AWS accounts. We create an account per customer / purpose + environment. We have 3 environments (prod, stg and dev).

We then create one cluster account per environment purely for creating EKS clusters. We then create an EKS cluster per region where we have a presence.

So taking a boiled down version we would have:

  • cluster-account in region eu-central-1 with one eks cluster
  • customerA in region eu-central-1
  • customerB in region eu-central-1
  • etc

When creating the VPC in the cluster account, using AWS RAM we extend the private and public subnets from the VPC to all other workload accounts in that region+environment. So CustomerA and Customer B in region eu-central-1 for example will have public and private subnets present from the VPC created in cluster account for eu-central-1

With each environment, we have one EKS cluster per region. So taking for example eu-central-1 and dev, we would have one EKS cluster covering this combination.

Using Terraform, we then create the certs, DNS record, ALB, target groups, and listeners in each customer account on the cluster-account VPC. All we then want to do within the EKS cluster itself, using the LB controller, is use TargetGroupBinding to bind the service(s) to the respective target groups we have created in each account.
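
In a layout like this, the owning account of each target group can be read straight from its ARN, which is one way a controller could tell which account (and therefore which role) a given binding belongs to. The sketch below assumes the standard `arn:aws:elasticloadbalancing:<region>:<account-id>:targetgroup/<name>/<id>` shape. The function names, account IDs, and target group names are hypothetical.

```python
from collections import defaultdict


def target_group_account(arn):
    """Extract the 12-digit account ID from a target group ARN."""
    parts = arn.split(":", 5)
    if (
        len(parts) != 6
        or parts[0] != "arn"
        or parts[2] != "elasticloadbalancing"
        or not parts[5].startswith("targetgroup/")
    ):
        raise ValueError(f"not a target group ARN: {arn!r}")
    return parts[4]


def group_by_account(arns):
    """Group target group ARNs by the account that owns them."""
    grouped = defaultdict(list)
    for arn in arns:
        grouped[target_group_account(arn)].append(arn)
    return dict(grouped)


# Hypothetical ARNs for two customer accounts in eu-central-1.
arns = [
    "arn:aws:elasticloadbalancing:eu-central-1:111111111111:targetgroup/customer-a-web/0123456789abcdef",
    "arn:aws:elasticloadbalancing:eu-central-1:222222222222:targetgroup/customer-b-web/fedcba9876543210",
    "arn:aws:elasticloadbalancing:eu-central-1:111111111111:targetgroup/customer-a-api/00112233445566aa",
]
for account, owned in group_by_account(arns).items():
    print(account, len(owned))
```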

@andpozo

andpozo commented Nov 22, 2024

Hi everyone,
I came across this thread while looking for solutions and wanted to share my current approach. My solution can successfully register an Application Load Balancer (ALB) in another AWS account. However, it’s not fully working yet due to a hardcoded value in the controller that’s causing issues.

To clarify, the current implementation works if you want to register both the private and public ALBs in the ingress account. However, my approach requires registering only the public ALB in the ingress account and the private ALB in the non-ingress account. This specific requirement is where the issue arises.

I’ve submitted a feature request to address this limitation, and I believe its implementation could provide a proper solution. In the meantime, if anyone has encountered a similar scenario or has suggestions for a workaround, I’d love to hear your thoughts!

Thanks in advance!

@andpozo

andpozo commented Nov 22, 2024

if anyone is working on a similar setup or facing related challenges, feel free to reach out to me for support. I’d be happy to share insights from my experience.

@listellm

> if anyone is working on a similar setup or facing related challenges, feel free to reach out to me for support. I’d be happy to share insights from my experience.

Can you let me know how, from the EKS Cluster Account (Account A), where the AWS LB Controller is running, you have used TargetGroupBinding to register the target with the Target Group in your ingress account (Account B)?

@andpozo

andpozo commented Nov 22, 2024

> if anyone is working on a similar setup or facing related challenges, feel free to reach out to me for support. I’d be happy to share insights from my experience.
>
> Can you let me know how, from the EKS Cluster Account (Account A), where the AWS LB Controller is running, you have used TargetGroupBinding to register the target with the Target Group in your ingress account (Account B)?

Hey, I'm using an OIDC setup in account B that references the OIDC provider of the EKS cluster in account A, and then using IRSA to create a service account and give it the ability to assume a role. I don't need to create a target group: as soon as I create the Service in account A, the ALB is provisioned in account B. The particularity is that the targets are registered using ip, not instance.
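
For context, the cross-account OIDC/IRSA arrangement described above usually hinges on a trust policy in account B that federates to the cluster's OIDC issuer. The sketch below builds such a policy. It is an assumption about the general pattern, not the commenter's actual configuration: the function name, account ID, issuer URL, namespace, and service account name are all hypothetical.

```python
import json


def build_irsa_trust_policy(account_id, oidc_issuer, namespace, service_account):
    """Trust policy letting one Kubernetes service account assume the role.

    account_id: the account where the IAM OIDC provider is registered
    (account B in the setup described above).
    """
    provider = oidc_issuer
    if provider.startswith("https://"):
        provider = provider[len("https://"):]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{provider}:aud": "sts.amazonaws.com",
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


# Hypothetical values for illustration only.
irsa_policy = build_irsa_trust_policy(
    account_id="333333333333",
    oidc_issuer="https://oidc.eks.eu-central-1.amazonaws.com/id/EXAMPLE0123456789",
    namespace="kube-system",
    service_account="aws-load-balancer-controller",
)
print(json.dumps(irsa_policy, indent=2))
```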

@andpozo

andpozo commented Nov 22, 2024

> if anyone is working on a similar setup or facing related challenges, feel free to reach out to me for support. I’d be happy to share insights from my experience.
>
> Can you let me know how, from the EKS Cluster Account (Account A), where the AWS LB Controller is running, you have used TargetGroupBinding to register the target with the Target Group in your ingress account (Account B)?

Check this feature request #3946

@andpozo

andpozo commented Nov 27, 2024

Hi @listellm , could you please push up #3946 for further visibility? doing so will help us get closer to resolving the issue. I hope my previous comment was helpful. Thanks!

8 participants