Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AWS pod identity support #499

Merged
merged 3 commits into from
Mar 4, 2020

Conversation

zach-dunton-sf
Copy link
Contributor

No description provided.

@zach-dunton-sf
Copy link
Contributor Author

docs are here kedacore/keda-docs#56

'kiam' -> 'aws-kiam'
'eks'  -> 'aws-eks'
@otterley
Copy link

otterley commented Jan 22, 2020

I'm a bit puzzled by the proposed implementation.

In an EKS-managed cluster, if the KEDA operator Pod is properly associated with a ServiceAccount, and the ServiceAccount has the proper eks.amazonaws.com/role-arn annotation, then the AWS Go SDK will do the right thing automatically if you use the default credential chain. A mutating webhook that EKS provides will automatically inject the necessary AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables in the pod, and the default credential chain knows how to look for them and what to do with them.

In other words, there are usually no code changes needed if the manifest is correctly set up and you haven't messed with the default credential chain. (Unfortunately, it looks like in the existing code, the credential chain is being overridden for reasons that aren't clear to me yet; we probably need to think about that.)

If I understand the proposal correctly (and I might not be!), this change would cause the KEDA operator to look at the annotations applied to the pod template of the target deployment and use that as a role name to assume. But I do not think this will work as designed. The IAM role is bound to the pod making the API calls -- only that pod can assume the IAM role. The annotation needs to be applied to the operator, not the target.

Let me know if you think I'm confused.

Also, out of curiosity, how did you test this change?

@zach-dunton-sf
Copy link
Contributor Author

You can run a single role and attach it to the operator that allows it to read the stats for all SQS queues.

A more granular approach would be to craft a role for your application that only allows "roleAppA" to read SQS queue information for only the SQS queues of "AppA". The IAM role "roleAppA" would have to have a trust policy that allows the service account of KEDA to assume it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "$PROVIDER_ARN"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${ISSUER_HOSTPATH}:sub": "system:serviceaccount:default:keda-serviceaccount"
        }
      }
    }
  ]
}

I will investigate further into the eks implementation. I have not tested it fully.

@tomkerkhove
Copy link
Member

@zach-dunton-sf would it be possible to merge with master please?

I presume you are still waiting on a review? Care to pick this up @ahmelsayed ?

@ahmelsayed
Copy link
Contributor

Sorry for the delay @zach-dunton-sf, I'm not too familiar with how pod identity works on AWS so was just keeping an eye on the conversation here. The change looks fine from my prospective. I will merge it once the build passes

@ahmelsayed ahmelsayed merged commit 4180c15 into kedacore:master Mar 4, 2020
@ncrothe
Copy link

ncrothe commented Mar 10, 2020

Is this solution expected to also work for kube2iam which uses similar techniques to kiam?

@zach-dunton-sf
Copy link
Contributor Author

zach-dunton-sf commented Mar 10, 2020 via email

@SatishRanjan SatishRanjan mentioned this pull request Mar 13, 2020
@ncrothe
Copy link

ncrothe commented Mar 16, 2020

Tried it out and it does not seem to work for kube2iam (or I don't understand what I am supposed to do).
I have added a kube2iam role to the metrics-apiserver pod thus allowing the metrics server to get the correct credentials to fetch metrics from Cloudwatch. However, I am getting 403s. The following can be found in the logs of the apiserver:

AccessDenied: User: <MY_KUBE2IAM_PROVIDED_READER_ROLE> is not authorized to perform: sts:AssumeRole on resource: <ROLE_OF_THE_POD_TO_BE_SCALED>

I do not understand that. Why would it try to assume the role of the thing it is supposed to be scaling? That thing might not even have access to the metrics. What is the right way to use aws-kiam? The docs are, unfortunately, rather sparse on this.

@zach-dunton-sf
Copy link
Contributor Author

zach-dunton-sf commented Mar 16, 2020 via email

@ncrothe
Copy link

ncrothe commented Mar 16, 2020

I understand how to do the mechanical AWS parts, but I don't understand why I would want to set it up that way? ROLE_OF_THE_POD_TO_BE_SCALED is the role that pod needs to do it's work and that work has nothing to do with metrics. At most it might typically have some PutMetrics permissions, but I don't think it should, as a general rule, have GetMetrics permissions - that is not the task of the scaled pod and thus IMO it should not have the permission.

@ncrothe
Copy link

ncrothe commented Mar 16, 2020

(but thank you for clarifying your intention with the different roles)

@ncrothe
Copy link

ncrothe commented Mar 16, 2020

Briefly thinking out loud: What would the alternative be, though? I came in expecting to add the permissions to the apiserver, which I'm starting to realise would mean I need to add all relevant permissions for all possibly AWS-scaled things to that pod/role. Which I guess could be seen as excessive. OTOH, I really dislike having to assume the role of the pod-to-be-scaled. Can we maybe add an annotation to the ScaledObject or the TriggerAuthentication object instead and then make sure the role the apiserver runs with can assume that one? From a separation of concerns perspective that would be much cleaner IMO.

@zach-dunton-sf
Copy link
Contributor Author

zach-dunton-sf commented Mar 16, 2020 via email

@ncrothe
Copy link

ncrothe commented Mar 16, 2020

Hmm, ok. But if I understand correctly: If I want to use TriggerAuthentication (with or without roleARN), I will then need to hard-code AWS_ACCESS_KEY_ID etc. (which is a big no-no in our organization) or jump through some hoops to make dynamic secrets work with this possibly (which is what kube2iam and kiam were designed to avoid).

@zach-dunton-sf
Copy link
Contributor Author

zach-dunton-sf commented Mar 16, 2020 via email

@ncrothe
Copy link

ncrothe commented Mar 16, 2020

Could you please point in the direction of a working example then?

apiVersion: keda.k8s.io/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-aws-credentials
  namespace: default
spec:
  awsRoleArn: <MY_METRICS_READER_ROLE>

Does result in errors like
{"level":"error","ts":1584369099.61145,"logger":"controller_scaledobject","msg":"Failed to create new HPA resource","Request.Namespace":"default","Request.Name":"aws-sqs-queue-scaledobject","HPA.Namespace":"default","HPA.Name":"keda-hpa","error":"error getting scaler for trigger #0: Error parsing Cloudwatch metadata: 'AWS_ACCESS_KEY_ID' doesn't exist in the deployment environment","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/kedacore/keda/pkg/controller/scaledobject.(*ReconcileScaledObject).reconcileDeploymentType\n\tkeda/pkg/controller/scaledobject/scaledobject_controller.go:229\ngithub.com/kedacore/keda/pkg/controller/scaledobject.(*ReconcileScaledObject).Reconcile\n\tkeda/pkg/controller/scaledobject/scaledobject_controller.go:155\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.2/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.2/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.2/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:88"}

@zach-dunton-sf
Copy link
Contributor Author

It's a bit weird because you have to go through a Secret. I do not think it is possible to set it directly on the TriggerAuthentication.

https://github.com/kedacore/keda-docs/blob/master/content/scalers/aws-sqs.md

apiVersion: keda.k8s.io/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-aws-role
spec:
  secretTargetRef:
  - key: AWS_ROLE_ARN
    name: keda-aws-secrets
    parameter: awsRoleArn
---
apiVersion: v1
kind: Secret
metadata:
  name: keda-aws-secrets
data:
  AWS_ROLE_ARN: some role arn

@ncrothe
Copy link

ncrothe commented Mar 16, 2020

That did it! Thanks for the help, I got it running now. Looks like I completely misunderstood what podIdentity is about and plain kube2iam would've worked before.

To recap in case someone runs into this ticket again:

  • I have set up a kube2iam role on both operator (for setup, I guess) and metrics-apiserver that is allowed to assume the role configured in the TriggerAuthentication
  • TriggerAuthentication with awsRoleArn reference in secret as explained by Zach
  • no podIdentity involved in my use case anywhere

@tomkerkhove
Copy link
Member

TriggerAuthentication doesn't support hardcoded values indeed, for now.

Was everything clear or do we need to update the docs?

@ncrothe
Copy link

ncrothe commented Mar 16, 2020

I think a full example with the awsRoleArn could be useful. It is not obvious that that can e.g. pull default AWS authenticators and that it needs to go into the secret.

@tomkerkhove
Copy link
Member

Tracking this in kedacore/samples#10 but happy to link to a sample if you have one

@bmacauley
Copy link

bmacauley commented Mar 29, 2020

Having problems setting the SQS scaler up on a kubernetes cluster using kube2iam. We are using v 1.3.0.

Followed @ncrothe approach above...

  • kube2iam role on both operator (for setup, I guess) and metrics-apiserver that is allowed to assume the role configured in the TriggerAuthentication (the kube2iam annotation is being written to the wrong section in the charts)
  • TriggerAuthentication with awsRoleArn reference in secret as explained by Zach
  • no podIdentity involved in my use case anywhere
  • created a keda role(k8s-keda), which contains a trust relationship for the kube2iam role attached to the kubernetes worker node(same as the many other roles we have)

Get the following error in the keda-operator...
{"level":"debug","ts":1585482843.978178,"logger":"scalehandler","msg":"Error getting scale decision","ScaledObject.Namespace":"hydra-sbx1-mobilelogins", "ScaledObject.Name":"hydra-ingest-outbound-etl","ScaledObject.ScaleType":"deployment", "Error":"AccessDenied: User: arn:aws:sts::123456789123:assumed-role/k8s-keda/e7450c49-k8s-keda is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::123456789123:policy/k8s-keda\n\n\tstatus code: 403,

kube2iam provides the role for the pod(uses the annotation)...so session should be available inside pod?

Any thoughts?

Looking at the code...it seems to use a different technique to consume the kiam/kube2iam role compared to other tools eg externaldns
aws_sqs_queue_scaler.go.
aws_iam_authorization.go

externaldns. ...uses session.NewSessionWithOptions
external-dns/provider/aws.go

@ncrothe
Copy link

ncrothe commented Mar 29, 2020

I think: To make the current version of this work, you need one more indirection. As the error implies, your pod is assuming the intended role via kube2iam, but then keda tries to assume the same role as that role. So either:

  • you add the role as trust to itself or
  • you split this into two roles: one role A that the pod assumes via kube2iam and a role B that role A can assume (via Keda) that has the actual permissions.

@bmacauley
Copy link

@ncrothe
Good suggestions... I was thinking similar
I will try these out and give an update

@tomkerkhove
Copy link
Member

Would you be open to create a new issue for this to see what we need to fix and/or improve the docs?

@bmacauley
Copy link

@tomkerkhove
Yep...no problem

preflightsiren pushed a commit to preflightsiren/keda that referenced this pull request Nov 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants