Support for EC2 IAM roles in a way that allows us to safely share version history #3023
Comments
@jduv Hey! I'd be especially interested in your thoughts on this as you submitted both concourse/semver-resource#85 and concourse/s3-resource#115 for enabling IAM roles in two of our resource types (thanks! 🙂). Unfortunately I'm not comfortable with merging either until we have a plan for the security challenge described in this issue. Supporting IAM roles directly from resources has been a controversial topic in the past, but I understand that there's a big need for it, so it'd be great if we could find common ground (either with this proposal, if it even works, or some other approach).

@tlwr I noticed in #2951 that you use team-scoped workers so that you can safely use IAM roles, so this may be of interest. Would the proposed solution make sense for you?
Looks like EC2 only allows one IAM role per instance, which kind of ruins this idea, as you would then only be able to have one role for all teams. With per-team workers that each have their own IAM role, you can at least have one role per team, so this proposal would be less flexible than that. Hmm...
Maybe what we're really missing here is a credential manager for generating these temporary credentials based on configured policies. Similar to what could be done with Vault's AWS backend: https://www.vaultproject.io/docs/secrets/aws/index.html ... only without requiring them to deploy and maintain a Vault instance.
Background
Our relationship with Concourse has its genesis in GOV.UK PaaS, which is a Cloud Foundry deployment for UK government services; specifically, it is Cloud Foundry running on AWS. Our multitenant Concourse is separate from the PaaS project and is internal to our org. As such we recommend to our users that they should use forked resources, for instance:
These forks have support for using IAM instance profiles.

Sharing version history
I don't think I have enough context to appreciate why sharing version history is useful (other than general efficiency), although I can imagine it would be useful for Wings or other large Concourse deployments where the teams are less isolated from each other. For our use-case we will not have a worker which is not allocated to a team.

Antipatterns
As an org we have guidance that we should not have things that run untrusted code (i.e. not on
Doing something like this:
and then
as you suggested would work. ATC can act as some form of IAM proxy but it would have to be implemented through the EC2 node (or ECS task) assuming roles; i.e.
The impact of this is that the node running ATC needs to be able to assume all of the tenant IAM roles, although this is no big deal. The resource also has to include the session token:

Contributor burden
The above suggestion is perhaps too AWS specific, but it would make things easier for people writing resources and does solve the problem at scale. If this is a feature/pattern/mechanism that would be appreciated I would be happy to contribute towards it, although my knowledge and context is mainly AWS specific and I am not that up to speed with the codebase other than the minimum required to write alphagov/terraform-provider-concourse.

Obviously it is onerous for resources to have to reinvent the wheel for each cloud provider. I would suggest
Having these credentials injected (as you described, with my modification to include the session token) would make it easy to write resources which interact with AWS services, and would also make things easier for tenants writing pipelines. Being able to automatically generate a set of STS credentials within a pipeline (in a way that works for each cloud provider) would be quite helpful within tasks as well as resources.
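To make the session token point concrete, here is a minimal sketch of how the output of an STS AssumeRole call could be turned into the three fields a resource would receive in its source. The function name `source_credentials_from_sts` and the sample values are hypothetical and are not part of Concourse; the only thing taken from AWS is the shape of the `Credentials` object in the AssumeRole response. The network call itself is left as a comment so the example runs without AWS access.

```python
from typing import Any, Dict


def source_credentials_from_sts(response: Dict[str, Any]) -> Dict[str, str]:
    """Map an STS AssumeRole response onto resource source fields.

    Temporary credentials are only valid when all three values are used
    together, which is why the session token must be passed along.
    """
    creds = response["Credentials"]
    return {
        "access_key_id": creds["AccessKeyId"],
        "secret_access_key": creds["SecretAccessKey"],
        "session_token": creds["SessionToken"],
    }


# In a real deployment the response would come from something like
#   boto3.client("sts").assume_role(RoleArn=..., RoleSessionName=...)
# run on the node that is allowed to assume the tenant roles.
# The dict below is a hand-written stand-in with made-up values.
example_response = {
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLEKEYID",
        "SecretAccessKey": "example-secret-access-key",
        "SessionToken": "example-session-token",
        "Expiration": "2019-01-01T00:00:00Z",
    }
}

injected = source_credentials_from_sts(example_response)
print(sorted(injected))
```

The `Expiration` value is deliberately dropped here; how long tokens should live and when they are refreshed is a separate question that this sketch does not try to answer.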
@vito K8s faced a similar issue, check out https://github.com/jtblin/kube2iam#readme
Also check out https://github.com/uswitch/kiam
@tlwr Thanks! That makes sense - so it sounds like this would be an STS credential manager backend of some sort.

@analytically Thanks for the pointers! I think we might be able to just prevent resource containers from reaching any local network addresses once we have a solution here and have resources consistently just using their JSON input as the source of truth. That would be a separate challenge to tackle later though since I'm sure there are resources which require it at the moment.
It would be nice to segregate the IAM role of the worker instance from the IAM permissions which the containers run with. For people running Concourse with VM multitenancy, this would allow the convenience of IAM roles instead of having to do secrets management manually. I.e. when configuring Concourse you set (as above)
This approach does bring with it the can of worms of STS token durations, i.e. how long to generate a token for, and how to configure this, if that is possible at all. As Mathias mentioned above this is basically kube2iam but for Concourse.

Would it be sensible to split this out into two separate issues:
@tlwr A kube2iam style solution won't solve the problem of the
So far I'm still kinda sold on the idea of an STS credential manager backend. It's a bit of work upfront but it doesn't feel like that much more work compared to configuring team workers with IAM roles.

Not sure what you mean - #2951 will be part of 5.0 and should preserve existing functionality.
Good to know about #2951. I wasn't sure if it was going to be part of 5.0, but on double checking it is part of the 5.0 milestone 😅

An STS credential manager backend would be excellent. I'm not sure what the API would be like at the pipeline level (I'm not sure about other cloud providers and am unsure if they have a similar API to AWS STS), but it would make writing resources much easier. Also having STS credentials with the
Phew, just checking as that's the only one that sprung to mind. 🙂
Hmm, I suppose the simplest thing would be
Now that I think of it, though, this would at minimum require
Concourse only fetches the

source:
  access_key_id: ((role_name.access_key_id))
  secret_access_key: ((role_name.secret_access_key))
  session_token: ((role_name.session_token))

As of today, that will actually fetch
Agreed! 🙌
Just my two cents about supporting IAM roles: this feature request is not about coupling services to any vendor's infrastructure; it is about following security best practice when you support a new resource, no matter whether it is from Cloud Foundry, AWS, or others. Concourse's credential management backed by Amazon SSM or AWS Secrets Manager has implemented support for IAM roles, and I think the default behavior of using the IAM role if no access keys are set is great.

A quick intro for anyone who is not familiar with AWS security: using an access key and secret for a server to access AWS resources is a huge red flag when you are being reviewed by cyber security. The reason is that an access key and secret pair is designed for human users making calls from their local machine, not for production servers.
Yeah there are a few tensions:
Although
seems like a good solution to all three, at the expense of some cloud provider specific implementations within Concourse.

The easiest implementation of the above would probably use the IAM role of the
I don't believe this is entirely correct. The credentials you get from the host is really just a call to the special local metadata service (169.etc.etc) which results in an aws access key id and secret access key. So machines absolutely should and do use aws AKID/SAK in production. I agree though that when possible, they should be temporary and come from a source that can rotate them.
Right now you can somewhat dynamically control which teams can access which vault params through vault itself. Once the credential management setup is done, I control team param access by storing it in appropriate paths, e.g.:
which, while not ideal because I have to copy the same secret to multiple paths if I want that same secret to be shared by some teams but not others... does still mean I can change which teams can use which params purely through Vault configuration and without any configuration of the web server.

If you have a mapping of roles to teams in the static config, you now have to update that config and restart the web server to change it, which seems like a regression in terms of how I would want to use a credential manager.

While people are definitely using IAM roles on the host today to break some Concourse best practices, alignment on the right way to get AKID/SAK into resources/tasks would be cool.

If the web server is going to possibly talk to STS or grab credentials from roles to pass to workers, it's going to need the credentials to do so (a role which has a bunch of AssumeRole permissions). Ideally web itself would support getting its credential from a host role, so that I don't have to configure a static credential in the web process' environment. Similar to how web supports AppRole with Vault today, ideally Concourse will manage its authentication with the cred provider (Vault/STS) by rotating its underlying credential automatically.
For those following along at home, I've proposed concourse/rfcs#21 which hopes to lay the groundwork for this, but I do need to get more hands-on with IAM/STS in particular and try to crank out an example to see if team vs. global configuration is something that works well, particularly in regard to EC2 IAM roles.
For our Concourse we use a lambda function to dump temporary STS credentials for each team into Secrets Manager. Works nicely for STS credentials, but it would of course be better if this was baked into Concourse 😄

One thing I'm a bit wary of is adding more privileges to the ATC's instance profile, since the same process/instance is also hosting the web interface and is exposed to public/end-user traffic. Would it be more secure if the ATC (scheduler, secrets managers etc) was running on a separate instance from the web interface?
I think credentials you get from the host through the local metadata service are temporary and automatically rotated. See
Here are my current thoughts on this: If we allow prototypes:
# pull in prototype which supports `read` action
- name: ec2-iam-role
  type: registry-image
  source:
    repository: some-generous-soul/ec2-iam-role-prototype
var_sources:
# define a var source which will run on workers which have EC2 IAM role access
- name: worker-iam
  type: ec2-iam-role
  # tags: [...] if needed
resources:
- name: some-artifact
  type: s3
  source:
    bucket: some-artifacts
    regexp: artifact-(.*).tgz
    access_key_id: ((worker-iam:s3access.access_key_id)) # reads credentials using worker-iam var source
    secret_access_key: ((worker-iam:s3access.secret_access_key))
    session_token: ((worker-iam:s3access.session_token))

This fixes the biggest problem with the proposal as it was before, because now the EC2 IAM role configuration is set up on the workers - which can be per-team - rather than globally on the
This idea would require us to have a secure method for Prototypes to return credentials to Concourse - so we may have to amend the RFC to not use files on disk, and instead use a secured protocol.
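The `((worker-iam:s3access.access_key_id))` references in the pipeline above pack three things into one string: the var source name, the path of the secret, and the field within it. The sketch below splits a reference into those parts. It is a simplified illustration, not Concourse's parser: `VarRef` and `parse_var` are hypothetical names, and quoted segments or nested fields are not handled.

```python
import re
from typing import NamedTuple, Optional


class VarRef(NamedTuple):
    source: Optional[str]  # var source name, e.g. "worker-iam" (None if omitted)
    path: str              # secret path, e.g. "s3access"
    field: Optional[str]   # field within the secret, e.g. "session_token"


# ((source:path.field)) where "source:" and ".field" are both optional.
_VAR = re.compile(
    r"^\(\((?:(?P<source>[^:()]+):)?(?P<path>[^.()]+)(?:\.(?P<field>[^()]+))?\)\)$"
)


def parse_var(text: str) -> VarRef:
    """Split a ((...)) reference into its source, path and field parts."""
    match = _VAR.match(text.strip())
    if match is None:
        raise ValueError(f"not a var reference: {text!r}")
    return VarRef(match.group("source"), match.group("path"), match.group("field"))


scoped = parse_var("((worker-iam:s3access.access_key_id))")
unscoped = parse_var("((role_name.session_token))")
print(scoped)
print(unscoped)
```

All three fields in the `s3` source share the same source and path, so a var source only needs to look up `s3access` once and then hand out the individual fields.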
Follow-up: a few days ago I pushed a revision to the Prototypes interface that - I hope - will allow it to be safely used for credential management. The idea is to just encrypt sensitive information in the response using a key provided in the request. The Encryption section goes over the mechanics.

Looking forward to feedback on this approach! If it sounds good I think we're pretty much unblocked, and just need to get all of these RFCs merged. 🙂
Hi all. I'm wondering how IAM Roles for Service Accounts might fit into this discussion: https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/ Curious if you all have any thoughts. That's currently how we have AWS configured to talk to the credential manager (AWS Secrets Manager).
Concourse does not allow using instance IAM roles because it conflicts with its multi-tenancy design [1]. Multiple "teams" can use the same instance, so using IAM instance roles could be considered insecure. This is not applicable to our use case, so we are fine to use it. This removes the requirement to pass in access keys and just assumes we want to use instance roles instead.

[1] concourse/concourse#3023
I too am wondering about the IRSA integration. I have raised it here: #8716, but it would be great to get this as a feature.
I'm not sure if this helps at all (I haven't dug into the complete thread above), but we routinely assume other roles in containers to do work like taking DB snapshots in RDS pre-migration. It's done via the CLI as so:
As such, I'm sure there are APIs that would allow you to do this in code and utilize the AWS auth toolchain. We do this for other AWS accounts and our own accounts. Perhaps you pass the role you want to use to the resource and it's up to the caller to ensure it's well formed? I'm also well versed in IAM roles and temporary security tokens in AWS, so happy to provide any context I can.
Prerequisites:
What challenge are you facing?
With #2386 we started on a bunch of work to reduce the footprint of resources by sharing check containers and version history globally for equivalent resource definitions.

Resource definitions are considered equivalent if their source: (interpolated with credentials and hashed for safety) and type are equivalent.

However, as was uncovered in #3002, there is one situation where the hashed interpolated source is not enough to determine whether version history should be shared: resources using IAM roles. These resources forgo putting credentials in source: in favor of using EC2-configured IAM roles to grant anything that runs on the workers access to the AWS resource automatically.

This has pretty scary implications for version history sharing. Because the source: does not contain the credentials, all it would take is one person that does have access (via their own workers) to successfully check, and then anyone else could configure the same source: and see the same version history without even having to configure the IAM roles.

Thankfully, they at least won't have access to the fetched bits. The get step would have to run on their own workers, which wouldn't have access, and so there would be no cache to re-use and no ability to fetch the bits. However, this is still a dangerous information leak.

What would make this better?
I'm not sure! But the way resources use IAM roles today flies in the face of a couple Concourse anti-patterns:
anti/worker-state: because the operator is configuring the workers specifically for particular workloads. This could backfire if they start to use those same workers to run un-trusted code like pull requests. Running on a worker should not automatically grant access to sensitive data!
anti/multi-source-of-truth with a hint of anti/contributor-burden: each resource that deals with AWS now needs to support two ways of being configured: IAM roles and static configuration.

Is there some way we can make this relationship with IAM roles more explicit?

For example, instead of configuring the workers, could the operator configure the ATC with named IAM roles and explicitly permit certain teams to access those IAM roles, perhaps leveraging our existing credential manager support?

To be honest, my understanding of IAM roles is fairly loose, but as long as they can be named, we should be able to support this even in a multi-tenant environment, by configuring the ATC with something like this:

Then, assuming that the operator has already configured the web EC2 to have those named roles, pipelines belonging to team-name could use this credential manager like so:

This credential manager would interpolate by fetching from the named role.

Assuming this works (big assumption as I have no experience with IAM roles), this would fix both anti-patterns:
anti/worker-state: because all configuration is centralized to the web node and explicitly managed by the ATC.
anti/multi-source-of-truth: because the resource only has to care about statically configured credentials, just as it does today.

One obvious downside is that Concourse currently only supports configuring one credential manager at a time. But that is probably something to discuss in concourse/rfcs#5.
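The leak described in this issue can be reproduced with a small sketch of the equivalence check. This is an illustration under assumptions, not Concourse's actual implementation: `resource_config_key` is a hypothetical helper, and SHA-256 over canonical JSON stands in for whatever hashing Concourse really uses. The bucket name and credentials are made up.

```python
import hashlib
import json


def resource_config_key(resource_type: str, source: dict) -> str:
    """Hash a resource type together with its interpolated source."""
    canonical = json.dumps({"type": resource_type, "source": source}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Static credentials are part of the source, so each team hashes differently.
team_a_static = {
    "bucket": "some-artifacts",
    "access_key_id": "AKIA_TEAM_A",
    "secret_access_key": "secret-a",
}
team_b_static = {
    "bucket": "some-artifacts",
    "access_key_id": "AKIA_TEAM_B",
    "secret_access_key": "secret-b",
}

# With IAM roles the credentials live on the worker, not in the source,
# so both teams end up with an identical source.
team_a_iam = {"bucket": "some-artifacts"}
team_b_iam = {"bucket": "some-artifacts"}

static_shared = resource_config_key("s3", team_a_static) == resource_config_key("s3", team_b_static)
iam_shared = resource_config_key("s3", team_a_iam) == resource_config_key("s3", team_b_iam)

print("static credentials share history:", static_shared)
print("IAM role sources share history:", iam_shared)
```

Under this model the credentials double as the isolation boundary. Moving them out of the source and onto the worker removes that boundary, which is why getting credentials back into the source through a credential manager would restore it.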