Observing flakiness with oidc-token using identity.entity.aliases.<mount accessor>.metadata.<metadata key> #11798
Poller-A and Poller-B have the same role-id. They only differ in their secrets, with each secret-id having its own metadata.
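For illustration, a sketch of that setup (the role name `poller` and metadata key `resource_id` are hypothetical placeholders):

```sh
# One shared approle; two secret-ids that differ only in their metadata.
# The metadata parameter is a JSON-formatted string of key/value pairs.
vault write auth/approle/role/poller/secret-id \
  metadata='{ "resource_id": "poller-a" }'
vault write auth/approle/role/poller/secret-id \
  metadata='{ "resource_id": "poller-b" }'
```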
Reading the code, one thing that struck me was: given that all our pollers use the same approle (and hence the same entity), could it be possible that the entity that is retrieved only ever carries the metadata of whichever secret-id logged in last? This is just me speculating based on reading through the code. Looking forward to your responses.
The golang-based test further lends credence to the above theory. I basically ran 2 goroutines, one running Poller-A and the other running Poller-B. The test is super simple: run 2 goroutines that implement the above steps. A rough sketch of it follows.
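A minimal sketch of that two-goroutine test, assuming hypothetical role/secret IDs and the oidc role name `my-role`; it uses the official Vault API client and decodes the JWT payload by hand rather than via a jose library, to stay self-contained:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
	"sync"

	vault "github.com/hashicorp/vault/api"
)

// pollOnce logs in with one poller's secret-id, fetches an oidc token for
// the role, and prints the resource_id claim from the decoded JWT payload.
func pollOnce(name, roleID, secretID string) {
	// A fresh client per run, mirroring "creating vault clients from
	// scratch". DefaultConfig picks up VAULT_ADDR from the environment.
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		fmt.Println(name, "client error:", err)
		return
	}

	// Step 1: approle login with this poller's own secret-id.
	auth, err := client.Logical().Write("auth/approle/login", map[string]interface{}{
		"role_id":   roleID,
		"secret_id": secretID,
	})
	if err != nil {
		fmt.Println(name, "login error:", err)
		return
	}
	client.SetToken(auth.Auth.ClientToken)

	// Step 2: fetch the oidc token for the templated role.
	tok, err := client.Logical().Read("identity/oidc/token/my-role")
	if err != nil {
		fmt.Println(name, "oidc error:", err)
		return
	}
	jwt, _ := tok.Data["token"].(string)

	// Step 3: decode the JWT payload (the second dot-separated segment)
	// and print the claim the oidc template is supposed to populate.
	parts := strings.Split(jwt, ".")
	if len(parts) != 3 {
		fmt.Println(name, "unexpected token format")
		return
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		fmt.Println(name, "decode error:", err)
		return
	}
	var claims map[string]interface{}
	json.Unmarshal(payload, &claims)
	fmt.Printf("%s saw resource_id=%v\n", name, claims["resource_id"])
}

func main() {
	pollers := []struct{ name, secretID string }{
		{"poller-a", "<poller-a-secret-id>"}, // placeholder secret-ids
		{"poller-b", "<poller-b-secret-id>"},
	}
	var wg sync.WaitGroup
	for _, p := range pollers {
		p := p
		wg.Add(1)
		go func() {
			defer wg.Done()
			pollOnce(p.name, "<shared-role-id>", p.secretID) // same role-id for both
		}()
	}
	wg.Wait()
}
```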
Instead of seeing 2 distinct resource_ids, we just see 1 most of the time in our stdout logs. At this point, I'm 70% sure this is the bug. Could you please confirm the same? We thought we could get by with using 1 approle and 1 secret-id per poller, but it looks like we need 1 approle per poller.
I'm encountering the same bug in a different scenario, where a single approle role_id is used to create secret_ids for different hosts: a configuration management tool runs on each host and reads secrets from vault using a token tied to that host's own secret_id. To restrict reading of certain paths in vault, we stamp metadata identifying the host on each secret_id, and we assign the role_id to a policy where this metadata is used. When running our configuration management tool on multiple servers at the same time, it seems that these policies are not correctly applied. The following steps reproduce this error:
Checking in to see about this and #11803
Hi all! Thanks for submitting this issue, and figuring out the root cause for it! As a question for the people affected by this problem: is there any specific reason not to use multiple approles in this scenario, instead of 1 approle and multiple secret-ids? I understand management overhead is one reason not to, but I'm interested to hear if there are any others.
If #10682 is merged, then the secret-id metadata could simply be added to the token metadata and then used in identity templating via that mechanism.

A single approle with multiple secret-ids makes sense logically as an easy way to group N servers into X role. This is especially useful for clustered services where you may have a lot of instances of what's effectively the same server. Allowing metadata to be defined per secret-id lets you record the hostname of the box as metadata, which can then be used in e.g. the PKI secrets engine to allow that server to get a certificate for its specific hostname.

A single approle for this sort of use case also means that provisioning the policies a role has access to (which changes relatively infrequently and is a very privileged operation) and where it can be used (I use CIDRs on the approle to limit e.g. production approle tokens to only be usable by things on the production network range) can be separated from provisioning secret-ids, which by necessity happens a lot more often (i.e., whenever spinning up a new instance of a server, and is done by more automated tools like Terraform/Atlantis/Jenkins/etc). The capability to use these sorts of safeguards, as well as the separation of privilege (between BAU scaling of services up and down, versus changing the privileges a system has access to), is very important in most organisations. A sketch of this split follows below.

I do like the idea of not having to have an individual entity for each server (which would be the case with multiple approles, or with what the PR mentioned earlier would have caused), since that would create a lot of entities that are mostly meaningless and are not automatically cleaned up if e.g. the approle or secret-id is deleted (e.g. when destroying the specific server). For more static infrastructure it's less of a big deal, but when you're running immutable infrastructure you're by necessity spinning VMs up and down a lot, so management overhead and safeguards matter a lot.
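A sketch of that separation, with hypothetical role name, CIDR, and hostname; the role definition is the rare, privileged operation, while minting secret-ids is the frequent, automated one:

```sh
# Privileged, infrequent: define the role, its policies, and restrict its
# tokens to the production network range.
vault write auth/approle/role/prod-web \
  token_policies="prod-web" \
  token_bound_cidrs="10.0.0.0/16"

# Frequent, automated (e.g. from Terraform when a VM is spun up): mint a
# secret-id carrying the new instance's hostname as metadata.
vault write auth/approle/role/prod-web/secret-id \
  metadata='{ "hostname": "web-42.prod.example.com" }'
```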
Hi @nvx! Thanks for the detailed explanation of use cases, it's super valuable as we figure out how to solve this!
Identity templating can be used in places other than request URLs, but the case that stands out for me where you would want it in a request URL is storing per-server secrets in Vault (often clustered servers share the same sets of secrets across all servers, but some things differ, e.g. for sidecar services that run on the host for management purposes other than the actual clustered application itself). I personally use this pattern to store secrets in the KVv2 secrets engine with the hostname of the server in the path, along the lines of the sketch below.
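A sketch of that pattern, with a hypothetical mount accessor (`auth_approle_ab12cd34`), policy name, and paths; the templated path gates each server to its own KVv2 subtree:

```sh
# Policy path uses the data/ prefix because this is a KVv2 mount.
vault policy write per-host - <<'EOF'
path "secret/data/servers/{{identity.entity.aliases.auth_approle_ab12cd34.metadata.hostname}}/*" {
  capabilities = ["read"]
}
EOF

# A token whose secret-id carried hostname=web-42.prod.example.com
# could then read only its own subtree:
vault kv get secret/servers/web-42.prod.example.com/sidecar
```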
Aha, sidecar services is an interesting pattern, and one I hadn't thought about. I can see how secret-id metadata templating would be useful to solve problems with that; I'll be sure to keep this in mind as we think about this!
@pmmukh - any chance there have been any updates related to this? If this won't be supported, please adjust the documentation to indicate either that an approle role should only be used by consumers that require exactly the same permissions, or that the secret-id must always be generated with the same metadata for a given role. If that's the case, it may just make more sense to move the metadata to the role to avoid any further confusion.
I've continued my work and effort on merging PR #10682. My use case is mainly with the OIDC/JWT backends, but I'd guess that anything which supports mapping metadata onto the token could be used (I'm just not too familiar yet with the other auth backends, sorry).
Hi folks, is this still an issue with newer versions of Vault? Please let me know and I can bubble it up accordingly. Thanks!
@heatherezell I ran into this issue as well - it is not clear from the documentation that the metadata stamped on a secret-id lands on the approle's single shared entity alias, where another secret-id's login can overwrite it. In my opinion there are 2 ways to take this issue:
I also just stumbled into this, via #12797 - basically the same scenario: I assumed I could create one approle, then add multiple secrets with different metadata and use templated policies to restrict access for each secret (each of which represents a different machine/deployment in my case). Only after running into errors did I notice via token lookups that all of them share the same entity ID and alias, completely breaking the expectation of being able to use the secret's metadata. A quick check is sketched below. Not sure how to work around this without creating one approle per deployment instead? That would lead to a lot of approles, which doesn't seem like the intended way either? I'd expect to simply be able to use secret/token metadata in the policies if one can assign it.
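One way to observe the shared entity, sketched with hypothetical variables (and assuming `jq` is available):

```sh
# Log in twice with different secret-ids minted from the same role-id,
# then compare the entity_id each token resolves to.
TOKEN_A=$(vault write -field=token auth/approle/login \
  role_id="$ROLE_ID" secret_id="$SECRET_ID_A")
TOKEN_B=$(vault write -field=token auth/approle/login \
  role_id="$ROLE_ID" secret_id="$SECRET_ID_B")

VAULT_TOKEN=$TOKEN_A vault token lookup -format=json | jq -r .data.entity_id
VAULT_TOKEN=$TOKEN_B vault token lookup -format=json | jq -r .data.entity_id
# Both commands print the same entity_id: every secret-id of the approle
# maps onto one entity alias, which is why the metadata collides.
```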
Describe the bug
Sometimes custom claims populated via `identity.entity.aliases.<app_role_mount accessor>.metadata.<metadata key>`, where `metadata` is stamped on the `secret-id` of the approle, don't get populated. There is no pattern to when this happens. Sometimes it happens on a successive curl request to the token endpoint exposed by the oidc-role, using the same vault token that previously returned an oidc token with all claims correctly set. Sometimes it happens on a brand-new token obtained by logging in using approle creds. Once it happens, successive iterations do not populate those claims at all unless a new vault token is obtained.

To Reproduce
Cannot describe a deterministic way to repro, as this problem doesn't appear to occur in a deterministic manner. Instead, we'll try our best to describe our setup.
Overview
We have a custom oidc-token template to identify our agent pollers looking for work from a central server. All pollers use the same approle, but each poller has its own secret-id. This secret-id is stamped with metadata uniquely identifying the poller. For example, Poller-A could have poller-a.json as its secret's metadata
and its secret created and propagated via `curl --header "X-Vault-Token: ..." --request POST --data @poller-a.json https://my-vault-endpoint/my-approle-role-url/secret-id`. Similarly, we have a Poller-B, whose metadata could look like the analogous poller-b.json, and whose secret-id could have been generated by `curl --header "X-Vault-Token: ..." --request POST --data @poller-b.json https://my-vault-endpoint/my-approle-role-url/secret-id`. A hypothetical sketch of such a metadata file follows.
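A hypothetical poller-a.json; note that the `metadata` parameter must itself be a JSON-encoded string of key/value pairs, and the key `resource_id` here is an assumption based on the claims discussed below:

```json
{
  "metadata": "{ \"resource_id\": \"poller-a\" }"
}
```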
Details
1. Log in using the approle creds: `vault write <my-app-role-path> role_id=... secret_id=...`. The login output includes `token_meta_resource_id <my-resource-id>`.
2. Fetch an oidc token with that vault token: `curl --header "X-Vault-Token: ..." https://<my-vault-endpoint>/v1/identity/oidc/token/<my-role>`. `<My role>` is defined in tf such that there is one per approle mountpoint, and we have one approle mountpoint per env stage; in broad strokes it looks like the sketch after this list.
3. Decode the returned token and check its claims: `jwt decode <my-token>`.
4. Repeat the above test in various flavors: the `curl` for fetching the oidc-token, and the `jwt decode` for the same token which had previously returned a valid resource_id.

Additionally, we performed this test as a golang program which runs all the steps above, using a jose library for decoding the oidc token and creating vault clients from scratch every 30 seconds.
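A rough vault-CLI sketch of the oidc role referenced in step 2; the key name, role name, and approle mount accessor `auth_approle_ab12cd34` are placeholders (the real definition lives in tf):

```sh
# Hypothetical names throughout; the mount accessor comes from
# `vault auth list` for the approle mount in the given env stage.
vault write identity/oidc/key/my-key allowed_client_ids="*" algorithm="RS256"
vault write identity/oidc/role/my-role key="my-key" \
  template='{"resource_id": {{identity.entity.aliases.auth_approle_ab12cd34.metadata.resource_id}}}'
```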
The manual tests showed quite a few instances of resource_id not getting populated, and there was no pattern to when it occurred. Sometimes it would occur for a token for which resource_id had previously been populated; sometimes it occurred for a brand-new token.
The golang tests are run for just one of the pollers, both in a tight loop and at 30-second intervals. When run in a tight loop, we saw 3 failures out of roughly 4,000 runs. The 30-second-interval test is ongoing; no failures so far after 36 iterations.
Expected behavior
Expect the `resource_id` to be set every time.

Environment:
- Vault Server Version (retrieve with `vault status`):
- Vault CLI Version (retrieve with `vault version`):