-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix several ACL token/policy resolution issues. #5246
Conversation
1 - Use the right method to fire async not found errors when the ACL.PolicyResolve RPC returns that error. This was previously accidentally firing a token result instead of a policy result which would have effectively done nothing (unless there happened to be a token with a secret id == the policy id being resolved. 2. When concurrent policy resolution is being done we single flight the requests. The bug before was that for the policy resolution that was going to piggy back on anothers RPC results it wasn’t waiting long enough for the results to come back due to looping with the wrong variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow.
To the extent that I understand this it seems good to me. Your description makes it clear what the issue is and the code looks plausibly like it solves it and the test verifies that. That said, it's all subtle enough that I can't say with high-degree fo confidence that this is now fully correct in all possible concurrent executions!
If you are happy then I think it's fine as it is. If you feel like there is risk of introducing a worse CVE (i.e. fail open/use wrong policy) than the current fail-closed bug, we can think about other ways we could build confidence in this case.
Co-Authored-By: mkeeler <mkeeler@users.noreply.github.com>
@banks The only potential for granting more access than desired is really an issue with caching tokens at all. If for example you have two tokens that link the same policies and you resolve them both only 1 of them is going to be used to resolve policies. If between the original identity fetch/cache the second token gets modified to remove one of those policies the new state will not be detected until the identity cache expires. I don't see this as an issue. So long as we cache tokens/identities and policies we will have to deal with stale data which is what this scenario is. Also this isn't new with this code or even with the new ACL system. I am confident that this introduces no risk of failing open. In order to be granted access you still have to have a policy that grants you that access. This PR just helps to ensure that we don't fail unnecessarily to resolve the policy causing denial of access erroneously. |
Gotcha sounds reasonable to me!
…On Tue, Jan 22, 2019 at 4:40 PM Matt Keeler ***@***.***> wrote:
@banks <https://github.com/banks> The only potential for granting more
access than desired is really an issue with caching tokens at all.
If for example you have two tokens that link the same policies and you
resolve them both only 1 of them is going to be used to resolve policies.
If between the original identity fetch/cache the second token gets modified
to remove one of those policies the new state will not be detected until
the identity cache expires. I don't see this as an issue. So long as we
cache tokens/identities and policies we will have to deal with stale data
which is what this scenario is.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5246 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAHYU5TZ43HQxMZ1D8KFlvEqlOVmzfcWks5vFz8PgaJpZM4aM3ny>
.
|
Fixes #5219
The main issue was that token specific issues (not able to access a particular policy or the token being deleted after initial fetching) were poisoning the policy cache.
A second issue was that for concurrent token resolutions, the first resolution to get started would go fetch all the policies. If before the policies were retrieved a second resolution request came in, the new request would register watchers for those policies but then never block waiting for them to complete. This resulted in using the default policy when it shouldn't have.
This PR fixes those issues. The second issue was simplest to fix. We were using the wrong value to determine the number of times to block on our wait channel for async policy resolution responses. Previously we looped newAsyncFetchIDs number of times which when other policy resolutions for required policies are already ongoing will be less than the total number of policies needed. This was changed to the number of policies that need to be fetched (regardless of which resolution request is the one to initiate the policy fetch)
The first issue required a few changes.
On the server side, the
ACL.PolicyResolve
endpoint was modified to return permission denied errors when trying to resolve a policy that the token is not linked with. This way we can differentiate between non-existent policies and policies the token just isn't allowed to retrieve.On the client side token/policy got a little more complex. Token errors during policy resolution (both not found and permission denied) are now handled better. First they no longer modify the policy cache for these errors. Instead for a not found error, it overrides the identity used for the request in the local cache to nullify it and store the not found error. For a permission denied errors, the cached identity is just removed which will allow subsequent requests to be fetched again. For both token related errors it will cause policy resolution to stop. For permission denied errors and not found errors for other tokens, the request will be retried.
For example:
Token A is linked to policies 1, 2 and 3
Token B is linked to policies 2, 3 and 4
Then the following flow is how things will work: