Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

acl: remove ResolveTokenToIdentity #12166

Merged
merged 5 commits into from
Feb 1, 2022
Merged

Conversation

dnephin
Copy link
Contributor

@dnephin dnephin commented Jan 22, 2022

Branched from #12165, continues the work described in #11690

By exposing the AccessorID from the primary ResolveToken method we can
remove this duplication.
@dnephin dnephin added theme/acls ACL and token generation theme/internal-cleanup Used to identify tech debt, testing improvements, code refactoring, and non-impactful optimization pr/no-changelog PR does not need a corresponding .changelog entry labels Jan 22, 2022
@github-actions github-actions bot removed the theme/acls ACL and token generation label Jan 22, 2022
@dnephin dnephin changed the title Dnephin/acl resolve token 2 acl: remove ResolveTokenToIdentity Jan 22, 2022
We already have an ACLResolveResult, so we can get the accessor ID from
it.
Comment on lines +162 to 163
// TODO: remove
func (s *Server) ResolveToken(token string) (acl.Authorizer, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done in #12167

Copy link
Contributor

@freddygv freddygv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good, my only concern is that now some of the calls to get the accessor ID are more expensive. Left some comments below, let me know what you think.

@@ -1538,16 +1539,13 @@ func (l *State) notifyIfAliased(serviceID structs.ServiceID) {
// critical purposes, such as logging. Therefore we interpret all errors as empty-string
// so we can safely log it without handling non-critical errors at the usage site.
func (l *State) aclAccessorID(secretID string) string {
ident, err := l.Delegate.ResolveTokenToIdentity(secretID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this doing significantly more work now? Maybe not for this PR, but could we avoid building the authorizer here? This call is fairly expensive from what I've seen in some CPU profiles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point!

We only call this method when an RPC call returns either:

  • acl.ErrNotFound
  • acl.ErrPermissionDenied

So it should be pretty rare, but we do have some options to make it less expensive.

I was going to suggest we could cache the AccessorID, but I noticed ACLResolver already caches the Authorizer. It seems like we cache up to 256 Authorizers (1024 on servers), so I wonder how likely it would be that we miss the cache.

We could restore the method to retrieve only the Identity from a token, but I'm not sure if these rare cases are enough to warrant that extra code.

We could also decide to add a new cache of secretID -> AccessorID that is local to only the agent/local.State.

I wonder if we could actually remove this call entirely soon. We are planning on adding the AccessorID to "permission denied" error. Presumably if the error is "acl not found", then there is no accessor ID to return (or if it's a policy that is not found, we could include the accessor ID in that error as well).

Given that we are planning on improve those errors in the next release, I wonder if we could leave this as-is, and remove this aclAccessorID method once we've enriched the errors on the server side. Does that sound like a good approach? I think that gives us better errors without having to do any kind of lookup for AccessorID.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems pretty reasonable; can we capture this work somewhere to make sure it isn't missed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I opened #12233 to track this cleanup.

@@ -15,18 +15,15 @@ import (
// critical purposes, such as logging. Therefore we interpret all errors as empty-string
// so we can safely log it without handling non-critical errors at the usage site.
func (a *Agent) aclAccessorID(secretID string) string {
ident, err := a.delegate.ResolveTokenToIdentity(secretID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern here

Copy link
Contributor Author

@dnephin dnephin Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this method being called in two places: Agent.filterMembers, and Agent.sendCoordinate.

Agent.sendCoordinate is similar to the other case in agent/local. It's only done when we receive a permission denied error, and we will be included the accessorID in that error soon, so we should be able to remove that call entirely.

The second call Agent.filterMembers already has an acl.Authorizer. I missed this on the first pass! I'll push a commit which removes this call, and uses authz.AccessorID() to get the value. (edit: Done in the latest commit)


// If ACLs prevented any responses, error
if len(reply.Intentions) == 0 {
accessorID := s.aclAccessorID(args.Token)
accessorID := authz.AccessorID()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, love that this avoids re-resolving the token

Copy link
Contributor Author

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, good call! Most of the changes here reduce work, but there are 3 cases in agent (or agent/local) where we do more work. I'll comment inline on each of them. I suspect we'll be able to remove those calls soon.

@@ -1538,16 +1539,13 @@ func (l *State) notifyIfAliased(serviceID structs.ServiceID) {
// critical purposes, such as logging. Therefore we interpret all errors as empty-string
// so we can safely log it without handling non-critical errors at the usage site.
func (l *State) aclAccessorID(secretID string) string {
ident, err := l.Delegate.ResolveTokenToIdentity(secretID)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point!

We only call this method when an RPC call returns either:

  • acl.ErrNotFound
  • acl.ErrPermissionDenied

So it should be pretty rare, but we do have some options to make it less expensive.

I was going to suggest we could cache the AccessorID, but I noticed ACLResolver already caches the Authorizer. It seems like we cache up to 256 Authorizers (1024 on servers), so I wonder how likely it would be that we miss the cache.

We could restore the method to retrieve only the Identity from a token, but I'm not sure if these rare cases are enough to warrant that extra code.

We could also decide to add a new cache of secretID -> AccessorID that is local to only the agent/local.State.

I wonder if we could actually remove this call entirely soon. We are planning on adding the AccessorID to "permission denied" error. Presumably if the error is "acl not found", then there is no accessor ID to return (or if it's a policy that is not found, we could include the accessor ID in that error as well).

Given that we are planning on improve those errors in the next release, I wonder if we could leave this as-is, and remove this aclAccessorID method once we've enriched the errors on the server side. Does that sound like a good approach? I think that gives us better errors without having to do any kind of lookup for AccessorID.

@@ -15,18 +15,15 @@ import (
// critical purposes, such as logging. Therefore we interpret all errors as empty-string
// so we can safely log it without handling non-critical errors at the usage site.
func (a *Agent) aclAccessorID(secretID string) string {
ident, err := a.delegate.ResolveTokenToIdentity(secretID)
Copy link
Contributor Author

@dnephin dnephin Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this method being called in two places: Agent.filterMembers, and Agent.sendCoordinate.

Agent.sendCoordinate is similar to the other case in agent/local. It's only done when we receive a permission denied error, and we will be included the accessorID in that error soon, so we should be able to remove that call entirely.

The second call Agent.filterMembers already has an acl.Authorizer. I missed this on the first pass! I'll push a commit which removes this call, and uses authz.AccessorID() to get the value. (edit: Done in the latest commit)

I missed this on the first pass, we no longer need to look up this ID, because we have it
from the Authorizer.
@vercel vercel bot temporarily deployed to Preview – consul January 26, 2022 22:22 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging January 26, 2022 22:22 Inactive
@markan markan self-assigned this Jan 28, 2022
Base automatically changed from dnephin/acl-resolve-token to main January 31, 2022 18:27
}

defer metrics.MeasureSince([]string{"acl", "ResolveTokenToIdentity"}, time.Now())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed I am removing this metric. I'm going to push one more commit that documents that this metric is being removed. I'll add a changelog as well.

@dnephin dnephin removed the pr/no-changelog PR does not need a corresponding .changelog entry label Jan 31, 2022
@dnephin dnephin requested a review from a team as a code owner January 31, 2022 19:22
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging January 31, 2022 19:22 Inactive
Also document the metric that was removed in a previous commit.
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging January 31, 2022 22:54 Inactive
@dnephin dnephin merged commit 997bf1e into main Feb 1, 2022
@dnephin dnephin deleted the dnephin/acl-resolve-token-2 branch February 1, 2022 00:19
@hc-github-team-consul-core
Copy link
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/571786.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/no-metrics-test theme/internal-cleanup Used to identify tech debt, testing improvements, code refactoring, and non-impactful optimization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants