Graceful secret removal #1715

Merged
merged 2 commits into from
Nov 4, 2016

Conversation

diogomonica
Contributor

@diogomonica diogomonica commented Oct 29, 2016

Added best-effort verification of secrets in use before Removal, unless Force is specified.

/cc @cyli @aaronlehmann

➜ swarmctl secret rm diogo1.txt
Error: secret 'diogo1.txt' is in use by the following service: elnerd6rd37mf1su17xp120x0, r07u2ung4h3mt41myom3exews, xk6cohgr2yc54q70rwgh5rmwx

@diogomonica diogomonica force-pushed the graceful-secret-removal branch 3 times, most recently from b6b11f9 to aa02d07 Compare October 29, 2016 01:25
@codecov-io

codecov-io commented Oct 29, 2016

Current coverage is 55.76% (diff: 75.00%)

Merging #1715 into master will decrease coverage by 0.14%

@@             master      #1715   diff @@
==========================================
  Files            96         96          
  Lines         15005      15020    +15   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           8390       8376    -14   
- Misses         5504       5522    +18   
- Partials       1111       1122    +11   

Powered by Codecov. Last update be97f7f...f67b6d7

@@ -31,6 +31,34 @@ func secretFromSecretSpec(spec *api.SecretSpec) *api.Secret {
}
}

// checkSecretInUse does a best effort to find if the passed in secret is in
Collaborator

"a best effort check"

@aaronlehmann
Collaborator

LGTM

@aluzzardi
Member

LGTM

I'm not 100% convinced by the force flag - couldn't we just ask the user to delete the service which is referring to this secret?

err error
)

s.store.View(func(tx store.ReadTx) {
Contributor

This function is racy: by the time it returns true, the result may no longer be true, and vice versa. The transaction needs to be held for the duration of the action taken based upon the result.

Contributor Author

This is how all of our checks work, and why there is a description that this is "best-effort"

Contributor

I hope not. This isn't really best-effort. The actual result is undefined. This is particularly pathological under deletes, as we could be deleting a secret that is actually referenced.

http://stackoverflow.com/a/34550/253486 explains this in detail.

Contributor Author

@diogomonica diogomonica Nov 1, 2016

Thanks for the stackoverflow link, but I understand both the concept of a race-condition and the implications.

The current code does delete secrets that are actually referenced, so by adding this code, we're doing a best-effort attempt at checking conflicts.
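
To make the disagreement in this thread concrete, here is a minimal sketch of the two patterns under discussion. It uses a hypothetical in-memory store guarded by a `sync.RWMutex`; `memStore`, `removeRacy`, and `removeAtomic` are names invented for illustration and are not SwarmKit's store API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// memStore is a hypothetical stand-in for a transactional store.
type memStore struct {
	mu       sync.RWMutex
	secrets  map[string]bool     // secret ID -> exists
	services map[string][]string // service ID -> referenced secret IDs
}

var errInUse = errors.New("secret is in use")

// inUseLocked reports whether any service references secretID.
// The caller must hold mu (read or write).
func (s *memStore) inUseLocked(secretID string) bool {
	for _, refs := range s.services {
		for _, id := range refs {
			if id == secretID {
				return true
			}
		}
	}
	return false
}

// removeRacy checks under a read lock, releases it, then deletes under a
// separate write lock. A service can start referencing the secret between
// the two critical sections, so the check is only best-effort.
func (s *memStore) removeRacy(secretID string) error {
	s.mu.RLock()
	used := s.inUseLocked(secretID)
	s.mu.RUnlock()
	if used {
		return errInUse
	}
	// <-- window: another goroutine may add a reference here
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.secrets, secretID)
	return nil
}

// removeAtomic performs the check and the delete in one critical section,
// so the check result is still valid when the delete happens.
func (s *memStore) removeAtomic(secretID string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.inUseLocked(secretID) {
		return errInUse
	}
	delete(s.secrets, secretID)
	return nil
}

func main() {
	s := &memStore{
		secrets:  map[string]bool{"sec1": true, "sec2": true},
		services: map[string][]string{"svc1": {"sec1"}},
	}
	fmt.Println(s.removeAtomic("sec1")) // sec1 is referenced by svc1
	fmt.Println(s.removeAtomic("sec2")) // sec2 is unreferenced
}
```

The trade-off is that `removeAtomic` holds the exclusive lock for the whole linear scan of services, which blocks other writers for that long.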

@diogomonica
Contributor Author

diogomonica commented Nov 1, 2016

@aluzzardi @aaronlehmann after talking to @ehazlett, it seems he doesn't see the need for a --force flag and is OK with requiring people to remove the secret from each individual service first.

Given this, I'm going to make the checkSecretsInUse loop return all of the conflicts instead of the first one, but my question is: do we leave force in as part of the API?

I wanted to allow DDC to force remove a secret, which was a feature requested by a customer, but I guess they can use annotations to simulate the feature?

/cc @jlhawn

@jlhawn
Contributor

jlhawn commented Nov 1, 2016

Sorry, I don't know what exactly force does. Recursively delete the service(s)!? Does it leave secret references dangling? Atomically update service specs to remove references to the secrets (this is what I would expect)?

Sure it would be nice if force did these service updates automatically, but it also feels like a very overloaded operation.

I'm going to make the checkSecretsInUse loop return all of the conflicts instead of the first one

I'm 👍 on having the error message contain this list of conflicts. Given that, one could write a simple script which loops over the conflicting services and updates each to remove the secret reference and try to delete the secret again. This does leave a window where another service could be created/updated to use the secret, but if this happens unexpectedly you'd just get another conflict message when you retry, and you should probably discuss it with whoever else is operating on the same resources in the cluster at the same time as you.
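
The "simple script" described above could look roughly like the following sketch. It runs against a hypothetical in-memory `fakeCluster`; the type, its methods, and `conflictError` are assumptions made so the example is runnable, and they do not correspond to the real swarmctl or SwarmKit client API.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeCluster is a hypothetical in-memory stand-in for a cluster API.
type fakeCluster struct {
	secrets  map[string]bool
	services map[string]map[string]bool // service ID -> set of secret IDs
}

// conflictError lists every service that still references the secret.
type conflictError struct{ services []string }

func (e *conflictError) Error() string {
	return fmt.Sprintf("secret is in use by: %v", e.services)
}

// removeSecret deletes the secret unless some service references it.
func (c *fakeCluster) removeSecret(secretID string) error {
	var users []string
	for svc, refs := range c.services {
		if refs[secretID] {
			users = append(users, svc)
		}
	}
	if len(users) > 0 {
		return &conflictError{services: users}
	}
	delete(c.secrets, secretID)
	return nil
}

// detachSecret removes one secret reference from one service.
func (c *fakeCluster) detachSecret(serviceID, secretID string) {
	delete(c.services[serviceID], secretID)
}

// removeWithRetry tries to remove the secret; on a conflict it detaches the
// secret from every listed service and tries again, up to maxAttempts times.
func removeWithRetry(c *fakeCluster, secretID string, maxAttempts int) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err := c.removeSecret(secretID)
		var conflict *conflictError
		if !errors.As(err, &conflict) {
			return err // nil on success, or some non-conflict failure
		}
		for _, svc := range conflict.services {
			c.detachSecret(svc, secretID)
		}
	}
	return fmt.Errorf("secret %q still in use after %d attempts", secretID, maxAttempts)
}

func main() {
	c := &fakeCluster{
		secrets: map[string]bool{"diogo1.txt": true},
		services: map[string]map[string]bool{
			"svc-a": {"diogo1.txt": true},
			"svc-b": {"diogo1.txt": true},
		},
	}
	fmt.Println(removeWithRetry(c, "diogo1.txt", 3))
}
```

The bounded retry covers the window mentioned above: if another operator attaches the secret between attempts, the next removal simply reports a fresh conflict list.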

@diogomonica
Contributor Author

@jlhawn I should have been more explicit, my bad.

Without force: DELETE operation errors with a list of services that are currently using the secret
With force: DELETE succeeds unless the secret does not exist, leaving dangling references.

@jlhawn
Contributor

jlhawn commented Nov 1, 2016

leaving dangling references

What side effects does this have? If a service task needs to be rescheduled (before the service spec is updated to remove the secret reference) will it fail to be provided that secret value or does it somehow use the last known value?

@aluzzardi
Member

I think --force doesn't belong in the API.

Dangling references will lead to gradual swarm degradation, which is not a great way to die.

If we do need force, I'd rather see this implemented either in the CLI or in UCP. The implementation would update all referenced services and just remove the reference, then trigger a secret removal.

@aluzzardi
Member

This could be race-free only if we had a way to do a reverse lookup of services by secret ID.

@aaronlehmann I believe that's impossible with the current store, right?

@aaronlehmann
Collaborator

I believe that's impossible with the current store, right?

Yes.

@diogomonica
Contributor Author

@aluzzardi a client-side remove that updates (potentially dozens of) services simultaneously seems like something we shouldn't implement, since it will cause a forced rolling update of all of them.

@aluzzardi
Member

@diogomonica That seems like a walk in the park compared to dangling secret references IMO :)

@diogomonica diogomonica force-pushed the graceful-secret-removal branch 3 times, most recently from 074fdaf to 044a71f Compare November 2, 2016 00:27
services = append(services, k)
}

return grpc.Errorf(codes.DataLoss, "secret '%s' is in use by the following service: %v", secretName, strings.Join(services, ", "))
Contributor Author

@aaronlehmann I'm returning a codes.DataLoss here. I thought it was pretty adequate, but I'm fine with changing it. Just FYI.

Collaborator

Hmm, I don't think DataLoss is correct.

DataLoss indicates unrecoverable data loss or corruption.

Also, what I had in mind here was to say "service" if len(services) == 1 and "services" otherwise.

Contributor Author

Adds a bit more code just for a little bit of syntax sugar, but changed in both PRs.

Collaborator

I felt a little silly asking for it, but I know otherwise someone would complain about it after the fact.
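
The singular/plural wording requested in this thread can be sketched as a small helper. `inUseMessage` is a hypothetical name chosen for illustration; it builds only the message text and says nothing about which gRPC code the PR settled on.

```go
package main

import (
	"fmt"
	"strings"
)

// inUseMessage (hypothetical helper) says "service" when exactly one service
// references the secret and "services" otherwise.
func inUseMessage(secretName string, services []string) string {
	noun := "services"
	if len(services) == 1 {
		noun = "service"
	}
	return fmt.Sprintf("secret '%s' is in use by the following %s: %s",
		secretName, noun, strings.Join(services, ", "))
}

func main() {
	fmt.Println(inUseMessage("diogo1.txt", []string{"svc-a"}))
	fmt.Println(inUseMessage("diogo1.txt", []string{"svc-a", "svc-b"}))
}
```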

@diogomonica diogomonica force-pushed the graceful-secret-removal branch 2 times, most recently from e9a78a2 to 2411b08 Compare November 2, 2016 00:46

// removing a secret that exists but is in use fails
_, err = ts.Client.RemoveSecret(context.Background(), &api.RemoveSecretRequest{SecretID: resp.Secret.ID})
assert.Equal(t, codes.DataLoss, grpc.Code(err), grpc.ErrorDesc(err))
Collaborator

Test is still checking for DataLoss.

@diogomonica diogomonica force-pushed the graceful-secret-removal branch from 2411b08 to 1611080 Compare November 2, 2016 00:58
serviceStr = "service"
}

return grpc.Errorf(codes.InvalidArgument, "secret '%s' is in use by the following %s: %v", serviceStr, secretName, strings.Join(services, ", "))
Collaborator

This looks like the wrong order of arguments.
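
Format verbs are filled positionally, so swapping two string arguments compiles cleanly and only shows up in the output. The sketch below reproduces the effect with plain `fmt.Sprintf`; the function names `swappedMessage` and `fixedMessage` are made up for this example.

```go
package main

import "fmt"

const inUseFormat = "secret '%s' is in use by the following %s: %v"

// swappedMessage passes the noun first and the secret name second,
// mirroring the argument order flagged in the review comment.
func swappedMessage(secretName, serviceStr, list string) string {
	return fmt.Sprintf(inUseFormat, serviceStr, secretName, list)
}

// fixedMessage passes the arguments in the order the verbs expect.
func fixedMessage(secretName, serviceStr, list string) string {
	return fmt.Sprintf(inUseFormat, secretName, serviceStr, list)
}

func main() {
	fmt.Println(swappedMessage("diogo1.txt", "services", "svc-a, svc-b"))
	fmt.Println(fixedMessage("diogo1.txt", "services", "svc-a, svc-b"))
}
```

Because both arguments are strings, a type-based check would not be expected to catch the swap; a test that asserts on the message text would.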

@diogomonica diogomonica force-pushed the graceful-secret-removal branch from 1611080 to 78ecc87 Compare November 2, 2016 01:02
@aluzzardi
Member

Thanks for the update, @diogomonica

The pattern you're using (View(check references) ; Update(delete secret)) is the one we've been using all over (e.g. networks), so it makes sense to follow the same one here for consistency. The following comment has nothing to do with this PR; it's more of a general SwarmKit question:

What is worse, risking inconsistencies or holding down the update lock for too long?

There are two possibilities:

  1. Keep as is (fast but maybe inconsistent)
  2. Move the checkSecretInUse call inside the store.Update callback (costly linear scan of services but consistent)

The perfect (and long term) solution is making checkSecretInUse run in constant time (by having a service by secret ID index) and have it run in store.Update (so it's fast and consistent), but that's not going to happen soon enough.
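
A "service by secret ID" index of the kind described here could be sketched as a reverse map from each secret ID to the set of services that reference it. This is an illustrative in-memory structure with made-up names (`secretIndex`, `addRef`, `removeRef`, `inUse`), not a description of how SwarmKit's store implements indexes.

```go
package main

import "fmt"

// secretIndex is a hypothetical reverse index: secret ID -> set of service IDs.
type secretIndex struct {
	bySecret map[string]map[string]struct{}
}

func newSecretIndex() *secretIndex {
	return &secretIndex{bySecret: make(map[string]map[string]struct{})}
}

// addRef records that serviceID references secretID.
func (ix *secretIndex) addRef(serviceID, secretID string) {
	users, ok := ix.bySecret[secretID]
	if !ok {
		users = make(map[string]struct{})
		ix.bySecret[secretID] = users
	}
	users[serviceID] = struct{}{}
}

// removeRef drops the reference and cleans up empty sets.
func (ix *secretIndex) removeRef(serviceID, secretID string) {
	users := ix.bySecret[secretID]
	delete(users, serviceID)
	if len(users) == 0 {
		delete(ix.bySecret, secretID)
	}
}

// inUse answers with a single map lookup instead of scanning every service.
func (ix *secretIndex) inUse(secretID string) bool {
	return len(ix.bySecret[secretID]) > 0
}

func main() {
	ix := newSecretIndex()
	ix.addRef("svc-a", "sec1")
	fmt.Println(ix.inUse("sec1"))
	ix.removeRef("svc-a", "sec1")
	fmt.Println(ix.inUse("sec1"))
}
```

The cost moves to the write path: every service create, update, and delete has to keep the index in step with the service specs inside the same transaction.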

@aaronlehmann You know the implications of holding the store lock for too long, what do you think?

@diogomonica
Contributor Author

@aluzzardi I used pattern 2 in #1725, but for delete, iterating through all the services while locked might be a bit too much (chatting with @aaronlehmann he thought so too).

I'm fine with either, I don't care either way.

@aaronlehmann
Collaborator

You know the implications of holding the store lock for too long, what do you think?

I guess we can change this to do things consistently. Holding the store lock just blocks all other transactions. Listing all services generally shouldn't take long - in extreme cases maybe tens or hundreds of ms? A leadership election blocks things for a lot longer. People probably won't remove secrets very often. But listing all services or tasks isn't really a good pattern, so we should only do this where absolutely necessary.

@diogomonica
Contributor Author

@aluzzardi @aaronlehmann your call. Either merge it or tell me to change it.

@aluzzardi
Member

in extreme cases maybe tens or hundreds of ms

Sounds good - let's favor consistency over performance then.

@diogomonica, would you mind changing this to perform the check in store.Update?

@mavenugo @mrjana Whenever you get a chance, could you do the same for networks? (as in, move the conflict check into the strongly consistent store.Update transaction).

@diogomonica diogomonica force-pushed the graceful-secret-removal branch from 78ecc87 to 4a45d67 Compare November 2, 2016 23:17
Signed-off-by: Diogo Monica <diogo.monica@gmail.com>
@diogomonica diogomonica force-pushed the graceful-secret-removal branch from 6259e60 to fe032ff Compare November 3, 2016 22:54
// - Returns an error if the deletion fails.
func (s *Server) RemoveSecret(ctx context.Context, request *api.RemoveSecretRequest) (*api.RemoveSecretResponse, error) {
if request.SecretID == "" {
return nil, grpc.Errorf(codes.InvalidArgument, "secret ID must be provided")
}

err := s.store.Update(func(tx store.Tx) error {
// Check inf the secret exists
Collaborator

typo?

@diogomonica diogomonica force-pushed the graceful-secret-removal branch from fe032ff to 6588784 Compare November 3, 2016 22:57
Signed-off-by: Diogo Monica <diogo.monica@gmail.com>
@aaronlehmann aaronlehmann force-pushed the graceful-secret-removal branch from 6588784 to f67b6d7 Compare November 3, 2016 23:07
@aaronlehmann
Collaborator

LGTM

1 similar comment
@aluzzardi
Member

LGTM

@aluzzardi aluzzardi merged commit 5afda15 into moby:master Nov 4, 2016
7 participants