watch stream return "permission denied" if token expired #12385
Yes, you are right. You can get more information about the token being deleted with the command
So I think that is correct. What do you think?
Actually, I enabled debug logging during my investigation :). I saw the log at the very beginning, but it took me a while to understand why the stale token was still being referenced.
And for anyone who has the same problem:
Old etcd servers that have not applied the PR etcd-io#12165 check the auth token even if the request is an Authenticate request. If the client has an invalid auth token, it is not able to update its token, since the Authenticate request itself carries the invalid auth token. This fix clears the auth token when the client encounters an ErrInvalidAuthToken, so that it can talk to old-version etcd servers. Fix etcd-io#12385 with etcd-io#12165 and etcd-io#12264
This attempts to fix a special case of the problem described in etcd-io#12385, where trying to do `clientv3.Watch` with an expired token would result in `ErrGRPCPermissionDenied`, due to the failing authorization check in `isWatchPermitted`. Furthermore, the client can't auto recover, since `shouldRefreshToken` rightly returns false for the permission denied error. In this case, we would like to have a runbook to dynamically disable auth, without causing any disruption. Doing so would immediately expire all existing tokens, which would then cause the behavior described above. This means existing watchers would still work for a period of time after disabling auth, until they have to reconnect, e.g. due to a rolling restart of server nodes. This commit adds a client-side fix and a server-side fix, either of which is sufficient to get the added test case to pass. Note that it is an e2e test case instead of an integration one, as the reconnect only happens if the server node is stopped via SIGINT or SIGTERM. A generic fix for the problem described in etcd-io#12385 would be better, as that shall also fix this special case. However, the fix would likely be a lot more involved, as some untangling of authn/authz is required.
This ticket is closed, but I can still reproduce this exactly as described by the OP (except the import path is ...). I just want to watch a key indefinitely, but this currently seems not to work in conjunction with authentication tokens.
This bug should still be reproducible, due to the problem I described in #13577:
The commit from that PR only fixed a special case of the problem, where all the tokens expired after disabling authentication. It doesn't fix the problem where a token expired naturally. Last I checked, the root cause is that
Hmm, I see... let me reproduce the issue on my side.
Sorry for the delayed update. I'll be able to check the issue sometime this week.
I could reproduce the bug on the latest main branch. Let me reopen the issue and handle it.
The branch for the fix is https://github.com/mitake/etcd/tree/watch-auth-err and I'll open a PR after finalizing it.
The context on the server side never changes after the stream is created, so the token is never refreshed and subsequent watches fail after the token expires. PerRPCCredentials is only called once, when the stream is created:
I think the solutions for the future:
Temporary solutions:
Thanks for commenting, @kafuu-chino. Actually, I think
@jwebb @sayap @kafuu-chino I opened this PR: #14322. I checked that the above test case passes now. Could you cross-check if you have time?
@kafuu-chino I'll also check your PR #14296 later, thanks for opening this!
In order to fix etcd-io#12385, PR etcd-io#14322 introduced a change in which the client side may retry based on the error message returned from the server side. This is not good, as it is too fragile and it also changes the protocol between client and server. Please see the discussion in kubernetes/kubernetes#114403. Note: the issue etcd-io#12385 only happens when auth is enabled and the client side reuses the same client to watch. So we decided to roll back the change on 3.5, for these reasons: 1. K8s doesn't enable auth at all, so the rollback has no impact on K8s. 2. It's very easy for a client application to work around the issue: the client just needs to create a new client each time before watching. Signed-off-by: Benjamin Wang <wachao@vmware.com>
After I updated etcd to v3.5.7, the error became this:
It can be reproduced 100% of the time after the client has not been used for a few minutes.
Refer to #17384 (comment)
I ran into a problem similar to #8914.
I'm using a v3.4.13 server and a v3.4.12 client.
I can reproduce the error using the following steps:
In one terminal, start the etcd server, enable auth, and shorten the default token TTL:
In the other terminal, run the demo program:
As shown in the log, the 1st and 2nd watches are OK, while the 3rd and 4th report "permission denied".
Meanwhile, the etcd server logs the following:
I think the causes are: