Clarification on WatchChan behavior. #8188
Comments
If you do not cancel the watch, yes.
@xiang90 Could you explain more, i.e. when can a watch get canceled (let's assume the user never cancels)? What happens if the leader dies, or if the client loses its connection to the etcd cluster: would that cause the watch to be canceled? In other words, what can cause the watch to be canceled other than a user-invoked cancel? I am asking because I am seeing this issue very frequently at a scale of ~2-4M constant K/V updates.
What issue?
The watch closes without any error and without cancel being invoked, while the etcd cluster is fine.
@SuhasAnand Can you somehow reproduce it? If so, please provide a script and we can help you with it.
Sure. @xiang90 Meanwhile, could you please help answer the following?
@xiang90 This is what I have observed:
Based on the above godoc, I inferred that Canceled is always set just before the WatchChan is closed, so I would only need to check Canceled, and only when it is set would there be a non-nil error. In reality that is not so: the WatchChan can be closed without Canceled being set. IMO we should change the godoc to reflect this, because in my case the WatchChan did fail but Canceled was not set. (ISSUE 1)
On further analysis, I saw in a scaled run, where several (~2M) KV pairs are associated with the same lease, that upon lease expiry grpc-proxy (there are several issues with grpc-proxy, more on that later) / etcd sends ~2M DEL watch response notifications, which closes the channel (with Canceled not set) and sets Err to:
This is a bug. gRPC probably has an upper bound on message size, and etcd does not try to respect it.
This also blocks #7624 if implemented using a keyspace-wide watch. The watch messages will need a fragmentation flag (probably want semantics like |
I'm interested in working on #7624, if no one has started working on this blocking issue with your suggested solution, I can implement it. Would the implementation involve:
Also, a limit field that is settable by users might not be that useful, since there isn't going to be a case where the user wants to walk away with only the first N events. All the events must be sent over, so the limit seems more like an implementation detail than an option. The only scenario where I can imagine a user-settable limit being useful is if, for some reason, a user only wants N messages to be sent at a time.
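To make the fragmentation idea above concrete, here is a minimal sketch of splitting one large batch of events into several messages that each stay within a size bound. It is not etcd's implementation: the `event` and `fragment` types, the `more` flag, and the `split` function are hypothetical names invented for illustration, and the per-event `size` is assumed to be known up front.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a watch event; size is its assumed
// encoded size in bytes.
type event struct {
	key  string
	size int
}

// fragment is one message on the wire. more is a hypothetical flag telling
// the receiver that further fragments of the same response follow.
type fragment struct {
	events []event
	more   bool
}

// split packs events greedily, in order, into fragments whose total size
// stays within maxBytes. An event larger than maxBytes still gets a
// fragment of its own rather than being dropped.
func split(events []event, maxBytes int) []fragment {
	var out []fragment
	var cur []event
	curSize := 0
	for _, ev := range events {
		if len(cur) > 0 && curSize+ev.size > maxBytes {
			// The current fragment is full: flush it and mark that more follow.
			out = append(out, fragment{events: cur, more: true})
			cur, curSize = nil, 0
		}
		cur = append(cur, ev)
		curSize += ev.size
	}
	if len(cur) > 0 {
		// Last fragment: more stays false.
		out = append(out, fragment{events: cur})
	}
	return out
}

func main() {
	evs := []event{{key: "a", size: 3}, {key: "b", size: 3}, {key: "c", size: 3}}
	for _, f := range split(evs, 6) {
		fmt.Println(len(f.events), f.more)
	}
}
```

In this sketch all events are still delivered, which matches the point that the limit is an implementation detail rather than a way to truncate the result.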
@mangoslicer #7624 is a separate issue from this; it shouldn't need any modifications to the RPC messages. Thinking about this issue more, the fix here would involve a |
Appears to have just recently (May 22) merged a change to grpc which caused things to start failing for me. I can no longer do The old value was just "slightly" larger: https://github.com/grpc/grpc-go/pull/1165/files#diff-e1550a73f5d25064c8b586ec68d81a64L105 |
@eparis please open another issue for this? The fix for |
Did you already start working on the fix? Or is this issue still up for grabs?
@mangoslicer haven't started; have at it
@heyitsanthony I put up a PR, can you take a look?
@xiang90 Do I understand correctly that the WatchChan is closed when the corresponding watchGrpcStream is closed (e.g. due to ErrNoLeader), and that to resume watching a key I have to Close() the clientv3.Client, create another one with client := clientv3.New(), and call client.Watch()? Calling client.Watch() with the same parameters (namely, context) without re-creating the client would pick the closed watchGrpcStream from the watcher's streams map, since the map key will be the same (ctxKey := fmt.Sprintf("%v", ctx)). Source: this part of clientv3 code: So, am I right?
No, the code won't pick the closed watchGrpcStream, because (*watcher).closeStream deletes it from the streams map. Thank you.
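For anyone else reading, the behavior described above can be sketched roughly as follows. This is only an illustration with stand-in types, not the clientv3 source: `stream`, `watcher`, `get`, and the `closeStream` signature here are hypothetical, and only the `fmt.Sprintf("%v", ctx)` key derivation is taken from the question.

```go
package main

import (
	"context"
	"fmt"
)

// bg is the context shared by the calls below, so the map key is identical.
var bg = context.Background()

// stream and watcher are hypothetical stand-ins, not the clientv3 types.
type stream struct{ closed bool }

type watcher struct {
	streams map[string]*stream
}

// get returns the stream for ctx, creating one if the map has no entry.
func (w *watcher) get(ctx context.Context) *stream {
	key := fmt.Sprintf("%v", ctx) // key derivation quoted in the question
	if s, ok := w.streams[key]; ok {
		return s
	}
	s := &stream{}
	w.streams[key] = s
	return s
}

// closeStream marks the stream closed and removes it from the map, so a
// later get with the same context builds a fresh stream.
func (w *watcher) closeStream(ctx context.Context) {
	key := fmt.Sprintf("%v", ctx)
	if s, ok := w.streams[key]; ok {
		s.closed = true
		delete(w.streams, key)
	}
}

func main() {
	w := &watcher{streams: map[string]*stream{}}
	first := w.get(bg)
	w.closeStream(bg)
	second := w.get(bg)
	fmt.Println(first.closed, second.closed, first == second)
}
```

Because the closed entry is deleted, the second lookup with the same context creates a new stream instead of reusing the closed one, so the client does not need to be re-created just to watch again.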
This is fixed in 3.3. The default limit was 4MiB on the gRPC side; it is now unlimited by default (configurable). We also clarified the watch behavior in the godoc: https://godoc.org/github.com/coreos/etcd/clientv3#Watcher. Moving the proxy watch fragment discussion to a separate issue.
From the clientv3 godoc it's not clear whether the watch response will have a non-nil error in every scenario.
For example, consider the following code:
In the godoc I see:
So is it fair to expect that, in the above code, the line
fmt.Println("I should never be printed")
is never executed?
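Given what the thread reports (the channel can close without Canceled being set), a defensive consumer can treat the end of the range loop as a terminal condition in its own right. The sketch below is not the code from this issue and does not use the real library: `watchResponse` is a hypothetical stand-in that mirrors only the `Canceled` field and `Err()` method of `clientv3.WatchResponse`, and `drain` and `errStream` are names invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errStream is an illustrative error value, not one defined by etcd.
var errStream = errors.New("stream failed")

// watchResponse is a hypothetical stand-in for clientv3.WatchResponse;
// only the Canceled flag and an error are modeled.
type watchResponse struct {
	Canceled bool
	err      error
}

// Err mirrors the accessor on the real response type.
func (r watchResponse) Err() error { return r.err }

// drain consumes responses until the channel closes and reports why it
// ended: "canceled" if any response had Canceled set, otherwise "closed".
func drain(ch <-chan watchResponse) (reason string, err error) {
	for resp := range ch {
		if e := resp.Err(); e != nil {
			// Remember the most recent error seen on the channel.
			err = e
		}
		if resp.Canceled {
			reason = "canceled"
		}
	}
	// Reaching this point means the channel was closed, with or without
	// Canceled having been set on any response.
	if reason == "" {
		reason = "closed"
	}
	return reason, err
}

func main() {
	ch := make(chan watchResponse, 1)
	ch <- watchResponse{err: errStream} // an error without Canceled
	close(ch)
	reason, err := drain(ch)
	fmt.Println(reason, err)
}
```

With a loop of this shape, code placed after the loop does run whenever the channel is closed, so it is the right place to decide whether to re-establish the watch.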