streaming: set default query timeout #10707
Conversation
Now that we have at least one endpoint that uses a context for cancellation, we can encounter the scenario where the returned error is a context.Canceled or context.DeadlineExceeded. If request.Context().Err() is not nil, then we know the request itself was cancelled, so we can log a different message at Info level instead of logging it as an error.
// ApplyDefaultQueryOptions returns a function which will set default values on
// the options based on the configuration.
func ApplyDefaultQueryOptions(config *RuntimeConfig) func(options *structs.QueryOptions) {
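For readers skimming the diff, a minimal sketch of what a function with this shape could do is below. It assumes RuntimeConfig exposes DefaultQueryTime and MaxQueryTime fields and that structs.QueryOptions carries a MaxQueryTime; the exact field names and clamping rules in the real change may differ.

func ApplyDefaultQueryOptions(config *RuntimeConfig) func(options *structs.QueryOptions) {
	return func(options *structs.QueryOptions) {
		switch {
		case options.MaxQueryTime > config.MaxQueryTime:
			// Clamp an overly long client-requested timeout to the configured maximum.
			options.MaxQueryTime = config.MaxQueryTime
		case options.MaxQueryTime == 0:
			// No timeout requested: fall back to the configured default.
			options.MaxQueryTime = config.DefaultQueryTime
		}
	}
}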
Is there a need to keep this a pointer? Otherwise we may need a nil-check on config
Good question! I often try not to use pointers and instead use values when possible. I think it would be fine to use a value here, but I opted for a pointer because RuntimeConfig is a very large struct, and generally (because it is so large) we have been passing it around by pointer. I think we expect to only call this method on startup, so using a non-pointer might also be ok.
Regarding the nil-check, if this value were to come from a user then I think that would be very important. But the user can't directly set RuntimeConfig. It's expected that agent startup will always create one of these, and there should be no way for it to be nil.
So the only way for RuntimeConfig to be nil would be a "programming error". A panic is an appropriate way of handling programming errors because we expect our test suite to catch them. If we were to check for nil, we'd either have to change the signature and return an error from this function, or we'd create a custom panic with more detail.
I suspect a detailed panic wouldn't really help here, since a nil pointer panic should hopefully be easy to debug. Changing the function signature to return an error would imply a runtime error, but there is no possible runtime error here (only programming errors). I generally try to avoid adding error returns unless there is a possible runtime error, because it forces all the callers to also handle the error (which can be a distraction if the only cause is a possible programming error).
I think passing a non-pointer value could in some cases be worse. If we were to pass a zero value for RuntimeConfig, it would force MaxQueryTime to always be 0. Hopefully that kind of error would be noticed by tests, but a panic should be a lot more obvious.
In short, I think that's an excellent call out (checking for nil), but in this case I'm tempted to say the current implementation is reasonable given that we always expect a non-nil RuntimeConfig.
That said, it would be good to document these assumptions in the function docstring, so I will add a line about "RuntimeConfig must not be nil".
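Concretely, the docstring addition could read along these lines (wording is a sketch, not the final comment):

// ApplyDefaultQueryOptions returns a function which will set default values on
// the options based on the configuration. config must not be nil; a nil config
// is a programming error and will result in a panic.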
Thanks for the detailed answer! I definitely got more than I asked for :D
Your point about programming errors vs runtime errors is really interesting - I've never thought of it that way, and I have seen my fair share of "this change touches a billion files because something down the stack now returns an error".
Thanks Daniel. I think this looks like a good pattern for applying those defaults.
I had one comment inline on something which I think should probably be changed, to avoid creating a confusing new log line if we ever have context timeout errors that are not related to clients at all. What do you think?
"from", req.RemoteAddr, | ||
"error", err, | ||
) | ||
if req.Context().Err() != nil { |
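To make the shape of this change clearer, the surrounding error-handling path looks roughly like the sketch below; the message strings and exact logger fields are illustrative, not copied from the diff.

if err != nil {
	if req.Context().Err() != nil {
		// The request's own context errored (the client went away, or a
		// parent context was cancelled), so log quietly at Info instead of Error.
		logger.Info("request cancelled",
			"method", req.Method,
			"url", req.URL.String(),
			"from", req.RemoteAddr,
			"error", err,
		)
	} else {
		logger.Error("request error",
			"method", req.Method,
			"url", req.URL.String(),
			"from", req.RemoteAddr,
			"error", err,
		)
	}
}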
This will also catch context.DeadlineExceeded errors, for which I don't think this message is appropriate. Should we limit it to only context.Canceled?
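The narrowing suggested here would look something like this (sketch only, using errors.Is from the standard errors package):

if errors.Is(req.Context().Err(), context.Canceled) {
	// Only an explicit cancellation is treated as a client disconnect;
	// a DeadlineExceeded on the request context falls through to the
	// normal error log.
	logger.Info("request cancelled", "from", req.RemoteAddr, "error", err)
}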
Hmm, actually not sure about this. I wasn't thinking about the fact that this is the req.Context directly, rather than a test against any err that might be returned by the handler.
In that case I don't know of any way the request context would error with DeadlineExceeded, so this is probably equivalent. I'm thinking it could potentially occur if there were some middleware in the server that implemented a server-side timeout, in which case it's not really the client cancelling the request.
But that doesn't exist, so this is probably moot. I'll leave it up to you whether you think the extra specificity is worthwhile.
Because of the context hierarchy, I think the only way for req.Context().Err() to be non-nil is if either:
- the request itself was cancelled by the user making the request
- http.Server.BaseContext is set, and something cancels that context
I don't think we use BaseContext yet, but even if we did I think this message would be appropriate. Generally the only reason for the BaseContext to be cancelled would be the server shutting down (because once cancelled, every request would fail immediately), in which case logging the request as cancelled would also be accurate.
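For reference, a sketch of how BaseContext could be wired so that a server-wide shutdown cancels every request context; the address and shutdown wiring are illustrative, not code from this PR (assumes the "context", "net", and "net/http" imports).

// Parent context shared by all requests via http.Server.BaseContext.
baseCtx, cancel := context.WithCancel(context.Background())
srv := &http.Server{
	Addr:        ":8500",
	BaseContext: func(net.Listener) context.Context { return baseCtx },
}
// Cancelling baseCtx (for example as part of shutdown) makes
// req.Context().Err() non-nil for every in-flight request, which the
// check above treats the same as a client cancellation.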
Thinking about it more, I guess because BaseContext is a function, it could act as the middleware and set a deadline, so I think you're right that it is possible.
I guess in most cases the err here would include context about whether it was a timeout or a cancel, so maybe this log line will be ok in that case as well? I think it's a bit hard to say until we start using BaseContext.
I realised I was not reading it right. This LGTM; see if you feel the possible tweak is worth making to be a bit more future-proof, but it's not a blocker.
The blocking query backend sets the default value on the server side. The streaming backend does not use blocking queries, so we must set the timeout on the client.
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/418180.
🍒✅ Cherry pick of commit d2b58cd onto
This PR fixes two problems: