streaming: set default query timeout #10707
Conversation
Now that we have at least one endpoint that uses a context for cancellation, we can encounter the scenario where the returned error is a context.Canceled or context.DeadlineExceeded. If request.Context().Err() is not nil, then we know the request itself was cancelled, so we can log a different message at Info level instead of logging it as an error.
// ApplyDefaultQueryOptions returns a function which will set default values on
// the options based on the configuration.
func ApplyDefaultQueryOptions(config *RuntimeConfig) func(options *structs.QueryOptions) {
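For readers skimming the diff, a minimal sketch of what a function with this shape could do is below. It assumes RuntimeConfig exposes DefaultQueryTime and MaxQueryTime fields and that structs.QueryOptions carries a MaxQueryTime; the exact field names and clamping rules in the real change may differ.

func ApplyDefaultQueryOptions(config *RuntimeConfig) func(options *structs.QueryOptions) {
	return func(options *structs.QueryOptions) {
		switch {
		case options.MaxQueryTime > config.MaxQueryTime:
			// Clamp an overly long client-requested timeout to the configured maximum.
			options.MaxQueryTime = config.MaxQueryTime
		case options.MaxQueryTime == 0:
			// No timeout requested: fall back to the configured default.
			options.MaxQueryTime = config.DefaultQueryTime
		}
	}
}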
Is there a need to keep this a pointer? Otherwise we may need a nil-check on config
Good question! I often try not to use pointers and instead use values when possible. I think it would be fine to use a value here, but I opted for a pointer because RuntimeConfig is a very large struct, and generally (because it is so large) we have been passing it around by pointer. I think we expect to only call this method on startup, so using a non-pointer might also be ok.
Regarding the nil-check, if this value were to come from a user then I think that would be very important. But the user can't directly set RuntimeConfig. It's expected that agent startup will always create one of these, and there should be no way for it to be nil.
So the only way for RuntimeConfig to be nil would be a "programming error". A panic is an appropriate way of handling programming errors because we expect our test suite to catch them. If we were to check for nil, we'd either have to change the signature and return an error from this function, or we'd create a custom panic with more detail.
I suspect a detailed panic wouldn't really help here, since a nil pointer panic should hopefully be easy to debug. Changing the function signature to return an error would imply a runtime error, but there is no possible runtime error here (only programming errors). I generally try to avoid adding error returns unless there is a possible runtime error, because it forces all the callers to also handle the error (which can be a distraction if the only cause is a possible programming error).
I think passing a non-pointer value could in some cases be worse. If we were to pass a zero value for RuntimeConfig, it would force MaxQueryTime to always be 0. Hopefully that kind of error would be noticed by tests, but a panic should be a lot more obvious.
In short, I think that's an excellent call out (checking for nil), but in this case I'm tempted to say the current implementation is reasonable given that we always expect a non-nil RuntimeConfig.
That said, it would be good to document these assumptions in the function docstring, so I will add a line about "RuntimeConfig must not be nil".
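Concretely, the docstring addition could read along these lines (wording is a sketch, not the final comment):

// ApplyDefaultQueryOptions returns a function which will set default values on
// the options based on the configuration. config must not be nil; a nil config
// is a programming error and will result in a panic.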
Thanks for the detailed answer! I definitely got more than I asked for :D
Your point about programming errors vs runtime errors is really interesting - I've never thought of it that way, and I have seen my fair share of "this change touches a billion files because something down the stack now returns an error".
Thanks Daniel. I think this looks like a good pattern for applying those defaults.
I had one comment inline on something which I think should probably be changed, to avoid creating a confusing new log line if we ever have context timeout errors that are not related to clients at all. What do you think?
"from", req.RemoteAddr, | ||
"error", err, | ||
) | ||
if req.Context().Err() != nil { |
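To make the shape of this change clearer, the surrounding error-handling path looks roughly like the sketch below; the message strings and exact logger fields are illustrative, not copied from the diff.

if err != nil {
	if req.Context().Err() != nil {
		// The request's own context errored (the client went away, or a
		// parent context was cancelled), so log quietly at Info instead of Error.
		logger.Info("request cancelled",
			"method", req.Method,
			"url", req.URL.String(),
			"from", req.RemoteAddr,
			"error", err,
		)
	} else {
		logger.Error("request error",
			"method", req.Method,
			"url", req.URL.String(),
			"from", req.RemoteAddr,
			"error", err,
		)
	}
}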
This will also catch context.DeadlineExceeded errors, for which I don't think this message is appropriate. Should we limit it to only context.Canceled?
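The narrowing suggested here would look something like this (sketch only, using errors.Is from the standard errors package):

if errors.Is(req.Context().Err(), context.Canceled) {
	// Only an explicit cancellation is treated as a client disconnect;
	// a DeadlineExceeded on the request context falls through to the
	// normal error log.
	logger.Info("request cancelled", "from", req.RemoteAddr, "error", err)
}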
Hmm, actually not sure about this. I wasn't thinking about the fact that this is the req.Context directly, rather than a test against any err that might be returned by the handler.
In that case I don't know of any way the request context would error with DeadlineExceeded, so this is probably equivalent. I'm thinking it could potentially occur if there were some middleware in the server that implemented a server-side timeout, in which case it's not really the client cancelling the request.
But that doesn't exist, so this is probably moot. I'll leave it up to you whether you think the extra specificity is worthwhile.
Because of the context hierarchy, I think the only way for req.Context().Err() to be non-nil is if either:
- the request itself was cancelled by the user making the request
- http.Server.BaseContext is set, and something cancels that context
I don't think we use BaseContext yet, but even if we did I think this message would be appropriate. Generally the only reason for the BaseContext to be cancelled would be the server shutting down (because once cancelled, every request would fail immediately), in which case logging the request as cancelled would also be accurate.
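For reference, a sketch of how BaseContext could be wired so that a server-wide shutdown cancels every request context; the address and shutdown wiring are illustrative, not code from this PR (assumes the "context", "net", and "net/http" imports).

// Parent context shared by all requests via http.Server.BaseContext.
baseCtx, cancel := context.WithCancel(context.Background())
srv := &http.Server{
	Addr:        ":8500",
	BaseContext: func(net.Listener) context.Context { return baseCtx },
}
// Cancelling baseCtx (for example as part of shutdown) makes
// req.Context().Err() non-nil for every in-flight request, which the
// check above treats the same as a client cancellation.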
Thinking about it more, I guess because BaseContext is a function, it could act as the middleware and set a deadline, so I think you're right that it is possible.
I guess in most cases the err here would include context about whether it was a timeout or a cancel, so maybe this log line will be ok in that case as well? I think it's a bit hard to say until we start using BaseContext.
I realised I was not reading it right. This LGTM; see if you feel the possible tweak is worth making to be a bit more future-proof, but it's not a blocker.
The blocking query backend sets the default value on the server side. The streaming backend does not use blocking queries, so we must set the timeout on the client.
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/418180.
🍒✅ Cherry pick of commit d2b58cd onto
This PR fixes two problems: