-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support/db: Delay canceling queries from client side when there's a statement / transaction timeout configured in postgres #5223
Conversation
7ff5919
to
3014772
Compare
40bf0c3
to
c20c9bf
Compare
serverSidePGTimeoutConfigs := []db.ClientConfig{ | ||
db.StatementTimeout(app.config.ConnectionTimeout), | ||
db.IdleTransactionTimeout(app.config.ConnectionTimeout), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ingestion system uses a separate connection pool
go/services/horizon/internal/init.go
Lines 87 to 89 in 6036a07
HistorySession: mustNewDBSession( | |
db.IngestSubservice, app.config.DatabaseURL, ingest.MaxDBConnections, ingest.MaxDBConnections, app.prometheusRegistry, | |
), |
so we can apply these timeout settings to app.historyQ
and app.primaryHistoryQ
because we know that those db connections will never be used in ingestion and will only be used when serving HTTP requests
services/horizon/internal/flags.go
Outdated
Usage: "defines the timeout for when horizon will cancel all postgres queries connected to an HTTP request. The timeout is measured in seconds since the start of the HTTP request. Note, this timeout does not apply to POST /transactions. " + | ||
"The difference between cancel-db-query-timeout and connection-timeout is that connection-timeout applies a postgres statement timeout whereas cancel-db-query-timeout will send an additional request to postgres to cancel the ongoing query. " + | ||
"Generally, cancel-db-query-timeout should be configured to be higher than connection-timeout to allow the postgres statement timeout to kill long running queries without having to send the additional cancel request to postgres. " + | ||
"By default, cancel-db-query-timeout will be set to twice the connection-timeout.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"By default, cancel-db-query-timeout will be set to twice the connection-timeout.", | |
"When set to 0, which is the default, it will trigger cancel-db-query-timeout to be set to twice the connection-timeout.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor, should the flag allow for disabling by the user, I see later in middleware, 0
equates to skipping application of the flag, but it would never be exercised as of now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to allow the flag to be disabled because I think it could result in confusion. If we skip the application of this flag, the connection timeout will be set to both the postgres server side statement timeouts and the query cancelations on the horizon side. I think this would be unintuitive because, as a user, my first thought is that disabling the query cancelation timeout would result in no query cancelation whatsoever.
If we wanted to allow the option of having no query cancelation at all we could equate 0 to that behavior. -1 could be a default which means cancel-db-query-timeout will be set to twice the connection-timeout value. The user still has the option of setting cancel-db-query-timeout and connection-timeout to the same value which will result in more or less the status quo behavior of horizon prior to this PR.
services/horizon/internal/flags.go
Outdated
@@ -437,6 +437,18 @@ func Flags() (*Config, support.ConfigOptions) { | |||
Usage: "defines the timeout of connection after which 504 response will be sent or stream will be closed, if Horizon is behind a load balancer with idle connection timeout, this should be set to a few seconds less that idle timeout, does not apply to POST /transactions", | |||
UsedInCommands: ApiServerCommands, | |||
}, | |||
&support.ConfigOption{ | |||
Name: "cancel-db-query-timeout", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just driving by - can we simplify the name to query-timeout
? I think it delivers the same meaning with less verbosity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Shaptic how about cancel-query-timeout
? I would like to emphasize that this timeout is only for client side query canceling and not like the postgres server side statement timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, good point! maybe client-query-timeout
then? I guess in my mind I read the flag as "cancel query" and think "what the heck is a cancel query, and why does it need a timeout?"
(also to be clear I don't want to delay this PR at all)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice integration, left some optional comments for consideration.
return requestCtx, nil, requestCtx.Err() | ||
} | ||
|
||
if deadline.IsZero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the caller context is dropped here, not propagated further, if the http client as caller were to close its connection early, middleware responds with ClientDisconnected http response anyways, but before making that response, will middleware still block on the call stack to finish first such as when these Session methods now continue to run longer to finish or hit new deadline? Will this possibly change log output behavior? Just trying to confirm for my own understanding, not suggesting change. I was looking for an integration test that exercises this case of an http request that leads into db and initiates an early client cancel but not seeing one so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the http client disconnects, the context attached to the http request in horizon would be canceled. If that happens before the query is made to the DB, horizon will not attempt to do the query and will just return with an error that eventually gets translated to ClientDisconnected in the response. If the request context is canceled during the query, horizon will still wait until the query finishes. If that was the last query in the horizon endpoint, horizon will respond with a 200 and populate the response body with the json payload. If there was another query required, the session would detect the request context was cancelled and would return an error that eventually gets translated to ClientDisconnected in the response.
Even without this change, it is still possible to get a 200 response when the http client disconnects. That case will occur if the client disconnects after the last db query finishes but before horizon written any http header / response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks for incorporating feedback.
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
Add a
--cancel-db-query-timeout
flag which configures a timeout on all db queries invoked by a horizon HTTP request. This timeout is a postgres client side timeout which means that it is triggered by horizon establishing a new connection to postgres in order to cancel the ongoing query.Why
Previously, we configured both postgres client side timeouts and server side timeouts (in the form of the statement_timeout postgres setting). However, the client side timeout is redundant when the postgres server will enforce its own statement timeout. In fact, the client side timeout can result in additional load which will exacerbate situations when the postgres DB cannot keep up with the ongoing query traffic.
In our horizon deployments, we plan to configure the client side db timeout to be longer than the server side db timeout. We expect the server side timeout to terminate a majority of the long running queries and the client side timeout will be a fallback in the unexpected scenario where the db query has surpassed the server side timeout.
Known limitations
[N/A]