Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add logging and a max wait time to vtgate shutdown drain #3560

Merged
merged 3 commits into from
Jan 25, 2018

Conversation

demmer
Copy link
Member

@demmer demmer commented Jan 17, 2018

Wait no more than 30 seconds for active vtgate mysql connections to
drain, and emit a log every 2 seconds while waiting with the count
of connections that we're waiting for.

Wait no more than 30 seconds for active vtgate mysql connections to
drain, and emit a log every 2 seconds while waiting with the count
of connections that we're waiting for.
@demmer
Copy link
Member Author

demmer commented Jan 17, 2018

@tpetr / @sougou this is some of what I mentioned on #3518.

I'd still feel better if we had a test for this behavior, but at least now there's some insight into what's happening during the shutdown.

log.Infof("Waiting for all client connections to be idle (%d active)...", atomic.LoadInt32(&busyConnections))
start := time.Now()
reported := start
for time.Since(start) < 30*time.Second {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a global timeout for OnTermSync() hooks controlled by -onterm_timeout (defaults to 10 seconds). Can we depend on that instead?

@tpetr
Copy link
Contributor

tpetr commented Jan 18, 2018

Thanks for the PR and all the comments @demmer! Apologies for not getting back to you sooner -- work has been super busy and I haven't had enough time to keep up with you and sougou. This PR LGTM once we square away my question about the onTermSyncHooks timeout.

Re: #3518 -- To be completely honest, I didn't feel ready for it to be merged in the state that it was in. In addition to wanting to write tests, I'm not convinced that the functionality we ended up with will be reliable enough for HubSpot. We have many mysql-heavy apps communicating with Vitess through a small number of vtgates, and while it's likely that each individual connection will go idle within a reasonable length of time, it's not likely that we'll observe all connections to be idle at the same exact time. This will cause us to exhaust the OnTermSync timeout and then all connections will be closed regardless of whether or not they were idle, which is unacceptable. This is why I liked the original approach better: proactively closing connections that we know are OK to be closed so that we minimize the ones that are closed in the middle of a query / transaction when we run out of time. Does that make sense? Can you tell me a bit about the environment you're running Vitess in where functionality as it currently exists in master is preferred? I'm happy to open another PR if we're cool with revisiting my original approach.

@demmer
Copy link
Member Author

demmer commented Jan 23, 2018

In Slack's environment we tend to have very short-lived connections from the app to the vtgate, so once the listener is closed and no new connections are accepted, the existing ones should generally complete in a short period of time, which is why waiting for an idle period even when connections are open seemed fine to me. Furthermore, before a clean vtgate shutdown/restart we first remove it from our service discovery and wait for the app servers to stop sending connections to it, so that all in all the vtgate should be totally idle before we try to shut it down.

However, to address your concerns, one approach could be to set a shuttingDown barrier that you check in the query handler to reject any new transactions from being started, and/or to prevent any queries from coming in that aren't in a transaction.

That seems to me to be a lower overhead approach of accomplishing the same goal without the complexity / risk of tracking every active connection in a map.

@sougou
Copy link
Contributor

sougou commented Jan 25, 2018

LGTM

Approved with PullApprove

@sougou sougou merged commit 4a3f148 into vitessio:master Jan 25, 2018
systay pushed a commit that referenced this pull request Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants