server: return results of ongoing queries when graceful shutdown #19669

Merged: SunRunAway merged 9 commits into pingcap:master from issue19663 on Dec 3, 2020

@SunRunAway SunRunAway commented Sep 1, 2020

What problem does this PR solve?

Issue Number: close #19663

Problem Summary:

Clients get a "Query execution was interrupted" error during graceful shutdown.

What is changed and how it works?

Proposal: xxx

What's Changed:
Do not return ErrQueryInterrupted when the connection status is connStatusWaitShutdown. Results of ongoing queries should be returned successfully during graceful shutdown.

How it Works:

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Add an e2e test:

master:

$ make
cd /Users/sunrunaway/code/gopath/src/github.com/pingcap/tidb && \
	CGO_ENABLED=1 GO111MODULE=on go build  -tags codes -ldflags '-X "github.com/pingcap/parser/mysql.TiDBReleaseVersion=v4.0.0-beta.2-1669-gb3f8be7d3" -X "github.com/pingcap/tidb/util/versioninfo.TiDBBuildTS=2020-11-30 09:26:36" -X "github.com/pingcap/tidb/util/versioninfo.TiDBGitHash=b3f8be7d3c9a719d043f41db14b62d259a79ecb2" -X "github.com/pingcap/tidb/util/versioninfo.TiDBGitBranch=master" -X "github.com/pingcap/tidb/util/versioninfo.TiDBEdition=Community"' -o /Users/sunrunaway/code/gopath/src/github.com/pingcap/tidb/tests/graceshutdown/bin/tidb-server tidb-server/main.go
Build TiDB Server successfully!
$ ./run-tests.sh
INFO[0000] starting tidb: bin/tidb-server --store=mocktikv --path=/tmp/tidb_gracefulshutdown/mocktikv -P=5501 --status=8500 --log-file=/tmp/tidb_gracefulshutdown/tidb5501.log 
INFO[0000] connect to server 127.0.0.1:5501 ok          
INFO[0001] service "tidb" Interrupt                     

----------------------------------------------------------------------
FAIL: graceshutdown_test.go:113: TestGracefulShutdownSuite.TestGracefulShutdown

graceshutdown_test.go:144:
    c.Assert(err, IsNil)
... value *mysql.MySQLError = &mysql.MySQLError{Number:0x525, Message:"Query execution was interrupted"} ("Error 1317: Query execution was interrupted")

OOPS: 0 passed, 1 FAILED
--- FAIL: TestGracefulShutdown (3.54s)
FAIL
exit status 1
FAIL	graceshutdown	3.557s

This branch:

$ make
cd /Users/sunrunaway/code/gopath/src/github.com/pingcap/tidb && \
	CGO_ENABLED=1 GO111MODULE=on go build  -tags codes -ldflags '-X "github.com/pingcap/parser/mysql.TiDBReleaseVersion=v4.0.0-beta.2-1671-g77e9fc4ad" -X "github.com/pingcap/tidb/util/versioninfo.TiDBBuildTS=2020-11-30 09:21:36" -X "github.com/pingcap/tidb/util/versioninfo.TiDBGitHash=77e9fc4ada3d6266f7d8e8702574f4c8d261e794" -X "github.com/pingcap/tidb/util/versioninfo.TiDBGitBranch=issue19663" -X "github.com/pingcap/tidb/util/versioninfo.TiDBEdition=Community"' -o /Users/sunrunaway/code/gopath/src/github.com/pingcap/tidb/tests/graceshutdown/bin/tidb-server tidb-server/main.go
Build TiDB Server successfully!
$ ./run-tests.sh
INFO[0000] starting tidb: bin/tidb-server --store=mocktikv --path=/tmp/tidb_gracefulshutdown/mocktikv -P=5501 --status=8500 --log-file=/tmp/tidb_gracefulshutdown/tidb5501.log 
INFO[0000] connect to server 127.0.0.1:5501 ok          
INFO[0001] service "tidb" Interrupt                                          
PASS: graceshutdown_test.go:113: TestGracefulShutdownSuite.TestGracefulShutdown	3.538s
OK: 1 passed
PASS
ok  	graceshutdown	3.549s
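
The shape of the e2e test above (start a server, begin a long query, interrupt the server, assert the query still succeeds) can be sketched with only the Go standard library. This is an analogy using `net/http`, whose `Server.Shutdown` waits for active requests; it is not the test added in this PR. The `/query` path, the 200 ms delay, and the `runScenario` helper are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// runScenario starts a server with a slow handler, begins a request, asks the
// server to shut down gracefully while the request is in flight, and returns
// what the client received.
func runScenario() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	started := make(chan struct{})
	mux := http.NewServeMux()
	mux.HandleFunc("/query", func(w http.ResponseWriter, r *http.Request) {
		close(started)                     // the request is now in flight
		time.Sleep(200 * time.Millisecond) // stand-in for a long-running query
		fmt.Fprint(w, "result")
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln) // returns http.ErrServerClosed after Shutdown

	type reply struct {
		body string
		err  error
	}
	done := make(chan reply, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/query")
		if err != nil {
			done <- reply{err: err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		done <- reply{body: string(b), err: err}
	}()

	<-started // shut down only once the "query" is running
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return "", err
	}
	r := <-done
	return r.body, r.err
}

func main() {
	body, err := runScenario()
	fmt.Println(body, err)
}
```

The assertion that matters is the same as in the test run above: the in-flight request must finish without an error even though shutdown began while it was running.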

Release note

  • Return the results of ongoing queries during graceful shutdown.

@SunRunAway

Should cherry-pick to release-4.0 and release-3.0 manually.

@tiancaiamao tiancaiamao left a comment

> Results of ongoing queries should be returned successfully during graceful shutdown.

This may affect correctness! We can either return the full data with success, or return an error.
When the query is interrupted, we cannot be sure of the integrity of the data.

@@ -486,9 +486,6 @@ func (s *Server) ShowProcessList() map[uint64]*util.ProcessInfo {
	defer s.rwlock.RUnlock()
	rs := make(map[uint64]*util.ProcessInfo, len(s.clients))
	for _, client := range s.clients {
		if atomic.LoadInt32(&client.status) == connStatusWaitShutdown {
			continue

Somebody once complained that after running kill tidb connection XXX, he could still see the session in show processlist.
This check is the fix for that complaint.

@SunRunAway SunRunAway Sep 7, 2020

I think it should stay in the processlist because it is still running and not fully closed. Maybe we can show a closing state for it.
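
A rough sketch of the "closing state" idea, assuming a simplified process list. The `clientConn` and `processInfo` types, the `State` strings, and the `showProcessList` function are hypothetical and far simpler than `util.ProcessInfo`; only the name `connStatusWaitShutdown` comes from the diff above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical status values; only connStatusWaitShutdown is named in the diff.
const (
	connStatusDispatching int32 = iota
	connStatusWaitShutdown
)

// clientConn is a hypothetical connection with an atomically updated status.
type clientConn struct {
	id     uint64
	status int32
}

// processInfo is a hypothetical, reduced process list entry.
type processInfo struct {
	ID    uint64
	State string
}

// showProcessList lists every connection, labelling those that are waiting
// to shut down as "closing" instead of skipping them.
func showProcessList(clients []*clientConn) []processInfo {
	out := make([]processInfo, 0, len(clients))
	for _, c := range clients {
		state := "running"
		if atomic.LoadInt32(&c.status) == connStatusWaitShutdown {
			state = "closing"
		}
		out = append(out, processInfo{ID: c.id, State: state})
	}
	return out
}

func main() {
	clients := []*clientConn{
		{id: 1, status: connStatusDispatching},
		{id: 2, status: connStatusWaitShutdown},
	}
	for _, p := range showProcessList(clients) {
		fmt.Println(p.ID, p.State)
	}
}
```

Listing the connection with an explicit state keeps the output truthful about work that is still running, while making it visible that the session is already on its way out.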

@SunRunAway

> we cannot be sure of the integrity of the data.

The client will get a connection reset by peer error if the data is not fully received.

@tiancaiamao

tiancaiamao commented Sep 7, 2020

What I mean has nothing to do with 'connection reset by peer'.
If we should return 100 lines but return only 50 and still tell the client OK, that is not acceptable.

If we should return 100 lines but cannot guarantee the integrity of the data, returning an Interrupted error is more reasonable.

@tiancaiamao

Graceful shutdown does not set the Killed flag, so the data is fully drained.
(Even if the Killed flag is set, the Next() function will return an error.)
We can ensure the integrity of the data, so there is no correctness issue.

From the client's point of view, when the server enters graceful shutdown mode, it is better for the ongoing connection to reply with success before it is closed, rather than reply with an interrupted error and then close.
So after the change, the behavior is more reasonable.

@tiancaiamao

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 9, 2020
@SunRunAway SunRunAway requested a review from zz-jason September 10, 2020 09:41
@ti-srebot

@tiancaiamao, @lysu, @zz-jason, @qw4990, PTAL.

@zz-jason zz-jason left a comment

how about adding some tests?

@ti-srebot

@tiancaiamao, @zz-jason, @lysu, @qw4990, PTAL.

@ghost

ghost commented Sep 24, 2020

There is no reward for this challenge pull request, so you can request a reward from @qw4990.

@lysu lysu removed their request for review September 28, 2020 02:55
@ti-srebot

@tiancaiamao, @zz-jason, @qw4990, PTAL.

@qw4990 qw4990 left a comment

Almost LGTM. Please add some tests.

@qw4990 qw4990 left a comment

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Nov 30, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Nov 30, 2020
@qw4990

qw4990 commented Nov 30, 2020

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Nov 30, 2020
@ti-srebot

/run-all-tests

@SunRunAway

> Can we move the contents in tests/graceshutdown/.gitignore to .gitignore?

I prefer keeping it in tests/graceshutdown/ so this directory is a complete and independent component.

@ti-srebot

@SunRunAway merge failed.

@ti-srebot

@tiancaiamao, @zz-jason, @qw4990, @ti-srebot, PTAL.

@SunRunAway

/merge

@ti-srebot

/run-all-tests

@ti-srebot

@SunRunAway merge failed.

@SunRunAway

/merge

@ti-srebot

Your auto merge job has been accepted, waiting for:

  • 21185

@ti-srebot

/run-all-tests

@SunRunAway SunRunAway merged commit d67a102 into pingcap:master Dec 3, 2020
@SunRunAway SunRunAway deleted the issue19663 branch December 3, 2020 06:29
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Dec 3, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot

cherry pick to release-4.0 in PR #21464

Labels
component/server status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.
Development

Successfully merging this pull request may close these issues.

Got Query execution was interrupted when graceful shutdown
5 participants