server, client: fix hanging problem when etcd failed to start #1267

disksing · 2018-10-11T08:40:43Z

What problem does this PR solve?

Sometimes etcd gets stuck while it is starting up. When this happens, the PD port is already listening, but PD cannot serve any requests. The client (tidb) also hangs, because pd-server never returns a response.

What is changed and how it works?

Add a context timeout in two places:

pd server startEtcd. If etcd does not start in time, pd-server reports an error and exits.
pd client initClusterID. If a pd server does not respond, the client queries the next pd server instance.

TODO: update tidb vendor.

Check List

Tests

Integration test

Related changes

Need to cherry-pick to the release branch
Need to update the documentation

disksing · 2018-10-11T09:35:07Z

/run-all-tests

* server, client: fix hanging problem when etcd failed to start (#1267)
* server: use same initialcluster config to restart joined member (#1279)
* fix server build
* pdctl: cherry pick bugfixes (#1298, #1299, #1308)
* server/api: fix the issue about `regions/check` API (#1311)
* fix join build
* fix pdctl build
* fix region test
* fix warnings

disksing added 2 commits October 11, 2018 15:32

server: add timeout for starting etcd

2130d6c

client: add timeout for init cluster id

c196766

disksing requested review from nolouch and rleungx October 11, 2018 08:40

disksing added the needs-cherry-pick-release-2.1 label (The PR needs to cherry pick to release-2.1 branch.) Oct 11, 2018

Merge branch 'master' into start-etcd-timeout

48dda45

nolouch approved these changes Oct 11, 2018

rleungx approved these changes Oct 11, 2018

disksing merged commit a6b3a6c into tikv:master Oct 12, 2018

disksing deleted the start-etcd-timeout branch October 12, 2018 02:26

This was referenced Oct 15, 2018

*: update pd client vendor pingcap/tidb#7898

Merged

*: update pd client vendor pingcap/tidb#7905

Merged

disksing added a commit to oh-my-tidb/pd that referenced this pull request Oct 23, 2018

server, client: fix hanging problem when etcd failed to start (tikv#1267)

eb73d03

disksing mentioned this pull request Oct 23, 2018

server, client: fix hanging problem when etcd failed to start (#1267) #1287

Merged

nolouch pushed a commit that referenced this pull request Oct 23, 2018

server, client: fix hanging problem when etcd failed to start (#1267) (#1287)

6ee05f5

nolouch added the needs-cherry-pick-release-2.0 label (The PR needs to cherry pick to release-2.0 branch.) Nov 13, 2018

disksing added a commit to oh-my-tidb/pd that referenced this pull request Nov 14, 2018

server, client: fix hanging problem when etcd failed to start (tikv#1267)

5f7712d

disksing mentioned this pull request Nov 14, 2018

Cherry pick bug fixes to release-2.0 #1324

Merged
