Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use v2 api to update cluster version #12988

Merged
merged 2 commits into from
May 17, 2021

Conversation

chaochn47
Copy link
Member

Fix #12987

{
  "level":"info",
  "ts":"2021-05-17T18:28:27.564Z",
  "caller":"membership/cluster.go:522",
  "msg":"updated cluster version",
  "cluster-id":"ef37ad9dc622a7c4",
  "local-member-id":"6f7e5687e0b95471",
  "from":"3.4",
  "to":"3.5"
}

Testing cluster version upgrade from 3.4 to 3.5

rm -rf /home/chaochn/log/etcd/
mkdir -p /home/chaochn/log/etcd

# create 3.4 cluster
~/etcd/release-3.4/etcd --name infra1 \
--listen-client-urls http://127.0.0.1:2379 \
--advertise-client-urls http://127.0.0.1:2379 \
--listen-peer-urls http://127.0.0.1:12380 \
--initial-advertise-peer-urls http://127.0.0.1:12380 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new \
--enable-pprof > /home/chaochn/log/etcd/infra1.log 2>&1 &

~/etcd/release-3.4/etcd --name infra2 \
--listen-client-urls http://127.0.0.1:22379 \
--advertise-client-urls http://127.0.0.1:22379 \
--listen-peer-urls http://127.0.0.1:22380 \
--initial-advertise-peer-urls http://127.0.0.1:22380 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new \
--enable-pprof > /home/chaochn/log/etcd/infra2.log 2>&1 &

~/etcd/release-3.4/etcd --name infra3 \
--listen-client-urls http://127.0.0.1:32379 \
--advertise-client-urls http://127.0.0.1:32379 \
--listen-peer-urls http://127.0.0.1:32380 \
--initial-advertise-peer-urls http://127.0.0.1:32380 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new \
--enable-pprof > /home/chaochn/log/etcd/infra3.log 2>&1 &

# upgrade to 3.5
## first
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:2379,http://127.0.0.1:22379 member remove fd422379fda50e48
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:2379,http://127.0.0.1:22379 member add infra4 --peer-urls=http://127.0.0.1:32381

~/etcd/3.5.0-alpha.0/etcd --name infra4 \
--listen-client-urls http://127.0.0.1:32379 \
--advertise-client-urls http://127.0.0.1:32379 \
--listen-peer-urls http://127.0.0.1:32381 \
--initial-advertise-peer-urls http://127.0.0.1:32381 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra4=http://127.0.0.1:32381' --initial-cluster-state existing \
--enable-pprof > /home/chaochn/log/etcd/infra4.log 2>&1 &

## second
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:2379,http://127.0.0.1:32379 member remove 91bc3c398fb3c146
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:2379,http://127.0.0.1:32379 member add infra5 --peer-urls=http://127.0.0.1:22381

~/etcd/3.5.0-alpha.0/etcd --name infra5 \
--listen-client-urls http://127.0.0.1:22379 \
--advertise-client-urls http://127.0.0.1:22379 \
--listen-peer-urls http://127.0.0.1:22381 \
--initial-advertise-peer-urls http://127.0.0.1:22381 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra5=http://127.0.0.1:22381,infra4=http://127.0.0.1:32381' --initial-cluster-state existing \
--enable-pprof > /home/chaochn/log/etcd/infra5.log 2>&1 &

## third
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:22379,http://127.0.0.1:32379 member remove 8211f1d0f64f3269
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:22379,http://127.0.0.1:32379 member add infra6 --peer-urls=http://127.0.0.1:12381

~/etcd/3.5.0-alpha.0/etcd --name infra6 \
--listen-client-urls http://127.0.0.1:2379 \
--advertise-client-urls http://127.0.0.1:2379 \
--listen-peer-urls http://127.0.0.1:12381 \
--initial-advertise-peer-urls http://127.0.0.1:12381 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra6=http://127.0.0.1:12381,infra5=http://127.0.0.1:22381,infra4=http://127.0.0.1:32381' --initial-cluster-state existing \
--enable-pprof > /home/chaochn/log/etcd/infra6.log 2>&1 &

Testing cluster version downgrade from 3.5 to 3.4

rm -rf /home/chaochn/log/etcd/
mkdir -p /home/chaochn/log/etcd

# create 3.5 cluster
~/etcd/3.5.0-alpha.0/etcd --name infra1 \
--listen-client-urls http://127.0.0.1:2379 \
--advertise-client-urls http://127.0.0.1:2379 \
--listen-peer-urls http://127.0.0.1:12380 \
--initial-advertise-peer-urls http://127.0.0.1:12380 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new \
--enable-pprof > /home/chaochn/log/etcd/infra1.log 2>&1 &

~/etcd/3.5.0-alpha.0/etcd --name infra2 \
--listen-client-urls http://127.0.0.1:22379 \
--advertise-client-urls http://127.0.0.1:22379 \
--listen-peer-urls http://127.0.0.1:22380 \
--initial-advertise-peer-urls http://127.0.0.1:22380 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new \
--enable-pprof > /home/chaochn/log/etcd/infra2.log 2>&1 &

~/etcd/3.5.0-alpha.0/etcd --name infra3 \
--listen-client-urls http://127.0.0.1:32379 \
--advertise-client-urls http://127.0.0.1:32379 \
--listen-peer-urls http://127.0.0.1:32380 \
--initial-advertise-peer-urls http://127.0.0.1:32380 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new \
--enable-pprof > /home/chaochn/log/etcd/infra3.log 2>&1 &

# downgrade to 3.4
## first
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:2379,http://127.0.0.1:22379 member remove fd422379fda50e48
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:2379,http://127.0.0.1:22379 member add infra4 --peer-urls=http://127.0.0.1:32381

~/etcd/release-3.4-with-rollback/etcd --name infra4 \
--listen-client-urls http://127.0.0.1:32379 \
--advertise-client-urls http://127.0.0.1:32379 \
--listen-peer-urls http://127.0.0.1:32381 \
--initial-advertise-peer-urls http://127.0.0.1:32381 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra4=http://127.0.0.1:32381' --initial-cluster-state existing \
--enable-pprof > /home/chaochn/log/etcd/infra4.log 2>&1 \
--unsafe-allow-cluster-version-downgrade=true &

## second
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:2379,http://127.0.0.1:32379 member remove 91bc3c398fb3c146
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:2379,http://127.0.0.1:32379 member add infra5 --peer-urls=http://127.0.0.1:22381

~/etcd/release-3.4-with-rollback/etcd --name infra5 \
--listen-client-urls http://127.0.0.1:22379 \
--advertise-client-urls http://127.0.0.1:22379 \
--listen-peer-urls http://127.0.0.1:22381 \
--initial-advertise-peer-urls http://127.0.0.1:22381 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra5=http://127.0.0.1:22381,infra4=http://127.0.0.1:32381' --initial-cluster-state existing \
--enable-pprof > /home/chaochn/log/etcd/infra5.log 2>&1 \
--unsafe-allow-cluster-version-downgrade=true &

## third
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:22379,http://127.0.0.1:32379 member remove 8211f1d0f64f3269
ETCDCTL_API=3 ~/etcd/release-3.4/etcdctl --endpoints http://127.0.0.1:22379,http://127.0.0.1:32379 member add infra6 --peer-urls=http://127.0.0.1:12381

~/etcd/release-3.4-with-rollback/etcd --name infra6 \
--listen-client-urls http://127.0.0.1:2379 \
--advertise-client-urls http://127.0.0.1:2379 \
--listen-peer-urls http://127.0.0.1:12381 \
--initial-advertise-peer-urls http://127.0.0.1:12381 \
--initial-cluster-token etcd-cluster-1 --initial-cluster 'infra6=http://127.0.0.1:12381,infra5=http://127.0.0.1:22381,infra4=http://127.0.0.1:32381' --initial-cluster-state existing \
--enable-pprof > /home/chaochn/log/etcd/infra6.log 2>&1 \
--unsafe-allow-cluster-version-downgrade=true &

# clean up all the process and logs
sudo ps -aef | grep etcd | grep -v grep | awk '{print $2}' | xargs sudo kill
rm -rf ~/infra*
rm -rf /home/chaochn/log/etcd/*.log

@@ -2484,6 +2485,46 @@ func (s *EtcdServer) updateClusterVersion(ver string) {
)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well rename this method to updateClusterVersionV2, to be clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also update the logging

"updating cluster version using v2 API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"setting up initial cluster version using v2 API"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in the next revision as your comments. PTAL @gyuho


if s.cluster.Version() == nil {
lg.Info(
"setting up initial cluster version",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... using v3 API

)
} else {
lg.Info(
"updating cluster version",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... using v3 API

@ptabor
Copy link
Contributor

ptabor commented May 17, 2021

I wonder how apply V2 is working. This lines worry me:

// remove v2 version set to avoid the conflict between v2 and v3.
if r.Path == membership.StoreClusterVersionKey() {
// return an empty response since there is no consumer.
return Response{}
}

looks like neither write to v2 store nor v3 store, or I'm missing something.
Where the data are stored is important in context of:

if c.be != nil {
c.version = clusterVersionFromBackend(c.lg, c.be)
c.members, c.removed = membersFromBackend(c.lg, c.be)
} else {
c.version = clusterVersionFromStore(c.lg, c.v2store)
c.members, c.removed = membersFromStore(c.lg, c.v2store)
}

@chaochn47
Copy link
Member Author

chaochn47 commented May 17, 2021

Think the contributor of https://github.com/etcd-io/etcd/pull/11427/files?file-filters%5B%5D=.go#diff-2a81e175edbcc322a52051676dc08cb497741e608dbefa4ac67d8e042cd902ccL94-L99 removed the v2 apply to store cluster version and added to v3 api.

I just revert back her change.

@codecov-commenter
Copy link

Codecov Report

Merging #12988 (e73facb) into main (932d42b) will decrease coverage by 16.40%.
The diff coverage is 100.00%.

❗ Current head e73facb differs from pull request most recent head 8e9b77a. Consider uploading reports for the commit 8e9b77a to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##             main   #12988       +/-   ##
===========================================
- Coverage   68.83%   52.42%   -16.41%     
===========================================
  Files         442      419       -23     
  Lines       34105    32753     -1352     
===========================================
- Hits        23477    17172     -6305     
- Misses       8709    13812     +5103     
+ Partials     1919     1769      -150     
Flag Coverage Δ
all 52.42% <100.00%> (-16.41%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
server/mvcc/backend/backend.go 80.39% <100.00%> (ø)
server/mvcc/backend/batch_tx.go 59.44% <100.00%> (ø)
server/mvcc/backend/tx_buffer.go 92.85% <100.00%> (+0.75%) ⬆️
raft/quorum/quorum.go 0.00% <0.00%> (-100.00%) ⬇️
raft/tracker/state.go 0.00% <0.00%> (-100.00%) ⬇️
client/v2/cancelreq.go 0.00% <0.00%> (-100.00%) ⬇️
client/v3/ordering/util.go 0.00% <0.00%> (-100.00%) ⬇️
client/pkg/v3/pathutil/path.go 0.00% <0.00%> (-100.00%) ⬇️
client/pkg/v3/tlsutil/cipher_suites.go 0.00% <0.00%> (-100.00%) ⬇️
client/pkg/v3/transport/sockopt_unix.go 0.00% <0.00%> (-100.00%) ⬇️
... and 215 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 932d42b...8e9b77a. Read the comment docs.

@gyuho
Copy link
Contributor

gyuho commented May 17, 2021

@ptabor

I wonder how apply V2 is working. This lines worry me:

Looks like we just return empty response, because there's no really consumer of this response given it's an internal API.

This change looks good to me. Any other concern?

@gyuho gyuho assigned gyuho and unassigned gyuho May 17, 2021
@gyuho gyuho added this to the etcd-v3.5 milestone May 17, 2021
@gyuho
Copy link
Contributor

gyuho commented May 17, 2021

/cc @YoyinZyc

@ptabor
Copy link
Contributor

ptabor commented May 17, 2021

@ptabor

I wonder how apply V2 is working. This lines worry me:

Looks like we just return empty response, because there's no really consumer of this response given it's an internal API.

This change looks good to me. Any other concern?

The actual storeV2 set happens in line 100. If we return in line 98, how do we store the value ?

return toResponse(a.store.Set(r.Path, r.Dir, r.Val, ttlOptions))

@gyuho
Copy link
Contributor

gyuho commented May 17, 2021

@ptabor In release 3.4, we don't call a.store.Create either. We just return success without persisting it.

The actual storeV2 set happens in line 100. If we return in line 98, how do we store the value ?

I believe that was the motivation we are migrating to v3 API for everything, right? v2 cluster state as it is now is not persistent. Unfortunately, those migration effort made some breaking changes.

For reference, this is release-3.4 branch:

if r.Path == membership.StoreClusterVersionKey() {
if a.cluster != nil {
a.cluster.SetVersion(semver.Must(semver.NewVersion(r.Val)), api.UpdateCapability)
}
// return an empty response since there is no consumer.
return Response{}
}
return toResponse(a.store.Set(r.Path, r.Dir, r.Val, ttlOptions))

@gyuho gyuho merged commit 1af7cb2 into etcd-io:main May 17, 2021
@ptabor
Copy link
Contributor

ptabor commented May 17, 2021

Thanks.

I missed that:

a.cluster.SetVersion(semver.Must(semver.NewVersion(r.Val)), api.UpdateCapability, membership.ApplyBoth)
``` was added back. 

ptabor added a commit to ptabor/etcd that referenced this pull request May 18, 2021
During review of:  etcd-io#12988 spotted
that PUT is actially writing to v3-backend.
If we are replaying WAL log, it might happened that backend's
applied_index is > than the WAL's log entry. In such situation we should
skip applying on backend V3.
I think both the methods (setVersion, setMembersAttributes) are in
practice idempotent so its not that 'serious' problem, but for
formal correctness adding the proper checks.
ptabor added a commit to ptabor/etcd that referenced this pull request May 18, 2021
During review of:  etcd-io#12988 spotted
that PUT is actially writing to v3-backend.
If we are replaying WAL log, it might happened that backend's
applied_index is > than the WAL's log entry. In such situation we should
skip applying on backend V3.
I think both the methods (setVersion, setMembersAttributes) are in
practice idempotent so its not that 'serious' problem, but for
formal correctness adding the proper checks.
gyuho pushed a commit to gyuho/etcd that referenced this pull request May 19, 2021
During review of:  etcd-io#12988 spotted
that PUT is actially writing to v3-backend.
If we are replaying WAL log, it might happened that backend's
applied_index is > than the WAL's log entry. In such situation we should
skip applying on backend V3.
I think both the methods (setVersion, setMembersAttributes) are in
practice idempotent so its not that 'serious' problem, but for
formal correctness adding the proper checks.
@serathius serathius mentioned this pull request Jun 10, 2021
13 tasks
@zaisongz
Copy link

zaisongz commented Aug 20, 2021

How to enable the downgrade by API, could you please share an example?@chaochn47

Does v3.5.0 already support this functionality?

@chaochn47
Copy link
Member Author

@zaisongz I was patching this PR #13022 to local etcd code base to verify downgrade and upgrade correctness.

This is the new flag --unsafe-allow-cluster-version-downgrade=true I introduced.

The official downgrade support should be ready by 3.6 based on #13168

@chaochn47 chaochn47 deleted the updateClusterVersionV2 branch October 23, 2021 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3.4 to 3.5 upgrade may panic
5 participants