You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Jun 20, 2024. It is now read-only.
When IPAM entries are updated locally and broadcast to the ring there can arise inconsistency in IPAM entries resulting in conflicts. #1962 broadly addresses this for all the cases.
This bug report is specific sequence resulting in IPAM entry ending in-consistent where both version are same with different owning peer
What happened?
Going to describe problem with the logs from the cluster where this is issue is seen.
its a 3 node cluster with de:4f:4d:90:6b:e2, 42:6d:29:3d:b2:d4 and 3e:06:93:0a:8f:fd as peers
de:4f:4d:90:6b:e2 has below view of token 10.34.0.0
Here is the snip of sequence of event that led to this consistent state
snip from the logs of de:4f:4d:90:6b:e2
DEBU: 2019/04/15 11:28:49.370584 [ring de:4f:4d:90:6b:e2]: Merge token=10.34.0.0 mine.Peer=3e:06:93:0a:8f:fd theirs.Peer=3e:06:93:0a:8f:fd mine.Version=1 theirs.Version=1
DEBU: 2019/04/15 11:28:54.584003 [ring de:4f:4d:90:6b:e2]: Merge token=10.34.0.0 mine.Peer=3e:06:93:0a:8f:fd theirs.Peer=42:6d:29:3d:b2:d4 mine.Version=1 theirs.Version=3
DEBU: 2019/04/15 11:29:19.381484 [ring de:4f:4d:90:6b:e2]: Merge token=10.34.0.0 mine.Peer=42:6d:29:3d:b2:d4 theirs.Peer=3e:06:93:0a:8f:fd mine.Version=3 theirs.Version=4
DEBU: 2019/04/15 11:29:48.223696 [ring de:4f:4d:90:6b:e2]: Transfer token=10.34.0.0 from=3e:06:93:0a:8f:fd to=de:4f:4d:90:6b:e2 version=5
INFO: 2019/04/15 11:29:48.238566 ->[192.168.1.102:54975|3e:06:93:0a:8f:fd(weave-node2)]: connection shutting down due to error: Inconsistent entries for 10.34.0.0: owned by de:4f:4d:90:6b:e2 but incoming message says 3e:06:93:0a:8f:fd
INFO: 2019/04/15 11:29:49.716502 ->[192.168.1.101:6783|42:6d:29:3d:b2:d4(weave-node1)]: connection shutting down due to error: Inconsistent entries for 10.34.0.0: owned by de:4f:4d:90:6b:e2 but incoming message says 3e:06:93:0a:8f:fd
INFO: 2019/04/15 11:29:50.468385 ->[192.168.1.101:33901|42:6d:29:3d:b2:d4(weave-node1)]: connection shutting down due to error: Inconsistent entries for 10.34.0.0: owned by de:4f:4d:90:6b:e2 but incoming message says 3e:06:93:0a:8f:fd
INFO: 2019/04/15 11:29:51.372729 ->[192.168.1.101:6783|42:6d:29:3d:b2:d4(weave-node1)]: connection shutting down due to error: Inconsistent entries for 10.34.0.0: owned by de:4f:4d:90:6b:e2 but incoming message says 3e:06:93:0a:8f:fd
snip from the logs of 42:6d:29:3d:b2:d4
DEBU: 2019/04/15 11:28:19.790113 [ring 42:6d:29:3d:b2:d4]: Merge token=10.34.0.0 mine.Peer=3e:06:93:0a:8f:fd theirs.Peer=3e:06:93:0a:8f:fd mine.Version=1 theirs.Version=1
DEBU: 2019/04/15 11:28:54.626790 [ring 42:6d:29:3d:b2:d4]: Transfer token=10.34.0.0 from=3e:06:93:0a:8f:fd to=42:6d:29:3d:b2:d4 version=2
DEBU: 2019/04/15 11:28:54.628956 [ring 42:6d:29:3d:b2:d4]: ReportFree token=10.34.0.0 peer=42:6d:29:3d:b2:d4 version=3
DEBU: 2019/04/15 11:28:58.265381 [ring 42:6d:29:3d:b2:d4]: Merge token=10.34.0.0 mine.Peer=42:6d:29:3d:b2:d4 theirs.Peer=3e:06:93:0a:8f:fd mine.Version=3 theirs.Version=1
DEBU: 2019/04/15 11:29:12.093926 [allocator 42:6d:29:3d:b2:d4]: Giving range 10.34.0.0-10.34.0.0 to 3e:06:93:0a:8f:fd
DEBU: 2019/04/15 11:29:12.093989 Updating ring token=10.34.0.0 current peer=3e:06:93:0a:8f:fd updated peer=3e:06:93:0a:8f:fd new version=4
DEBU: 2019/04/15 11:29:19.773565 [ring 42:6d:29:3d:b2:d4]: Merge token=10.34.0.0 mine.Peer=3e:06:93:0a:8f:fd theirs.Peer=3e:06:93:0a:8f:fd mine.Version=4 theirs.Version=4
DEBU: 2019/04/15 11:29:41.882462 [ring 42:6d:29:3d:b2:d4]: Merge token=10.34.0.0 mine.Peer=3e:06:93:0a:8f:fd theirs.Peer=3e:06:93:0a:8f:fd mine.Version=4 theirs.Version=5
INFO: 2019/04/15 11:29:48.265267 ->[192.168.1.100:47170|de:4f:4d:90:6b:e2(weave-master)]: connection shutting down due to error: Inconsistent entries for 10.34.0.0: owned by 3e:06:93:0a:8f:fd but incoming message says de:4f:4d:90:6b:e2
de:4f:4d:90:6b:e2 bumped version of 10.34.0.0 to 5 DEBU: 2019/04/15 11:29:48.223696 [ring de:4f:4d:90:6b:e2]: Transfer token=10.34.0.0 from=3e:06:93:0a:8f:fd to=de:4f:4d:90:6b:e2 version=5
simultaneously 42:6d:29:3d:b2:d4 merged ring update from 3e:06:93:0a:8f:fd which bumbed version to 5: DEBU: 2019/04/15 11:29:41.882462 [ring 42:6d:29:3d:b2:d4]: Merge token=10.34.0.0 mine.Peer=3e:06:93:0a:8f:fd theirs.Peer=3e:06:93:0a:8f:fd mine.Version=4 theirs.Version=5
resulting in entried with same version but different owning peers.
Looking at the logs of the peer 3e:06:93:0a:8f:fd why it increased version from 4 to 5 this is what is seen in the logs
So DEBU: 2019/04/15 11:29:12.155554 [ring 3e:06:93:0a:8f:fd]: ReportFree token=10.34.0.0 peer=3e:06:93:0a:8f:fd version=5 bumped the version.
Not sure why version should be bumped up when a token is reported to be free.
How to reproduce it?
Have a cluster with node A and node B, and keep adding node C and deleting it from the cluster. You will end up with IPAM inconsistencies once in a while.
The text was updated successfully, but these errors were encountered:
(noting that the "transfer" at 11:29:48.223696 is from an rmpeer operation, i.e. it is forced)
ReportFree bumps the version number so peers eventually find out the new information via gossip, but it does not broadcast because the free space update is not urgent.
If the space was in use after a transfer then again it would get its version bumped by ReportFree, and that update would be accepted by everyone on the network because it has a higher version. It goes wrong because, after a reclaim operation, the local peer typically doesn't use that space; it already has lots of space it is using.
I propose that rmpeer bump the version number by a lot more than 1, e.g. by 100, which will overrule any effects like here where one update got missed.
One more thought: we could wait a bit before reclaiming, to allow updates like this to traverse the network. In theory the information should get everywhere within log(number of nodes) gossip intervals.
But "bump the number by a lot" is worth doing even if we wait a bit.
I propose that rmpeer bump the version number by a lot more than 1, e.g. by 100, which will overrule any effects like here where one update got missed.
Tried it. Works perfectly. I no longer saw this issue.
What you expected to happen?
When IPAM entries are updated locally and broadcast to the ring there can arise inconsistency in IPAM entries resulting in conflicts. #1962 broadly addresses this for all the cases.
This bug report is specific sequence resulting in IPAM entry ending in-consistent where both version are same with different owning peer
What happened?
Going to describe problem with the logs from the cluster where this is issue is seen.
its a 3 node cluster with
de:4f:4d:90:6b:e2
,42:6d:29:3d:b2:d4
and3e:06:93:0a:8f:fd
as peersde:4f:4d:90:6b:e2
has below view of token 10.34.0.042:6d:29:3d:b2:d4
has below view of token 10.34.0.0this results in both nodes not connecting to each other due to the conflict
de:4f:4d:90:6b:e2
fails to connect to42:6d:29:3d:b2:d4
similarly
42:6d:29:3d:b2:d4
fails to connect tode:4f:4d:90:6b:e2
Here is the snip of sequence of event that led to this consistent state
snip from the logs of
de:4f:4d:90:6b:e2
snip from the logs of
42:6d:29:3d:b2:d4
de:4f:4d:90:6b:e2
bumped version of 10.34.0.0 to 5DEBU: 2019/04/15 11:29:48.223696 [ring de:4f:4d:90:6b:e2]: Transfer token=10.34.0.0 from=3e:06:93:0a:8f:fd to=de:4f:4d:90:6b:e2 version=5
simultaneously
42:6d:29:3d:b2:d4
merged ring update from3e:06:93:0a:8f:fd
which bumbed version to 5:DEBU: 2019/04/15 11:29:41.882462 [ring 42:6d:29:3d:b2:d4]: Merge token=10.34.0.0 mine.Peer=3e:06:93:0a:8f:fd theirs.Peer=3e:06:93:0a:8f:fd mine.Version=4 theirs.Version=5
resulting in entried with same version but different owning peers.
Looking at the logs of the peer
3e:06:93:0a:8f:fd
why it increased version from 4 to 5 this is what is seen in the logsSo
DEBU: 2019/04/15 11:29:12.155554 [ring 3e:06:93:0a:8f:fd]: ReportFree token=10.34.0.0 peer=3e:06:93:0a:8f:fd version=5
bumped the version.Not sure why version should be bumped up when a token is reported to be free.
How to reproduce it?
Have a cluster with node A and node B, and keep adding node C and deleting it from the cluster. You will end up with IPAM inconsistencies once in a while.
The text was updated successfully, but these errors were encountered: