
Fix for Duplicate IP issues #2105

Merged: 3 commits, Mar 9, 2018
Conversation

@abhi (Contributor) commented Mar 8, 2018

While investigating the duplicate IP issue in the IPAM library, the following bugs were found:

  1. Race condition: the IPAM library seems to assume the presence of a datastore to achieve atomicity. However, the IPAM library can be used without a datastore; for example, SwarmKit uses the IPAM with the datastore set to nil. The race condition corrupts the bit sequence, which might lead to duplicate IPs by releasing bits for IPs that were never deallocated, and also cause IP leaks because certain bits end up set even though they were never supposed to be allocated.
  2. Free bit logic: the logic in getAvailableBit looks for a free bit starting from a given bit within a block. However, if the start bit is right after the last available bit, it returns the last bit in the block as the next available bit even though that bit is not free (see the sketch after this list).
     For example: consider a block with curr:16 and sequence:0xfffeffff, count:10, next:{sequence:0x0fffffff, count:6}. Then getAvailableBit(16) would return 31 as the next available bit, and we would end up with duplicate IPs/VXLAN IDs, etc.
  3. Byte Offset Calculations
  4. Another byte offset calculation (regression): similar to the example in 2), we were moving on to the next block altogether instead of looking at the next instance within the same block when one is available.
  5. Concurrent Map Access
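As a rough illustration of 2), here is a minimal, self-contained Go sketch. It is not the actual bitseq code; the function name, signature, and scan loop are assumptions for illustration. The point is that a free-bit scan must report "no bit available" when every bit at or after the start position is already set, instead of falling off the end of the block and handing out an allocated bit:

package main

import (
	"errors"
	"fmt"
)

const blockLen = 32 // bits per block, as in the bit sequence above

var errNoBitAvailable = errors.New("no bit available in this block")

// getAvailableBit scans a 32-bit block for the first zero bit at or after
// the "from" position (bit 0 is the most significant bit). When every bit
// from "from" to the end of the block is set, it reports an error instead
// of silently returning the last position in the block.
func getAvailableBit(block uint32, from uint) (uint, error) {
	for pos := from; pos < blockLen; pos++ {
		if block&(1<<(blockLen-1-pos)) == 0 {
			return pos, nil
		}
	}
	return 0, errNoBitAvailable
}

func main() {
	// 0xfffeffff has a single free bit, at position 15.
	pos, _ := getAvailableBit(0xfffeffff, 0)
	fmt.Println("first free bit:", pos) // first free bit: 15

	// Scanning from bit 16 must fail rather than return an allocated bit.
	_, err := getAvailableBit(0xfffeffff, 16)
	fmt.Println(err) // no bit available in this block
}

Scanning 0xfffeffff from bit 16 fails here because the only free bit (15) lies before the start position, which is exactly the case the fix guards against.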

Tests are added to verify the above scenarios.

Signed-off-by: Abhinandan Prativadi abhi@docker.com

abhi added 2 commits March 8, 2018 11:19
This commit contains fixes for duplicate IPs, with 3 issues addressed:
1) Race condition when the datastore is not present, as in cases like SwarmKit
2) Byte offset calculation: depending on where the start bit falls in the
   bit sequence, the offset was adding extra bytes when the start bit is in
   the middle of one of the instances in a block
3) Finding the available bit was returning the last bit in the current
   instance of a block when the block is not full and the current bit is
   after the last available bit

Signed-off-by: Abhinandan Prativadi <abhi@docker.com>
This commit fixes a panic due to concurrent map access

Signed-off-by: Abhinandan Prativadi <abhi@docker.com>
@codecov-io commented Mar 8, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@3aca383).
The diff coverage is 84.84%.


@@            Coverage Diff            @@
##             master    #2105   +/-   ##
=========================================
  Coverage          ?   40.43%           
=========================================
  Files             ?      139           
  Lines             ?    22376           
  Branches          ?        0           
=========================================
  Hits              ?     9047           
  Misses            ?    12000           
  Partials          ?     1329
Impacted Files Coverage Δ
ipam/allocator.go 70.84% <75%> (ø)
bitseq/sequence.go 83.33% <86.2%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3aca383...c6843ee.

@fcrisciani

ping @aboch

@@ -498,24 +506,40 @@ func (h *Handle) UnmarshalJSON(data []byte) error {
func getFirstAvailable(head *sequence, start uint64) (uint64, uint64, error) {
// Find sequence which contains the start bit
byteStart, bitStart := ordinalToPos(start)


As a reminder for myself: byteStart and bitStart here are relative to the first byte of the block that the ordinal start falls into.

@@ -498,24 +506,40 @@ func (h *Handle) UnmarshalJSON(data []byte) error {
func getFirstAvailable(head *sequence, start uint64) (uint64, uint64, error) {
// Find sequence which contains the start bit
byteStart, bitStart := ordinalToPos(start)
current, _, _, inBlockBytePos := findSequence(head, byteStart)

current, _, precBlocks, inBlockBytePos := findSequence(head, byteStart)


current is the pointer to the block that byteStart falls into.
precBlocks is the number of instances of that same block that precede the one byteStart falls into.
Example: head -> 0xFFFF00FF|10 and my byte start is 5 because my ordinal is 42; in this case precBlocks is 1, indicating that my ordinal falls in the second compressed block.
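To make the precBlocks example concrete, here is a small standalone sketch with simplified types and a simplified signature; it is not the actual bitseq implementation, just an illustration of walking a run-length-encoded list of 4-byte blocks to find which instance a global byte offset lands in:

package main

import "fmt"

// sequence is a simplified run-length-encoded node: one 32-bit block
// repeated count times (4 bytes per instance), mirroring the shape of
// the structure discussed above.
type sequence struct {
	block uint32
	count uint64
	next  *sequence
}

// ordinalToPos converts a bit ordinal into a global byte offset and a
// bit offset inside that byte.
func ordinalToPos(ordinal uint64) (uint64, uint64) {
	return ordinal / 8, ordinal % 8
}

// findSequence walks the list and returns the node containing byteStart,
// the number of preceding instances of that same node (precBlocks), and
// the byte position inside the matching 4-byte instance.
func findSequence(head *sequence, byteStart uint64) (node *sequence, precBlocks, inBlockBytePos uint64) {
	for cur := head; cur != nil; cur = cur.next {
		nodeBytes := cur.count * 4
		if byteStart < nodeBytes {
			return cur, byteStart / 4, byteStart % 4
		}
		byteStart -= nodeBytes
	}
	return nil, 0, 0
}

func main() {
	// Example from the review comment: head -> 0xFFFF00FF repeated 10 times,
	// ordinal 42 -> byte 5, which falls in the second instance (precBlocks 1).
	head := &sequence{block: 0xFFFF00FF, count: 10}
	byteStart, bitStart := ordinalToPos(42)
	node, prec, inBlock := findSequence(head, byteStart)
	fmt.Println(byteStart, bitStart, node != nil, prec, inBlock) // 5 2 true 1 1
}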

goto next
}
if err != nil {
// There are some more instances of the same block, so add the offset


This is only possible in the case of serial allocation, where the start can be within the block itself.

@fcrisciani left a comment

LGTM

@ddebroy (Contributor) left a comment

First set of comments. Still going through the actual RLE changes.

if store != nil {
h.Unlock() // The lock is acquired in the GetObject
Contributor

It appears store.GetObject will modify h (through the SetValue and SetIndex calls). If so, shouldn't we keep h locked while that is happening? Also, it appears the lock acquired in GetObject is a datastore lock rather than the object's lock. Is h.Unlock() needed and correct before the GetObject call invoked on h?

Contributor Author

The way the store logic works in libnetwork is that the in-memory data structure gets overwritten by the store data, so there does not need to be a lock going into GetObject.
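For context on the unlock-before-GetObject concern, here is a toy Go sketch. It is illustrative only: the handle type, SetValue, and the refresh flow are assumptions, not the libnetwork code. It shows why, if the store's refresh path takes the object's own mutex, the caller has to drop that mutex first: sync.Mutex is not reentrant, so holding it across the call would deadlock.

package main

import (
	"fmt"
	"sync"
)

// handle is a toy stand-in for the bitseq Handle: a mutex-protected value
// that a store refresh may overwrite.
type handle struct {
	mu  sync.Mutex
	val uint64
}

// SetValue is called by the (hypothetical) store while refreshing the
// object; it takes the object's own lock.
func (h *handle) SetValue(v uint64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.val = v
}

// refreshFromStore mimics the pattern under discussion: the caller drops
// its lock before handing the object to the store, because the store's
// refresh path locks the object again. Holding the lock across this call
// would deadlock, since sync.Mutex is not reentrant.
func refreshFromStore(h *handle) {
	h.mu.Lock()
	// ... work that needs the lock ...
	h.mu.Unlock() // release before the store call re-locks h
	h.SetValue(42)
	h.mu.Lock()
	// ... continue with the refreshed state ...
	fmt.Println("refreshed value:", h.val)
	h.mu.Unlock()
}

func main() {
	refreshFromStore(&handle{})
}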

if store != nil {
h.Unlock() // The lock is acquired in the GetObject
if err := store.GetObject(datastore.Key(h.Key()...), h); err != nil && err != datastore.ErrKeyNotFound {
@ddebroy (Contributor) commented Mar 9, 2018

h.unlock() will be needed at this point since we are returning. Potentially in a later patch, do you think the code could be restructured a little bit to just defer unlocks (instead of the unlock/lock sequences) after grabbing a lock on h before the for loop? I.e.:

h.Lock()
defer h.Unlock()
for {
	// ...
}

Contributor Author

Yes, I hate the sequence too. But because of the way we handle SwarmKit and native differently, it's kind of needed like this. I will work on the refactor next.

if err := nh.writeToStore(); err != nil {
if _, ok := err.(types.RetryError); !ok {
return ret, fmt.Errorf("internal failure while setting the bit: %v", err)
if h.store != nil {
Contributor

Do we need to check h.store again at this point? It seems we already check it earlier in the function. If we lock h outside the for loop early in the function, I think h.store should not become nil midway?

Contributor Author

This is to differentiate between the store and non-store codepaths. I need to unlock before going to write to the store; otherwise it will lead to a deadlock.

Contributor

Got it regarding the logic to differentiate store/non-store codepaths.

For the locks, it looks like we are calling nh.writeToStore() below which will lock nh's mutex and not that of h, right?

Contributor Author

Yep, that's right. All the more reason not to have a lock around the operation, right? I retained the logic needed for the store code path the way it was.
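As a sketch of the copy-then-write pattern this thread is describing: the mutation is applied to a local copy, written with a compare-and-swap style call, and retried on a conflict, which is why the original handle's lock does not have to be held across the store write. The store type, its write method, and the error value below are hypothetical stand-ins, not the actual bitseq/datastore API:

package main

import (
	"errors"
	"fmt"
)

// retryError marks a write that failed because the stored copy changed
// underneath us and should be retried (a stand-in for types.RetryError).
var retryError = errors.New("stale write, retry")

// kvStore is a hypothetical key/value store with compare-and-swap writes.
type kvStore struct{ version, bits uint64 }

func (s *kvStore) write(version, bits uint64) error {
	if version != s.version {
		return retryError
	}
	s.version++
	s.bits = bits
	return nil
}

// setBit mirrors the copy-mutate-write-retry loop discussed above: work on
// a local copy (the role nh plays in the real code), attempt the store
// write, and refresh plus retry on conflict. Single goroutine here, so no
// extra locking is needed in this sketch.
func setBit(s *kvStore, bit uint64) error {
	for {
		version, bits := s.version, s.bits // refresh the local copy
		bits |= 1 << bit                   // mutate the copy, not the shared state
		err := s.write(version, bits)
		if err == nil {
			return nil
		}
		if !errors.Is(err, retryError) {
			return fmt.Errorf("internal failure while setting the bit: %v", err)
		}
		// The copy was stale; loop, re-read, and try again.
	}
}

func main() {
	s := &kvStore{}
	_ = setBit(s, 3)
	_ = setBit(s, 7)
	fmt.Printf("stored bits: %#x\n", s.bits) // stored bits: 0x88
}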

This commit contains test cases to verify the changes and to
solidify the library.

Signed-off-by: Abhinandan Prativadi <abhi@docker.com>
@abhi (Contributor Author) commented Mar 9, 2018

@ddebroy added https://github.com/docker/libnetwork/pull/2105/files#diff-3f7f9d2c9921912aae5e59507c3ba643R199 to test invalidPos

@ddebroy (Contributor) left a comment

LGTM

@alexafshar commented Apr 28, 2018

@abhi @fcrisciani
Could the underlying cause of this duplicate IP issue (IPAM race condition, free bit logic, bit offset calculation) also manifest itself as a "rogue IP" situation? I.e., a "rogue IP" gets assigned to a replica but is not routable. This IP might have been assigned before. Cluster DNS thinks it's routable on an nslookup, but docker inspect on the service doesn't show it.

docker inspect $(docker ps | grep some-xyz-service | awk '{print $1}') | grep IPv4
IPv4Address: 10.1.107.85
IPv4Address: 10.3.37.251
IPv4Address: 10.1.107.74

Then doing an nslookup for some-xyz-service from a running replica shows an extra IP that does not appear in a docker inspect of the service:

docker exec -it b9c424e28055 nslookup some-xyz-service
;; Truncated, retrying in TCP mode.
Server:                   127.0.0.11
Address:                127.0.0.11#53

Non-authoritative answer:
Name:   some-xyz-service
Address: 10.1.107.85
Name:   some-xyz-service
**Address: 10.0.58.156**      <---------   not routable
Name:   some-xyz-service
Address: 10.3.37.251
Name:  some-xyz-service
Address: 10.1.107.74

@fcrisciani

@alexafshar Typically I would say no. But this fix will only be effective on a network that is not already corrupted, so the suggestion is to check that the network does not show duplicates; if it does, the best option is to start with a new network.
