
Fix for Duplicate IP issues #2105

Merged: 3 commits, Mar 9, 2018
Conversation

@abhi (Contributor) commented Mar 8, 2018

While investigating the duplicate IP issue in the IPAM library, the following bugs were found:

  1. Race condition: the IPAM library seems to assume the presence of a datastore to achieve atomicity. However, the IPAM library can be used without a datastore; for example, SwarmKit uses the IPAM with the datastore set to nil. The race condition corrupts the bit sequence, which might lead to duplicate IPs by releasing bits for IPs that were never deallocated, and also cause IP leaks because certain bits end up set even though they were never supposed to be allocated.
  2. Free bit logic: the logic in getAvailableBit looks for a free bit starting from a given bit within a block. However, if the start bit is right after the last available bit, it returns the last bit in the block as the next available bit even though that bit is not free (see the sketch after this list).
     For example: consider a block with curr:16 and sequence:0xfffeffff, count:10, next:{sequence:0x0fffffff, count:6}. Then getAvailableBit(16) would return 31 as the next available bit, and we would end up with duplicate IPs/VXLAN IDs, etc.
  3. Byte Offset Calculations
  4. Another byte offset calculation (regression): similar to the example in 2), we were moving on to the next block altogether instead of looking at the next instance within the same block when one is available.
  5. Concurrent Map Access
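As a rough illustration of 2), here is a minimal, self-contained Go sketch. It is not the actual bitseq code; the function name, signature, and scan loop are assumptions for illustration. The point is that a free-bit scan must report "no bit available" when every bit at or after the start position is already set, instead of falling off the end of the block and handing out an allocated bit:

package main

import (
	"errors"
	"fmt"
)

const blockLen = 32 // bits per block, as in the bit sequence above

var errNoBitAvailable = errors.New("no bit available in this block")

// getAvailableBit scans a 32-bit block for the first zero bit at or after
// the "from" position (bit 0 is the most significant bit). When every bit
// from "from" to the end of the block is set, it reports an error instead
// of silently returning the last position in the block.
func getAvailableBit(block uint32, from uint) (uint, error) {
	for pos := from; pos < blockLen; pos++ {
		if block&(1<<(blockLen-1-pos)) == 0 {
			return pos, nil
		}
	}
	return 0, errNoBitAvailable
}

func main() {
	// 0xfffeffff has a single free bit, at position 15.
	pos, _ := getAvailableBit(0xfffeffff, 0)
	fmt.Println("first free bit:", pos) // first free bit: 15

	// Scanning from bit 16 must fail rather than return an allocated bit.
	_, err := getAvailableBit(0xfffeffff, 16)
	fmt.Println(err) // no bit available in this block
}

Scanning 0xfffeffff from bit 16 fails here because the only free bit (15) lies before the start position, which is exactly the case the fix guards against.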

Tests are added to verify the above scenarios.

Signed-off-by: Abhinandan Prativadi abhi@docker.com

abhi added 2 commits March 8, 2018 11:19
This commit contains fixes for duplicate IPs, with 3 issues addressed:
1) Race condition when the datastore is not present, as in cases like SwarmKit
2) Byte offset calculation: depending on where the start bit falls in the
   bit sequence, the offset was adding extra bytes when the start bit is in
   the middle of one of the instances in a block
3) Finding the available bit was returning the last bit in the current
   instance of a block when the block is not full and the current bit is
   after the last available bit

Signed-off-by: Abhinandan Prativadi <abhi@docker.com>
This commit fixes a panic due to concurrent map access

Signed-off-by: Abhinandan Prativadi <abhi@docker.com>
@codecov-io commented Mar 8, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@3aca383).
The diff coverage is 84.84%.


@@            Coverage Diff            @@
##             master    #2105   +/-   ##
=========================================
  Coverage          ?   40.43%           
=========================================
  Files             ?      139           
  Lines             ?    22376           
  Branches          ?        0           
=========================================
  Hits              ?     9047           
  Misses            ?    12000           
  Partials          ?     1329
Impacted Files Coverage Δ
ipam/allocator.go 70.84% <75%> (ø)
bitseq/sequence.go 83.33% <86.2%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3aca383...c6843ee.

@fcrisciani

ping @aboch

@@ -498,24 +506,40 @@ func (h *Handle) UnmarshalJSON(data []byte) error {
func getFirstAvailable(head *sequence, start uint64) (uint64, uint64, error) {
// Find sequence which contains the start bit
byteStart, bitStart := ordinalToPos(start)


As a reminder for myself: byteStart and bitStart here are relative to the first byte of the block that the ordinal start falls into.

@@ -498,24 +506,40 @@ func (h *Handle) UnmarshalJSON(data []byte) error {
func getFirstAvailable(head *sequence, start uint64) (uint64, uint64, error) {
// Find sequence which contains the start bit
byteStart, bitStart := ordinalToPos(start)
current, _, _, inBlockBytePos := findSequence(head, byteStart)

current, _, precBlocks, inBlockBytePos := findSequence(head, byteStart)


current is the pointer to the block that byteStart falls into.
precBlocks is the number of instances of that same block that precede the one byteStart falls into.
Example: head -> 0xFFFF00FF|10 and my byte start is 5 because my ordinal is 42; in this case precBlocks is 1, indicating that my ordinal falls in the second compressed block.
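To make the precBlocks example concrete, here is a small standalone sketch with simplified types and a simplified signature; it is not the actual bitseq implementation, just an illustration of walking a run-length-encoded list of 4-byte blocks to find which instance a global byte offset lands in:

package main

import "fmt"

// sequence is a simplified run-length-encoded node: one 32-bit block
// repeated count times (4 bytes per instance), mirroring the shape of
// the structure discussed above.
type sequence struct {
	block uint32
	count uint64
	next  *sequence
}

// ordinalToPos converts a bit ordinal into a global byte offset and a
// bit offset inside that byte.
func ordinalToPos(ordinal uint64) (uint64, uint64) {
	return ordinal / 8, ordinal % 8
}

// findSequence walks the list and returns the node containing byteStart,
// the number of preceding instances of that same node (precBlocks), and
// the byte position inside the matching 4-byte instance.
func findSequence(head *sequence, byteStart uint64) (node *sequence, precBlocks, inBlockBytePos uint64) {
	for cur := head; cur != nil; cur = cur.next {
		nodeBytes := cur.count * 4
		if byteStart < nodeBytes {
			return cur, byteStart / 4, byteStart % 4
		}
		byteStart -= nodeBytes
	}
	return nil, 0, 0
}

func main() {
	// Example from the review comment: head -> 0xFFFF00FF repeated 10 times,
	// ordinal 42 -> byte 5, which falls in the second instance (precBlocks 1).
	head := &sequence{block: 0xFFFF00FF, count: 10}
	byteStart, bitStart := ordinalToPos(42)
	node, prec, inBlock := findSequence(head, byteStart)
	fmt.Println(byteStart, bitStart, node != nil, prec, inBlock) // 5 2 true 1 1
}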

goto next
}
if err != nil {
// There are some more instances of the same block, so add the offset


This is only possible in the case of serial allocation, where the start can be within the block itself.

@fcrisciani left a comment

LGTM

@ddebroy (Contributor) left a comment

First set of comments. Still going through the actual RLE changes.

if store != nil {
h.Unlock() // The lock is acquired in the GetObject
Contributor

It appears store.GetObject will modify h (through the SetValue and SetIndex calls). If so, shouldn't we keep h locked while that is happening? Also, it appears the lock acquired in GetObject is a datastore lock rather than the object's lock. Is h.Unlock() needed and correct before the GetObject call invoked on h?

Contributor Author

The way the store logic works in libnetwork is that the in-memory data structure gets overwritten by the store data, so there does not need to be a lock going into GetObject.
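For context on the unlock-before-GetObject concern, here is a toy Go sketch. It is illustrative only: the handle type, SetValue, and the refresh flow are assumptions, not the libnetwork code. It shows why, if the store's refresh path takes the object's own mutex, the caller has to drop that mutex first: sync.Mutex is not reentrant, so holding it across the call would deadlock.

package main

import (
	"fmt"
	"sync"
)

// handle is a toy stand-in for the bitseq Handle: a mutex-protected value
// that a store refresh may overwrite.
type handle struct {
	mu  sync.Mutex
	val uint64
}

// SetValue is called by the (hypothetical) store while refreshing the
// object; it takes the object's own lock.
func (h *handle) SetValue(v uint64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.val = v
}

// refreshFromStore mimics the pattern under discussion: the caller drops
// its lock before handing the object to the store, because the store's
// refresh path locks the object again. Holding the lock across this call
// would deadlock, since sync.Mutex is not reentrant.
func refreshFromStore(h *handle) {
	h.mu.Lock()
	// ... work that needs the lock ...
	h.mu.Unlock() // release before the store call re-locks h
	h.SetValue(42)
	h.mu.Lock()
	// ... continue with the refreshed state ...
	fmt.Println("refreshed value:", h.val)
	h.mu.Unlock()
}

func main() {
	refreshFromStore(&handle{})
}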

if store != nil {
h.Unlock() // The lock is acquired in the GetObject
if err := store.GetObject(datastore.Key(h.Key()...), h); err != nil && err != datastore.ErrKeyNotFound {
@ddebroy (Contributor) commented Mar 9, 2018

h.unlock() will be needed at this point since we are returning. Potentially in a later patch, do you think the code could be restructured a little bit to just defer unlocks (instead of the unlock/lock sequences) after grabbing a lock on h before the for loop? I.e.:

h.Lock()
defer h.Unlock()
for {
	// ...
}

Contributor Author

Yes, I hate the sequence too. But because of the way we handle SwarmKit and native differently, it's kind of needed like this. I will work on the refactor next.

if err := nh.writeToStore(); err != nil {
if _, ok := err.(types.RetryError); !ok {
return ret, fmt.Errorf("internal failure while setting the bit: %v", err)
if h.store != nil {
Contributor

Do we need to check h.store again at this point? It seems we already check it earlier in the function. If we lock h outside the for loop early in the function, I think h.store should not become nil midway?

Contributor Author

This is to differentiate between the store and non-store codepaths. I need to unlock before going to write to the store; otherwise it will lead to a deadlock.

Contributor

Got it regarding the logic to differentiate store/non-store codepaths.

For the locks, it looks like we are calling nh.writeToStore() below which will lock nh's mutex and not that of h, right?

Contributor Author

Yep, that's right. All the more reason not to have a lock around the operation, right? I retained the logic needed for the store code path the way it was.
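As a sketch of the copy-then-write pattern this thread is describing: the mutation is applied to a local copy, written with a compare-and-swap style call, and retried on a conflict, which is why the original handle's lock does not have to be held across the store write. The store type, its write method, and the error value below are hypothetical stand-ins, not the actual bitseq/datastore API:

package main

import (
	"errors"
	"fmt"
)

// retryError marks a write that failed because the stored copy changed
// underneath us and should be retried (a stand-in for types.RetryError).
var retryError = errors.New("stale write, retry")

// kvStore is a hypothetical key/value store with compare-and-swap writes.
type kvStore struct{ version, bits uint64 }

func (s *kvStore) write(version, bits uint64) error {
	if version != s.version {
		return retryError
	}
	s.version++
	s.bits = bits
	return nil
}

// setBit mirrors the copy-mutate-write-retry loop discussed above: work on
// a local copy (the role nh plays in the real code), attempt the store
// write, and refresh plus retry on conflict. Single goroutine here, so no
// extra locking is needed in this sketch.
func setBit(s *kvStore, bit uint64) error {
	for {
		version, bits := s.version, s.bits // refresh the local copy
		bits |= 1 << bit                   // mutate the copy, not the shared state
		err := s.write(version, bits)
		if err == nil {
			return nil
		}
		if !errors.Is(err, retryError) {
			return fmt.Errorf("internal failure while setting the bit: %v", err)
		}
		// The copy was stale; loop, re-read, and try again.
	}
}

func main() {
	s := &kvStore{}
	_ = setBit(s, 3)
	_ = setBit(s, 7)
	fmt.Printf("stored bits: %#x\n", s.bits) // stored bits: 0x88
}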

This commit contains test cases to verify the changes and to
solidify the library.

Signed-off-by: Abhinandan Prativadi <abhi@docker.com>
@abhi (Contributor Author) commented Mar 9, 2018

@ddebroy added https://github.com/docker/libnetwork/pull/2105/files#diff-3f7f9d2c9921912aae5e59507c3ba643R199 to test invalidPos

@ddebroy (Contributor) left a comment

LGTM

@alexafshar commented Apr 28, 2018

@abhi @fcrisciani
Could the underlying cause of this duplicate IP issue (IPAM race condition, free bit logic, bit offset calculation) also manifest itself as a "rogue IP" situation? I.e., a "rogue IP" gets assigned to a replica but is not routable. This IP might have been assigned before. Cluster DNS thinks it's routable on an nslookup, but docker inspect on the service doesn't show it.

docker inspect $(docker ps | grep some-xyz-service | awk '{print $1}') | grep IPv4
IPv4Address: 10.1.107.85
IPv4Address: 10.3.37.251
IPv4Address: 10.1.107.74

Then doing an nslookup for some-xyz-service from a running replica shows an extra IP that does not appear in a docker inspect of the service:

docker exec -it b9c424e28055 nslookup some-xyz-service
;; Truncated, retrying in TCP mode.
Server:                   127.0.0.11
Address:                127.0.0.11#53

Non-authoritative answer:
Name:   some-xyz-service
Address: 10.1.107.85
Name:   some-xyz-service
**Address: 10.0.58.156**      <---------   not routable
Name:   some-xyz-service
Address: 10.3.37.251
Name:  some-xyz-service
Address: 10.1.107.74

@fcrisciani

@alexafshar Typically I would say no. But this fix will only be effective on a network that is not already corrupted, so the suggestion is to check that the network does not show duplicates; if it does, the best option is to start with a new network.
