Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service discovery race on serviceBindings delete. Bug on IP reuse #1808

Merged
merged 6 commits into from
Jun 18, 2017

Conversation

fcrisciani
Copy link

Found an issue where the deletion of the entry from serviceBindings map inside the rmServiceBinding was creating a possible race with a new addServiceBindings.
The deletion from the map was allowing a new go routine to create a completely new service object and race with the final part of the rmServiceBinding.

Another issue identified was the possible reuse of an IP from a service in the svcRecords map. This table is keyed by network ID, so services if they are reusing the same name but different service id cannot be distinguished when they are creating or deleting entries. In a perturbance scenario where a service is updated, notification coming from endpoints that are remote can be delayed and cause state corruption on the new service with local endpoints. To address this issue, move all the map to SetMatrix that will allow to handle gracefully the transition between services.
Added a unit test for this case where 2 services with same name share same IP, the test validates that the deletion of one service does not compromise the service lookup.

@@ -1376,19 +1381,31 @@ func (n *network) getSvcRecords(ep *endpoint) []etchosts.Record {

n.ctrlr.Lock()
defer n.ctrlr.Unlock()
Copy link
Author

@fcrisciani fcrisciani Jun 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if makes sense here keep the controller lock only for the resolution, have to check. The SetMatrix is already go routine safe and should be enough

This was referenced Jun 16, 2017
Copy link
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I take some more time reviewing the changes, pls take a look at the comments.

Also I think @sanimej should take a look at this and comment as well.

@@ -1228,75 +1236,76 @@ func (n *network) updateSvcRecord(ep *endpoint, localEps []*endpoint, isAdd bool
ipv6 = iface.AddressIPv6().IP
}

serviceID := ep.svcID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this method is a generic method for both containers and services, one cannot assume that serviceID will be a valid one. Hence, I think the code change serviceID = ep.ID() should be brought here if serviceID == "".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will move up

}

// Add service name to vip in DNS, if vip is valid. Otherwise resort to DNS RR
if len(vip) == 0 {
n.(*network).addSvcRecords(eID, svcName, ip, nil, false, method)
n.(*network).addSvcRecords(eID, svcName, eID, ip, nil, false, method)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is DNS-RR treated as an attachable container use-case, while the svcID can still be valid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I add the same logic at the top:
var serviceID string
if svcID == "" {
serviceID = eID
}
WDYT?

}
}

// If we are doing DNS RR delete the endpoint IP from DNS record right away.
if !multipleEntries && len(vip) == 0 {
n.(*network).deleteSvcRecords(eID, svcName, ip, nil, false, method)
n.(*network).deleteSvcRecords(eID, svcName, eID, ip, nil, false, method)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Hosts: h,
IP: ip[0].String(),
Hosts: k,
IP: mapEntryList[0].(svcMapEntry).ip,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to spend a bit more time reviewing these changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if helps but in words:
line 1389: get all the names in the dns
line 1391: for each name in the dns
line 1396: get all the ip associated to it as a Set
line 1406: Create a record with the first IP that is in the Set

Flavio Crisciani added 2 commits June 17, 2017 16:34
The SetMatrix is a generic data structure, so the description
should not be tight to any specific use

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- Added logic to handle name reuse from different services
- Moved the deletion from the serviceBindings map at the end
  of the rmServiceBindings body to avoid race with new services

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Flavio Crisciani added 3 commits June 17, 2017 16:58
Use the locker to avoid the race between the network
deletion and new endpoints being created

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Allow the cleanupServicebindings to take care of the service discovery
cleanup. Also avoid to trigger the cleanup for each endpoint from an SD
point of view
LB and SD will be separated in the future

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
If there is an error locally guarantee that the delete entry
on network DB is still honored

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Copy link
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mavenugo mavenugo merged commit 6426d1e into moby:master Jun 18, 2017
@fcrisciani fcrisciani deleted the sd-race-network branch June 18, 2017 15:11
fcrisciani pushed a commit to fcrisciani/libnetwork that referenced this pull request Jun 20, 2017
…by#1808)

* Correct SetMatrix documentation

The SetMatrix is a generic data structure, so the description
should not be tight to any specific use

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* Service Discovery reuse name and serviceBindings deletion

- Added logic to handle name reuse from different services
- Moved the deletion from the serviceBindings map at the end
  of the rmServiceBindings body to avoid race with new services

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* Avoid race on network cleanup

Use the locker to avoid the race between the network
deletion and new endpoints being created

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* CleanupServiceBindings to clean the SD records

Allow the cleanupServicebindings to take care of the service discovery
cleanup. Also avoid to trigger the cleanup for each endpoint from an SD
point of view
LB and SD will be separated in the future

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* Addressed comments

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

* NetworkDB deleteEntry has to happen

If there is an error locally guarantee that the delete entry
on network DB is still honored

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 6426d1e)
@andrewhsu andrewhsu mentioned this pull request Jun 20, 2017
6 tasks
tiborvass pushed a commit to tiborvass/docker-ce that referenced this pull request Jun 20, 2017
Brings in this pull request:
- moby/libnetwork#1808

Signed-off-by: Tibor Vass <tibor@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Feb 3, 2020
Brings in this pull request:
- moby/libnetwork#1808

Signed-off-by: Tibor Vass <tibor@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Feb 3, 2020
Brings in this pull request:
- moby/libnetwork#1808

Signed-off-by: Tibor Vass <tibor@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 10, 2020
Brings in this pull request:
- moby/libnetwork#1808

Signed-off-by: Tibor Vass <tibor@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 23, 2020
Brings in this pull request:
- moby/libnetwork#1808

Signed-off-by: Tibor Vass <tibor@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants