-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Service discovery hardening #1796
Conversation
SetMatrix is a simple matrix of sets. Added tests This data structure will be used in following commit to handle transient states where the same key can momentarely be associated to more than a value Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
49f2a16
to
bda77e3
Compare
endpoint.go
Outdated
@@ -515,6 +515,11 @@ func (ep *endpoint) sbJoin(sb *sandbox, options ...EndpointOption) (err error) { | |||
return err | |||
} | |||
|
|||
if err = ep.addServiceInfoToCluster(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move it back to EnableService (introduce back the sb lock)
sandbox.go
Outdated
@@ -669,24 +669,14 @@ func (sb *sandbox) SetKey(basePath string) error { | |||
|
|||
func (sb *sandbox) EnableService() error { | |||
for _, ep := range sb.getConnectedEndpoints() { | |||
if ep.enableService(true) { | |||
if err := ep.addServiceInfoToCluster(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restore Enable service logic (introducing again the sb lock)
0a8bf0c
to
94372a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushing the initial set of review comments. I have a few more areas to cover.
// but in any case the deleteServiceInfoToCluster will follow doing the cleanup if needed. | ||
// In case the deleteServiceInfoToCluster arrives first, this one is happening after the endpoint is | ||
// removed from the list, in this situation the delete will bail out not finding any data to cleanup | ||
// and the add will bail out not finding the endpoint on the sandbox. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the excellent summary of the problem.
service.go
Outdated
@@ -46,19 +48,51 @@ type service struct { | |||
ingressPorts portConfigs | |||
|
|||
// Service aliases | |||
aliases []string | |||
serviceAliases []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its okay to be called aliases
since it belongs to the service
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will restore
network.go
Outdated
ipInfo.extResolver = true | ||
if ok, _ := sr.ipMap.Contains(ipStr, ipInfo{name: name, extResolver: true}); !ok { | ||
sr.ipMap.Remove(ipStr, ipInfo{name: name}) | ||
sr.ipMap.Insert(ipStr, ipInfo{name: name, extResolver: true}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fcrisciani I dont quite understand this logic. Can you pls explain ?
@sanimej could you PTAL this change ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the original logic what the code does is to fetch the element from the map and set the extResolver to true.
The new logic simply checks if there is already an element with extResolver to true, if there is not then removes the entry that has extResolver to false and inserts one to true.
Looking better the logic is slightly different from the original, in the sense that I insert all the time an entry if it is not there, instead the original logic if the entry is not there does not change it. I will add the check to see if the entry is there with extResolver to false
network.go
Outdated
ipInfo, ok := elemSet[0].(ipInfo) | ||
if !ok { | ||
setStr, b := sr.ipMap.String(ip) | ||
logrus.Warnf("Something is wrong with the ipInfo set for key %s set:%t %s", ip, b, setStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a better warning message maybe ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion? It will be !ok if the element inserted is not of type ipInfo that is strange and major bug
network.go
Outdated
// network db notifications) | ||
// In such cases the resolution will be based on the first element of the set, and can vary | ||
// during the system stabilitation | ||
ipInfo, ok := elemSet[0].(ipInfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the check for extResolver
intentionally ignored here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, will add that also
@@ -285,7 +285,6 @@ func (nDB *NetworkDB) CreateEntry(tname, nid, key string, value []byte) error { | |||
nDB.indexes[byNetwork].Insert(fmt.Sprintf("/%s/%s/%s", nid, tname, key), entry) | |||
nDB.Unlock() | |||
|
|||
nDB.broadcaster.Write(makeEvent(opCreate, tname, nid, key, value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pls explain why it is okay to remove the broadcast event ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is triggering a notification of the handleEpTableEvent for local endpoints. This is not necessary considering that local endpoints got already configured by logic before. Also that can be risky because now the timing with remote events can arrive out of sync due to the fact that remote events will take more time than local dispatched ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am not against this change. But I like to understand this a bit more.
broadcaster.Write
ultimately calls appropriate Watch routines such as handleEpTableEvent
or handleNodeTableEvent
or other tables managed by other components (such as overlay driver).
CreateEntry in networkdb is a generic API to create an event of any type. But changing this generic function for a specific case of Endpoint management seems risky without considering how it impacts other events.
What if there is another table (not endpoint), with an event created by one component and consumed not just by remote peers, but also by another local component which networkdb as the mechanism for communication ?
Also, if we remove this call, can you point me to another code path for which handleEpTableEvent is called for local endpoints ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the review on all the current usage of CreateEntry and Watch, there is no such case that will break and hence am fine with not having to broadcast this event from networkDB.
94372a6
to
41278c3
Compare
Don't see this message on github, so will reply by email.
The reasons are several:
1) Events are happening in parallel, imagine this scenario with 2 workers.
Worker A is allocating IPa, now that container exit so IPa is released.
Before that the notification from network db arrives to Worker B, a new
container on Worker B start using IPa. When the notification from WorkerA
arrives to Worker B without the SetMatrix the entry in the ipMap will be
removed. This will break the resolution for the container that is still
running on Worker B.
2) Events from network DB sometime are replayed for endpoints that got
deleted in the past. I saw events delayed up to 6 min. In this case if the
IP is reused by any other worker, the notification arriving will wipe out
the entry from the ipMap.
For all the previous reason, the SetMatrix will guarantee that a specific
IP is going to be removed when no other endpoint is associated with it.
…On Sat, Jun 10, 2017 at 1:14 PM, Madhu Venugopal ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In network.go
<#1796 (comment)>:
> @@ -97,7 +98,7 @@ type ipInfo struct {
type svcInfo struct {
svcMap map[string][]net.IP
svcIPv6Map map[string][]net.IP
- ipMap map[string]*ipInfo
+ ipMap common.SetMatrix
I dont quite understand the need to have this ipMap structure to container
more than 1 ipInfo ?
Hence I dont see a reason to change it to use SetMatrix.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1796 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABvTieULfqSMaiyLTPRzIJX7RirJOZrYks5sCvkxgaJpZM4N0i7C>
.
|
// In such cases the resolution will be based on the first element of the set, and can vary | ||
// during the system stabilitation | ||
elem, ok := elemSet[0].(ipInfo) | ||
if !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of adding this defensive check ?
ipMap
is completely controlled by libnetwork core and what is the purpose of checking for ipInfo
type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the SetMatrix is a generic data structure that store potentially any kind of type. This, as far as I know, is the only way to cast back to the original type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. This is a defensive check and am okay to keep it in and am hoping to never see the Error message ever :)
network.go
Outdated
elem, ok := elemSet[0].(ipInfo) | ||
if !ok { | ||
setStr, b := sr.ipMap.String(ip) | ||
logrus.Warnf("BUG expected set of ipInfo type for key %s set:%t %s", ip, b, setStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning with a BUG seems odd. It certainly will raise an alarm with the users and as mentioned above, when do we expect such a Bug and if this is a concern, shouldnt this be resolved rather than logging it as a BUG ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it has to raise the alarm, It will mean that in that map got inserted a wrong type.
@@ -285,7 +285,6 @@ func (nDB *NetworkDB) CreateEntry(tname, nid, key string, value []byte) error { | |||
nDB.indexes[byNetwork].Insert(fmt.Sprintf("/%s/%s/%s", nid, tname, key), entry) | |||
nDB.Unlock() | |||
|
|||
nDB.broadcaster.Write(makeEvent(opCreate, tname, nid, key, value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am not against this change. But I like to understand this a bit more.
broadcaster.Write
ultimately calls appropriate Watch routines such as handleEpTableEvent
or handleNodeTableEvent
or other tables managed by other components (such as overlay driver).
CreateEntry in networkdb is a generic API to create an event of any type. But changing this generic function for a specific case of Endpoint management seems risky without considering how it impacts other events.
What if there is another table (not endpoint), with an event created by one component and consumed not just by remote peers, but also by another local component which networkdb as the mechanism for communication ?
Also, if we remove this call, can you point me to another code path for which handleEpTableEvent is called for local endpoints ?
for _, ep := range sb.getConnectedEndpoints() { | ||
if ep.enableService(false) { | ||
if err := ep.deleteServiceInfoFromCluster(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this, Isnt it better to have deleteServiceInfoFromCluster
called from DisableService, but remove it from sbLeave
in order to be consistent with the way EnableService and DisableService are handled. Also it brings in consistency between Join and Leave. WDYT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great as a future addition, but we have to investigate some corner cases. For example I validated that if you do the docker kill <container>
no DisableService is called. This of course will leave stale entries behind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
type loadBalancerBackend struct { | ||
ip net.IP | ||
containerName string | ||
taskAliases []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is weird to see taskAliases and containerName carried in loadBalancerBackend, where this structure is used so far for the VIP based IPVS programming. Are we changing this for other purposes now (such as DNS-RR as well) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why we have to carry these additional details. This is required especially so that cleanupServiceBindings
can call the lower level SD functions that expects to see containerName and taskAliases. But still, this seems out of place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fcrisciani Please consider picking up this change mavenugo@92820b9
This is a minor rework of your changes that enables us to avoid the need to change this structure and also helps scope this PR for fixing the race issues and minimize the code-reorg for a later activity if there is a need.
bbd63a7
to
f2da09f
Compare
agent.go
Outdated
n.addSvcRecords(name, ip, nil, true) | ||
for _, alias := range taskaliases { | ||
n.addSvcRecords(alias, ip, nil, true) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how you have removed this here and consolidated all the addSvcRecords under addEndpointNameResolution
.
But removing this will result in unmanaged containers in attachable
networks losing SD.
Hence, we have to call addEndpointNameResolution
from here and addEndpointNameResolution
must be changed to also handle unmanaged container scenario.
agent.go
Outdated
n.deleteSvcRecords(name, ip, nil, true) | ||
for _, alias := range taskaliases { | ||
n.deleteSvcRecords(alias, ip, nil, true) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
agent.go
Outdated
@@ -602,8 +622,7 @@ func (ep *endpoint) addServiceInfoToCluster() error { | |||
if n.ingress { | |||
ingressPorts = ep.ingressPorts | |||
} | |||
|
|||
if err := c.addServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.Iface().Address().IP); err != nil { | |||
if err := c.addServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.Name(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.myAliases, ep.Iface().Address().IP, "addServiceInfoToCluster"); err != nil { | |||
return err | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should also call addEndpointNameResolution
from here for the case of unmanaged container and its name resolution.
agent.go
Outdated
@@ -655,8 +680,7 @@ func (ep *endpoint) deleteServiceInfoFromCluster() error { | |||
if n.ingress { | |||
ingressPorts = ep.ingressPorts | |||
} | |||
|
|||
if err := c.rmServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.Iface().Address().IP); err != nil { | |||
if err := c.rmServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.Name(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.myAliases, ep.Iface().Address().IP, "deleteServiceInfoFromCluster"); err != nil { | |||
return err | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should also call deleteEndpointNameResolution from here for the case of unmanaged container and its name resolution.
d6a81a6
to
47842a4
Compare
} | ||
} | ||
|
||
if addService && len(vip) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its hard for me to judge if addService
variable is introduced here just as an optimization to avoid calling addSvcRecords
multiple times for the same svcName <-> vip combination ?
or Is there a real problem if we call addSvcRecords
multiple times for the same svcName <-> vip combination ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no real problem is only an optimization and also more symmetric towards the delSvcRecords that has the rmService (the case of the removal the logic is mandatory of course)
47842a4
to
1a67846
Compare
c := n.getController() | ||
agent := c.getAgent() | ||
|
||
name := ep.Name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mavenugo I guess to handle properly the case of anonymous container this has to be brought up
if n.ingress { | ||
ingressPorts = ep.ingressPorts | ||
} | ||
name := ep.Name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mavenugo same here for the delete this name is needed
1a67846
to
f6e2ee8
Compare
changed the ipMap to SetMatrix to allow transient states Compacted the addSvc and deleteSvc into a one single method Updated the datastructure for backends to allow storing all the information needed to cleanup properly during the cleanupServiceBindings Removed the enable/disable Service logic that was racing with sbLeave/sbJoin logic Add some debug logs to track further race conditions Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
f6e2ee8
to
d64e71e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fcrisciani Thanks for addressing all the comments and fixing the hard to reproduce race test-cases.
LGTM
Contains Service Discovery hardening fixes via moby/libnetwork#1796 Fixes multiple issues such as moby#32830 Signed-off-by: Madhu Venugopal <madhu@docker.com>
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
Contains Service Discovery hardening fixes via moby/libnetwork#1796 Fixes multiple issues such as #32830 Signed-off-by: Madhu Venugopal <madhu@docker.com> Upstream-commit: 6868b8e Component: engine
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
Contains Service Discovery hardening fixes via moby/libnetwork#1796 Fixes multiple issues such as #32830 Signed-off-by: Madhu Venugopal <madhu@docker.com> Upstream-commit: 6868b8e Component: engine
This is a cherry-pick of moby/moby#33634 that brings in moby/libnetwork#1796. Signed-off-by: Madhu Venugopal <madhu@docker.com>
This patch addresses several issues found on the Service Discovery feature.