-
Notifications
You must be signed in to change notification settings - Fork 616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix IP overlap with empty EndpointSpec #2505
Conversation
060844d
to
687a9ae
Compare
Codecov Report
@@ Coverage Diff @@
## master #2505 +/- ##
==========================================
+ Coverage 61.32% 61.68% +0.36%
==========================================
Files 131 129 -2
Lines 21451 21293 -158
==========================================
- Hits 13154 13135 -19
+ Misses 6878 6753 -125
+ Partials 1419 1405 -14 |
687a9ae
to
944e9a6
Compare
manager/allocator/network.go
Outdated
@@ -940,7 +940,7 @@ func updatePortsInHostPublishMode(s *api.Service) { | |||
s.Endpoint.Spec = s.Spec.Endpoint.Copy() | |||
} | |||
|
|||
func (a *Allocator) allocateService(ctx context.Context, s *api.Service) error { | |||
func (a *Allocator) allocateService(ctx context.Context, s *api.Service, restart bool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment for this function along with a description for each of the params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
manager/controlapi/service.go
Outdated
@@ -306,6 +306,7 @@ func validateTaskSpec(taskSpec api.TaskSpec) error { | |||
func validateEndpointSpec(epSpec *api.EndpointSpec) error { | |||
// Endpoint spec is optional | |||
if epSpec == nil { | |||
epSpec = &api.EndpointSpec{Mode: api.ResolutionModeVirtualIP} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest doing this outside this function. This function is for validation purposes only and should not have any side effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly agree. We should not modify the user's spec after it's been sent to Swarmkit. There are Good Reasons to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved it outside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also probably return an InvalidArgument here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we don't need to set any default if we bail out with an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change essentially is to make EndpointSpec mandatory, correct ? So, to avoid breaking old clients, we're populating it in swarm.
Its OK to return an error from validateEndpointSpec and handle it wherever it is called by filling in an empty EndpointSpec.
@@ -1188,7 +1190,7 @@ func (a *Allocator) procUnallocatedServices(ctx context.Context) { | |||
var allocatedServices []*api.Service | |||
for _, s := range nc.unallocatedServices { | |||
if !nc.nwkAllocator.IsServiceAllocated(s) { | |||
if err := a.allocateService(ctx, s); err != nil { | |||
if err := a.allocateService(ctx, s, false); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a comment on why restart flag is false here.
@@ -587,8 +587,8 @@ func (a *Allocator) allocateServices(ctx context.Context, existingAddressesOnly | |||
continue | |||
} | |||
|
|||
if err := a.allocateService(ctx, s); err != nil { | |||
log.G(ctx).WithError(err).Errorf("failed allocating service %s during init", s.ID) | |||
if err := a.allocateService(ctx, s, existingAddressesOnly); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please add a comment on how existingAddressesOnly relates to the restart flag which is an arg for allocateService()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the name of the variable to match to show that the relation is that they are the same thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to make this change ? I don't see it in the patch @fcrisciani
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name of the variable is changed inside the allocateService not here, I kept the original name existingAddressesOnly
@@ -274,7 +274,7 @@ func (a *Allocator) doNetworkAlloc(ctx context.Context, ev events.Event) { | |||
} | |||
updatePortsInHostPublishMode(s) | |||
} else { | |||
if err := a.allocateService(ctx, s); err != nil { | |||
if err := a.allocateService(ctx, s, false); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a comment on why restart flag is false here.
@@ -244,7 +244,7 @@ func (a *Allocator) doNetworkAlloc(ctx context.Context, ev events.Event) { | |||
break | |||
} | |||
|
|||
if err := a.allocateService(ctx, s); err != nil { | |||
if err := a.allocateService(ctx, s, false); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a comment on why restart flag is false here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put a comment on the allocateService itself, instead to spread a one line everywhere saying that is not a restart case
@@ -244,6 +245,7 @@ vipLoop: | |||
} | |||
for _, nAttach := range specNetworks { | |||
if nAttach.Target == eAttach.NetworkID { | |||
log.L.WithFields(logrus.Fields{"service_id": s.ID, "vip": eAttach.Addr}).Infof("allocate vip") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: use Info instead fo Infof ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be Debug ? Info will print it for every ip ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking to want this printed. Concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fcrisciani only concern is that it might be too frequent to print in the logs. We normally would like to avoid that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely change from Info
to Debug
. It'll be too verbose for Info
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nishanttotla today we have 0 visibility and debug is very difficult, do you have any other idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guys putting it to debug, but we will continue to have no clue of what is going on and what is assigned especially when allocation are not correct
manager/allocator/allocator_test.go
Outdated
@@ -796,6 +796,145 @@ func TestAllocatorRestoreForDuplicateIPs(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestAllocatorRestartNoEndpointSpec(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a summery of this test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One nit
@@ -244,6 +245,7 @@ vipLoop: | |||
} | |||
for _, nAttach := range specNetworks { | |||
if nAttach.Target == eAttach.NetworkID { | |||
log.L.WithFields(logrus.Fields{"service_id": s.ID, "vip": eAttach.Addr}).Infof("allocate vip") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be Debug ? Info will print it for every ip ?
if err := a.allocateService(ctx, s); err != nil { | ||
log.G(ctx).WithError(err).Errorf("failed allocating service %s during init", s.ID) | ||
if err := a.allocateService(ctx, s, existingAddressesOnly); err != nil { | ||
log.G(ctx).WithField("existingAddressesOnly", existingAddressesOnly).WithError(err).Errorf("failed allocating service %s during init", s.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can this comment be made more clear? Should be it something like "failed to allocate network..." or something of that nature, perhaps with more info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error field will give that information.
9be2619
to
03eda38
Compare
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Minor nits. I'll merge as soon as you address these.
manager/allocator/allocator_test.go
Outdated
@@ -796,6 +796,149 @@ func TestAllocatorRestoreForDuplicateIPs(t *testing.T) { | |||
} | |||
} | |||
|
|||
// TestAllocatorRestartNoEndpointSpec covers the leader election case when the service Spec | |||
// does not contains the EndpointSpec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: does not contains => does not contain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
manager/allocator/allocator_test.go
Outdated
@@ -796,6 +796,149 @@ func TestAllocatorRestoreForDuplicateIPs(t *testing.T) { | |||
} | |||
} | |||
|
|||
// TestAllocatorRestartNoEndpointSpec covers the leader election case when the service Spec | |||
// does not contains the EndpointSpec. | |||
// The expected behavior iis that the VIP(s) are still correctly populated inside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iis => is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -587,8 +587,8 @@ func (a *Allocator) allocateServices(ctx context.Context, existingAddressesOnly | |||
continue | |||
} | |||
|
|||
if err := a.allocateService(ctx, s); err != nil { | |||
log.G(ctx).WithError(err).Errorf("failed allocating service %s during init", s.ID) | |||
if err := a.allocateService(ctx, s, existingAddressesOnly); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to make this change ? I don't see it in the patch @fcrisciani
manager/allocator/network.go
Outdated
@@ -940,7 +940,10 @@ func updatePortsInHostPublishMode(s *api.Service) { | |||
s.Endpoint.Spec = s.Spec.Endpoint.Copy() | |||
} | |||
|
|||
func (a *Allocator) allocateService(ctx context.Context, s *api.Service) error { | |||
// allocateService takes care to align the desired state with the spec passed | |||
// the last parameter is true only during restart when the data are read from raft |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: data are => data is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Passing and empty EndpointSpec in the service spec was correctly triggering the VIP allocation but the leader election was erroneusly handling the IPAM state restore trying to release the VIP. The fix focuses on proper handling of the restart case. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
03eda38
to
bd4e923
Compare
Passing and empty EndpointSpec in the service spec
was correctly triggering the VIP allocation but
the leader election was erroneusly handling the IPAM
state restore.
This fix ensure that the EndpointSpec if not specified is
actually added to the ServiceSpec selection endpoint mode VIP.
Also the allocate service has now the restart flag that will skip
the deallocation logic that was erroneously triggered.
Fix: moby/moby#33795
Signed-off-by: Flavio Crisciani flavio.crisciani@docker.com