Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IP overlap with empty EndpointSpec #2505

Merged
merged 1 commit into from
Feb 8, 2018

Conversation

fcrisciani
Copy link

Passing and empty EndpointSpec in the service spec
was correctly triggering the VIP allocation but
the leader election was erroneusly handling the IPAM
state restore.
This fix ensure that the EndpointSpec if not specified is
actually added to the ServiceSpec selection endpoint mode VIP.
Also the allocate service has now the restart flag that will skip
the deallocation logic that was erroneously triggered.

Fix: moby/moby#33795

Signed-off-by: Flavio Crisciani flavio.crisciani@docker.com

@codecov
Copy link

codecov bot commented Feb 6, 2018

Codecov Report

Merging #2505 into master will increase coverage by 0.36%.
The diff coverage is 50%.

@@            Coverage Diff             @@
##           master    #2505      +/-   ##
==========================================
+ Coverage   61.32%   61.68%   +0.36%     
==========================================
  Files         131      129       -2     
  Lines       21451    21293     -158     
==========================================
- Hits        13154    13135      -19     
+ Misses       6878     6753     -125     
+ Partials     1419     1405      -14

@marcusmartins
Copy link

@@ -940,7 +940,7 @@ func updatePortsInHostPublishMode(s *api.Service) {
s.Endpoint.Spec = s.Spec.Endpoint.Copy()
}

func (a *Allocator) allocateService(ctx context.Context, s *api.Service) error {
func (a *Allocator) allocateService(ctx context.Context, s *api.Service, restart bool) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment for this function along with a description for each of the params.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -306,6 +306,7 @@ func validateTaskSpec(taskSpec api.TaskSpec) error {
func validateEndpointSpec(epSpec *api.EndpointSpec) error {
// Endpoint spec is optional
if epSpec == nil {
epSpec = &api.EndpointSpec{Mode: api.ResolutionModeVirtualIP}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest doing this outside this function. This function is for validation purposes only and should not have any side effects.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly agree. We should not modify the user's spec after it's been sent to Swarmkit. There are Good Reasons to do this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it outside

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also probably return an InvalidArgument here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we don't need to set any default if we bail out with an error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change essentially is to make EndpointSpec mandatory, correct ? So, to avoid breaking old clients, we're populating it in swarm.

Its OK to return an error from validateEndpointSpec and handle it wherever it is called by filling in an empty EndpointSpec.

@@ -1188,7 +1190,7 @@ func (a *Allocator) procUnallocatedServices(ctx context.Context) {
var allocatedServices []*api.Service
for _, s := range nc.unallocatedServices {
if !nc.nwkAllocator.IsServiceAllocated(s) {
if err := a.allocateService(ctx, s); err != nil {
if err := a.allocateService(ctx, s, false); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a comment on why restart flag is false here.

@@ -587,8 +587,8 @@ func (a *Allocator) allocateServices(ctx context.Context, existingAddressesOnly
continue
}

if err := a.allocateService(ctx, s); err != nil {
log.G(ctx).WithError(err).Errorf("failed allocating service %s during init", s.ID)
if err := a.allocateService(ctx, s, existingAddressesOnly); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please add a comment on how existingAddressesOnly relates to the restart flag which is an arg for allocateService()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the name of the variable to match to show that the relation is that they are the same thing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to make this change ? I don't see it in the patch @fcrisciani

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of the variable is changed inside the allocateService not here, I kept the original name existingAddressesOnly

@@ -274,7 +274,7 @@ func (a *Allocator) doNetworkAlloc(ctx context.Context, ev events.Event) {
}
updatePortsInHostPublishMode(s)
} else {
if err := a.allocateService(ctx, s); err != nil {
if err := a.allocateService(ctx, s, false); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a comment on why restart flag is false here.

@@ -244,7 +244,7 @@ func (a *Allocator) doNetworkAlloc(ctx context.Context, ev events.Event) {
break
}

if err := a.allocateService(ctx, s); err != nil {
if err := a.allocateService(ctx, s, false); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a comment on why restart flag is false here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put a comment on the allocateService itself, instead to spread a one line everywhere saying that is not a restart case

@@ -244,6 +245,7 @@ vipLoop:
}
for _, nAttach := range specNetworks {
if nAttach.Target == eAttach.NetworkID {
log.L.WithFields(logrus.Fields{"service_id": s.ID, "vip": eAttach.Addr}).Infof("allocate vip")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: use Info instead fo Infof ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be Debug ? Info will print it for every ip ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking to want this printed. Concerns?

Copy link
Contributor

@nishanttotla nishanttotla Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fcrisciani only concern is that it might be too frequent to print in the logs. We normally would like to avoid that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely change from Info to Debug. It'll be too verbose for Info.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nishanttotla today we have 0 visibility and debug is very difficult, do you have any other idea?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guys putting it to debug, but we will continue to have no clue of what is going on and what is assigned especially when allocation are not correct

@@ -796,6 +796,145 @@ func TestAllocatorRestoreForDuplicateIPs(t *testing.T) {
}
}

func TestAllocatorRestartNoEndpointSpec(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a summery of this test case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@abhi abhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One nit

@@ -244,6 +245,7 @@ vipLoop:
}
for _, nAttach := range specNetworks {
if nAttach.Target == eAttach.NetworkID {
log.L.WithFields(logrus.Fields{"service_id": s.ID, "vip": eAttach.Addr}).Infof("allocate vip")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be Debug ? Info will print it for every ip ?

if err := a.allocateService(ctx, s); err != nil {
log.G(ctx).WithError(err).Errorf("failed allocating service %s during init", s.ID)
if err := a.allocateService(ctx, s, existingAddressesOnly); err != nil {
log.G(ctx).WithField("existingAddressesOnly", existingAddressesOnly).WithError(err).Errorf("failed allocating service %s during init", s.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can this comment be made more clear? Should be it something like "failed to allocate network..." or something of that nature, perhaps with more info?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error field will give that information.

@fcrisciani fcrisciani force-pushed the fixendpointspec branch 2 times, most recently from 9be2619 to 03eda38 Compare February 7, 2018 22:12
@dperny
Copy link
Collaborator

dperny commented Feb 7, 2018

LGTM

Copy link
Contributor

@anshulpundir anshulpundir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Minor nits. I'll merge as soon as you address these.

@@ -796,6 +796,149 @@ func TestAllocatorRestoreForDuplicateIPs(t *testing.T) {
}
}

// TestAllocatorRestartNoEndpointSpec covers the leader election case when the service Spec
// does not contains the EndpointSpec.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: does not contains => does not contain

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -796,6 +796,149 @@ func TestAllocatorRestoreForDuplicateIPs(t *testing.T) {
}
}

// TestAllocatorRestartNoEndpointSpec covers the leader election case when the service Spec
// does not contains the EndpointSpec.
// The expected behavior iis that the VIP(s) are still correctly populated inside
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iis => is

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -587,8 +587,8 @@ func (a *Allocator) allocateServices(ctx context.Context, existingAddressesOnly
continue
}

if err := a.allocateService(ctx, s); err != nil {
log.G(ctx).WithError(err).Errorf("failed allocating service %s during init", s.ID)
if err := a.allocateService(ctx, s, existingAddressesOnly); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to make this change ? I don't see it in the patch @fcrisciani

@@ -940,7 +940,10 @@ func updatePortsInHostPublishMode(s *api.Service) {
s.Endpoint.Spec = s.Spec.Endpoint.Copy()
}

func (a *Allocator) allocateService(ctx context.Context, s *api.Service) error {
// allocateService takes care to align the desired state with the spec passed
// the last parameter is true only during restart when the data are read from raft
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: data are => data is

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Passing and empty EndpointSpec in the service spec
was correctly triggering the VIP allocation but
the leader election was erroneusly handling the IPAM
state restore trying to release the VIP.
The fix focuses on proper handling of the restart case.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

swarm mode duplicate ip addresses
6 participants