
Add configurable reconciliation loop for pods, namespaces, and network policies #3772

Closed

Conversation

@naemono (Contributor) commented Feb 13, 2020

re #3764

This adds a configurable reconciliation loop option, which defaults to the existing behavior of 0s (no reconciliation loop), for pods, namespaces, and network policies. We have been testing these changes in 7 internal non-production clusters with a reconciliation interval of > 0s for the past week with no noticeable negative impact, and will be rolling them out to our production clusters in the coming week. We also have not had a single recurrence of the issue described in #3764 since deploying this change.
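For context, a minimal sketch of what such a ticker-driven loop can look like; the function and parameter names are illustrative, not the PR's actual code. An interval of 0 preserves the existing behavior (no periodic reconciliation):

```go
package main

import (
	"fmt"
	"time"
)

// runReconcileLoop re-runs reconcile every interval. An interval of 0
// disables the loop entirely, matching the existing default behavior.
func runReconcileLoop(interval time.Duration, reconcile func(), stop <-chan struct{}) {
	if interval <= 0 {
		return // 0s: no reconciliation loop
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			reconcile() // re-process pods, namespaces, and network policies
		case <-stop:
			return
		}
	}
}

func main() {
	stop := make(chan struct{})
	go runReconcileLoop(5*time.Second, func() { fmt.Println("reconciling") }, stop)
	time.Sleep(12 * time.Second)
	close(stop)
}
```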

@bboreham (Contributor)

Thanks for the PR! Some unit tests failed; please take a look at https://circleci.com/gh/weaveworks/weave/13224

@naemono (Contributor, Author) commented Feb 17, 2020

> Thanks for the PR! Some unit tests failed; please take a look at https://circleci.com/gh/weaveworks/weave/13224

This is now resolved. The ResourceVersion of objects wasn't being emulated in the tests.
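For illustration, the kind of fixture change involved might look like this sketch (hypothetical names, not the actual test code); fake objects need their ResourceVersion set, and bumped across updates, or every update looks identical to the handler:

```go
package npc_test

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// makeTestPod builds a fake pod with an explicit ResourceVersion; leaving
// it empty made consecutive "updates" indistinguishable in the tests.
func makeTestPod(rv string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:            "test-pod",
			Namespace:       "default",
			ResourceVersion: rv,
		},
	}
}
```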

@naemono (Contributor, Author) commented Feb 17, 2020

```
#!/bin/bash -eo pipefail
bin/provision_test_vms.sh
Cannot run smoke tests: no secret key

Exited with code exit status 1
```

Is this normal? Do I need to do something to get the smoke tests to run?

@bboreham (Contributor)

The smoke-tests run with credentials on Google Cloud (accessed via the "secret key"); we don't allow PRs from other repos to see those credentials. I've pushed your branch to the main repo so it will run the smoke-tests.

@bboreham (Contributor) commented Feb 26, 2020

I would like to clarify what this PR achieves. Take the "update pod" event: if a message comes in saying the pod data was updated, but the data is the same as the previous version, then weave-npc will do nothing.

So this PR will help in the case that the Kubernetes data in memory is out of sync with the api-server, but does not help in the case that the iptables rules or ipsets are out of sync with the data.

Is that your understanding? Is there any evidence that #3764 is caused by the first kind of mismatch rather than the second?
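To make the distinction concrete, here is a sketch of the short-circuit being described (illustrative names, not weave-npc's actual handler): a resync re-delivers unchanged objects, and the update path drops them, so on-host ipsets and iptables rules are never re-checked:

```go
package npc

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// onPodUpdate receives every update event, including the no-op updates
// that a periodic resync replays from the local cache.
func onPodUpdate(oldPod, newPod *corev1.Pod) {
	if oldPod.ResourceVersion == newPod.ResourceVersion {
		// Unchanged object: nothing is re-applied to ipsets or iptables,
		// so drift on the host itself goes uncorrected.
		return
	}
	fmt.Printf("pod %s/%s changed; updating ipsets\n", newPod.Namespace, newPod.Name)
}
```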

@naemono (Contributor, Author) commented Feb 26, 2020

Yes, that is a good point: this wouldn't have any effect if the ipsets on the host itself were out of sync during a reconciliation, since it isn't doing a comparison. But it would catch one of the two out-of-sync scenarios you reference. Would you prefer that this PR include a data comparison during reconciliation? That seems like it would make a lot of sense.
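A hypothetical sketch of that comparison, using plain string sets rather than weave-npc's real ipset types: diff the entries derived from the Kubernetes objects against what is actually present on the host, and repair drift in both directions.

```go
package npc

// reconcileEntries diffs desired state (derived from Kubernetes objects)
// against actual state (read back from the host) and repairs drift in
// both directions. The set representation and callbacks are illustrative.
func reconcileEntries(desired, actual map[string]struct{}, add, del func(entry string)) {
	for entry := range desired {
		if _, ok := actual[entry]; !ok {
			add(entry) // missing on the host: re-add it
		}
	}
	for entry := range actual {
		if _, ok := desired[entry]; !ok {
			del(entry) // stale on the host: remove it
		}
	}
}
```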

@bboreham (Contributor)

I think these two things are orthogonal, and I always prefer to do unrelated changes in separate PRs.
Adding a periodic reconciliation from the state of the Kubernetes objects to iptables is a good idea, as discussed at #3764.

I would not call this one "reconciliation"; Kubernetes calls it "resync".
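For reference, a minimal sketch of that client-go resync knob (assuming in-cluster config): a shared informer factory created with a non-zero resync period replays every cached object to the registered handlers on that period, even when nothing changed on the api-server.

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// A 30s defaultResync makes each started informer re-deliver its
	// cached objects as Update events every 30 seconds.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	_ = factory.Core().V1().Pods().Informer() // request the pod informer

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // run until killed
}
```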

@bboreham (Contributor)

@naemono are you likely to return to this?
Note some work at #3802 for recreating iptables rules when they get wiped out.

@naemono (Contributor, Author) commented May 27, 2020

I will get back to this in the next sprint, starting next week. Sorry for the delays.

@naemono (Contributor, Author) commented Jun 3, 2020

Closing this, as #3792 solves the actual underlying problem: the weave-npc application continuing to run when a goroutine panics.

@naemono naemono closed this Jun 3, 2020
@bboreham bboreham added this to the n/a milestone Jul 29, 2020