Add configurable reconciliation loop for pods, namespaces, and networ… #3772
Conversation
Thanks for the PR! Some unit tests failed; please take a look at https://circleci.com/gh/weaveworks/weave/13224
This is now resolved. The resourceVersion of objects wasn't being emulated in the tests.
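For context, a minimal sketch of what emulating resourceVersion in a test fake might look like; the types and names here are illustrative, not weave's actual test code:

```go
package npctest

import (
	"strconv"

	corev1 "k8s.io/api/core/v1"
)

// fakeStore is an illustrative stand-in for the API server in tests: it bumps
// resourceVersion on every update, so handlers that compare versions behave
// as they would against a real cluster.
type fakeStore struct {
	version int
	pods    map[string]*corev1.Pod
}

func (s *fakeStore) updatePod(pod *corev1.Pod) {
	s.version++
	pod.ResourceVersion = strconv.Itoa(s.version)
	s.pods[pod.Name] = pod
}
```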
Is this normal? Do I need to do something to get the smoke tests to run?
The smoke-tests run with credentials on Google Cloud (accessed via the "secret key"); we don't allow PRs from other repos to see those credentials. I've pushed your branch to the main repo so it will run the smoke-tests.
I would like to clarify what this PR achieves. Take the "update pod" event: if a message comes in saying the pod data was updated, but the data is the same as the previous version, then weave-npc will do nothing. So this PR will help in the case where the Kubernetes data in memory is out of sync with the api-server, but it does not help in the case where the iptables rules or ipsets are out of sync with that data. Is that your understanding? Is there any evidence that #3764 is caused by the first kind of mismatch rather than the second?
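To illustrate the point: an update handler along the lines of the following hypothetical sketch (not weave-npc's actual code) returns early when the delivered object is unchanged, so replaying identical data cannot repair iptables rules or ipsets that have drifted on the host:

```go
package npc

import (
	corev1 "k8s.io/api/core/v1"
)

// onPodUpdate is a hypothetical handler: when the delivered object is
// identical to the cached copy it returns early, so a periodic replay of
// unchanged data has no effect on the host's rules.
func onPodUpdate(oldObj, newObj interface{}) {
	oldPod, ok1 := oldObj.(*corev1.Pod)
	newPod, ok2 := newObj.(*corev1.Pod)
	if !ok1 || !ok2 {
		return
	}
	// Same resourceVersion means nothing changed on the API server, which
	// is exactly what a replay of in-sync data delivers.
	if oldPod.ResourceVersion == newPod.ResourceVersion {
		return
	}
	// ... recompute ipsets and iptables rules for the changed pod ...
}
```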
Yes, that is a good point: it certainly wouldn't have an effect if the ipsets were out of sync on the host itself during a reconciliation, as it isn't doing a comparison. But it would catch one of the two out-of-sync scenarios you reference. Would you prefer that this PR include data comparison during reconciliation? That seems like it would make a lot of sense.
I think these two things are orthogonal, and I always prefer to do unrelated changes in separate PRs. I would not call this one "reconciliation"; Kubernetes calls it "resync".
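For reference, this is how a resync period is typically wired up with client-go's shared informers; the 30-second interval is illustrative, and this is not necessarily how weave-npc itself is structured:

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// The second argument is the resync period: at that interval the
	// informer replays every object in its cache through the registered
	// update handlers, even if nothing changed. Zero disables resync.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)

	stop := make(chan struct{})
	defer close(stop)

	// In a real program you would obtain informers from the factory and
	// register event handlers before starting it, e.g. via
	// factory.Core().V1().Pods().Informer().
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
}
```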
I will get back to this in the next sprint, next week. Sorry for the delays.
Closing this, as #3792 solves the actual underlying problem: the weave-npc application continuing to run when a goroutine panics.
re #3764
This adds a configurable reconciliation loop option for pods, namespaces, and network policies; it defaults to 0s (no reconciliation loop), preserving the existing behavior. We have been testing these changes in 7 internal non-production clusters with a reconciliation interval of > 0s for the past week with no noticeable negative impact, and we will be rolling them out to our production clusters in the upcoming week. We have not had a single recurrence of the issue in #3764 since this release.
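The general shape of such a loop might look like the following sketch; the flag name and the loop body are placeholders, not the PR's actual code. An interval of 0 returns before the ticker is created, matching the default behavior:

```go
package main

import (
	"flag"
	"log"
	"time"
)

func main() {
	// Hypothetical flag; the option added by this PR may be named differently.
	interval := flag.Duration("reconcile-interval", 0,
		"periodic reconciliation interval; 0 disables the loop")
	flag.Parse()

	// The default of 0 preserves the existing behavior: no periodic loop.
	if *interval <= 0 {
		log.Println("periodic reconciliation disabled")
		return
	}

	ticker := time.NewTicker(*interval)
	defer ticker.Stop()
	for range ticker.C {
		// Placeholder: re-list pods, namespaces, and network policies
		// and replay them through the normal event handlers.
		log.Println("reconciling pods, namespaces, and network policies")
	}
}
```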