add crash when failure in list or watch of kubernetes api server #1322

KnicKnic · 2020-07-21T23:22:56Z

Description

Address "Flanneld doesn't reconnect to the apiserver" - #1272

Crash flanneld when a list or watch call to the Kubernetes API server fails. Because flannel is then restarted, this effectively produces a retry loop for the connection.

Ideally the retry loop would be fully encapsulated inside flannel, but I didn't have much time to fix this problem. It has been a month and no one has offered a better solution; I hope this PR sparks one.

Todos

~~- [ ] Tests~~
~~- [ ] Documentation~~

Release note

Workaround for flannel not reconnecting to the API server on Windows

rajatchopra · 2020-07-23T15:46:06Z

Valid enhancement, but we should probably use the informer factory from client-go tools.

quickstar · 2020-08-01T11:30:34Z

This fix is crucial! I've been struggling with this issue since I first created a Windows-based kubelet!

rajatchopra · 2020-08-01T23:17:43Z

I'm not sure we should exit. Can we use the context and cancel it instead, so that appropriate cleanup can happen?

@Oats87 PTAL

Oats87 · 2020-08-25T23:53:58Z

@rajatchopra Looking at this, I think it is safe to merge this PR, with the idea of a future enhancement that moves to the nodeInformer factory. Interestingly, though, client-go itself has no behavior similar to this: https://github.com/kubernetes/client-go/blob/master/informers/core/v1/node.go#L64

rajatchopra · 2020-09-17T15:44:27Z

subnet/kube/kube.go

-				return ksm.client.CoreV1().Nodes().List(options)
+				obj, err := ksm.client.CoreV1().Nodes().List(options)
+				if err != nil {
+					glog.Exit(err, "failed to list nodes in newKubeSubnetManager")


Why quit? This should recover on its own.

rajatchopra · 2020-09-17T15:52:05Z

For #1272 we should improve the reconciliation loop. Rather than exiting, we should recover from the error by retrying (with exponential backoff, if any), i.e. by establishing a new connection to the apiserver.

@KnicKnic Your opinion please?

Closing this PR, please re-open if anyone thinks otherwise.

add crash when failure in list or watch

023f21b

rajatchopra assigned rajatchopra and unassigned rajatchopra Aug 20, 2020

rajatchopra reviewed Sep 17, 2020

rajatchopra closed this Sep 17, 2020
