Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add crash when failure in list or watch of kubernetes api server #1322

Closed
wants to merge 1 commit into from

Conversation

KnicKnic
Copy link

Description

Address "Flanneld doesn't reconnect to the apiserver" - #1272

Add crash when failure in list or watch when talking to kubernetes api server. This should result in a retry loop for the connection (due to flannel being restarted).

Ideally the retry loop should be fully encapsulated inside flannel, however I didn't have much time to fix this problem. And it has been a month and no one has offered a better solution. I hope that this PR sparks that.

Todos

- [ ] Tests
- [ ] Documentation

  • Release note

Release Note

Workaround for not reconnecting to api-server in windows

@rajatchopra
Copy link
Contributor

Valid enhancement, but we should probably use the informer factory from client-go tools.

@quickstar
Copy link

This fix is crucial! I'm struggling with this issue since I first created a windows based kubelet!

@rajatchopra
Copy link
Contributor

Not sure if we should Exit. Can we use the 'context' and cancel it instead?
So that an appropriate cleanup can happen.

@Oats87 PTAL

@Oats87
Copy link
Member

Oats87 commented Aug 25, 2020

@rajatchopra Looking at this, I think it is safe to merge this PR with the idea to perform a future enhancement to move to utilizing the nodeInformer factory, although interestingly in client-go itself there is not behavior similar to this: https://github.com/kubernetes/client-go/blob/master/informers/core/v1/node.go#L64

return ksm.client.CoreV1().Nodes().List(options)
obj, err := ksm.client.CoreV1().Nodes().List(options)
if err != nil {
glog.Exit(err, "failed to list nodes in newKubeSubnetManager")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why quit? This should recover on its own.

@rajatchopra
Copy link
Contributor

For #1272 we should improve the reconciliation loop. We should recover from the error with retries (exponentially backing ones if at all), rather than exiting i.e. establish a new connection to apiserver.

@KnicKnic Your opinion please?

Closing this PR, please re-open if anyone thinks otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants