Cherry-pick #6504 to 6.2: Fix infinite failure on Kubernetes watch #6530

Merged
merged 2 commits into from
Mar 12, 2018

@exekias exekias commented Mar 12, 2018

Cherry-pick of PR #6504 to 6.2 branch. Original message:

This PR fixes #6503

How to reproduce: Run filebeat pointing to minikube.

minikube ssh
sudo su

ps aux | grep localkube
kill -9 process_id

This forces a failure on the API server. When the API server comes back up, it can no longer serve the last resource version that we had requested, and the watch returns this failure:

type:"ERROR" object:<raw:"k8s\000\n\014\n\002v1\022\006Status\022C\n\004\n\000\022\000\022\007Failure\032)too old resource version: 310742 (310895)\"\004Gone0\232\003\032\000\"\000" >  typeMeta:<apiVersion:"v1" kind:"Status" > raw:"\n\004\n\000\022\000\022\007Failure\032)too old resource version: 310742 (310895)\"\004Gone0\232\003" contentEncoding:"" contentType:""  <nil>

In such scenarios the only mitigation is to move the resource version to the latest. Scenarios like this would be addressed by client-go. The code fails with an error because we pass a Pod resource to watcher.Next(); in this scenario the resource being parsed is an Error resource, so the protobuf unmarshalling fails. This is a limitation of the client we use, as the resource type needs to be passed explicitly.
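
To illustrate the mitigation described above, here is a minimal sketch in Go. It is not the code from this PR and does not use the real client: the `event` type, `decodePod`, and `nextResourceVersion` are hypothetical stand-ins. It assumes that an empty resource version asks the API server to start the watch from the most recent state.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a hypothetical, simplified watch event. It is not the type
// used by the real Kubernetes client.
type event struct {
	Type            string // "ADDED", "MODIFIED", "DELETED" or "ERROR"
	ResourceVersion string
}

// errNotPod mimics the unmarshalling failure seen when an ERROR event
// (a Status object) is decoded as if it were a Pod.
var errNotPod = errors.New("event does not contain a Pod")

// decodePod stands in for decoding the event payload into a Pod.
// It fails for ERROR events, as the protobuf unmarshalling does.
func decodePod(ev event) (string, error) {
	if ev.Type == "ERROR" {
		return "", errNotPod
	}
	return ev.ResourceVersion, nil
}

// nextResourceVersion returns the resource version to use for the next
// watch request and whether the watch has to be restarted.
// Assumption: "" means "start from the latest version".
func nextResourceVersion(ev event) (rv string, restart bool) {
	v, err := decodePod(ev)
	if err != nil {
		// Drop the stale version instead of retrying it forever.
		return "", true
	}
	return v, false
}

func main() {
	// Simulated event stream: two pod updates, then the API server
	// reports that the requested resource version is too old.
	events := []event{
		{Type: "MODIFIED", ResourceVersion: "310741"},
		{Type: "MODIFIED", ResourceVersion: "310742"},
		{Type: "ERROR"},
	}
	for _, ev := range events {
		rv, restart := nextResourceVersion(ev)
		fmt.Printf("type=%s next=%q restart=%v\n", ev.Type, rv, restart)
	}
}
```

Resetting to the latest version trades completeness for liveness: the watch recovers, but events between the stale version and the new starting point are not replayed.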

This fix is not ideal, as it might miss a few state changes.

@exekias exekias force-pushed the backport_6504_6.2 branch from f779541 to 6d6d7c6 Compare March 12, 2018 10:11
@ruflin ruflin merged commit 38dc5c1 into elastic:6.2 Mar 12, 2018
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…atch (elastic#6530)
