[autodiscover] Error creating runner from config: Can only start an input when all related states are finished #11834
+1 A similar issue was reported in discuss.
+1
Looked into this a bit more, and I'm guessing it has something to do with how events are emitted from kubernetes and how the kubernetes provider in beats handles them. In kubernetes you usually get multiple (3 or more) UPDATE events between the time the pod is created and the time it becomes ready, sometimes several within a second. On the filebeat side, a single update event is translated into a STOP and a START: it first tries to stop the config and then immediately creates and applies a new one (https://github.com/elastic/beats/blob/6.7/libbeat/autodiscover/providers/kubernetes/kubernetes.go#L117-L118), and this is where I think things could go wrong. If the processing of events is asynchronous, it is likely to run into race conditions, ending up with two conflicting states of the same file in the registry. Either debouncing the event stream or implementing a real update event (instead of simulating one with stop/start) should help.
I am having this same issue in my pod logs running in the DaemonSet. Running version
Also running into this with 6.7.0, with frequent logs showing this error.
It seems like we're hitting this problem as well in our Kubernetes cluster. Logs seem to go missing.
By the way, we're running 7.1.1 and the issue is still present. A restart seems to solve the problem, so we hacked in a workaround where Filebeat's liveness probe monitors its own logs for this error.
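For reference, a rough sketch of such a liveness-probe hack (the log path, the matched error string, and the thresholds are my assumptions, not the actual manifest used above; it also assumes Filebeat is configured to write its log to a file inside the container):
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # Fail the probe (so kubelet restarts the pod) once the error appears in
      # Filebeat's own log file. Deliberately naive: it does not distinguish
      # old occurrences from new ones.
      - >-
        ! grep -q "Can only start an input when all related states are finished"
        /usr/share/filebeat/logs/filebeat
  initialDelaySeconds: 60
  periodSeconds: 60
  failureThreshold: 1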
+1
Thank you everyone for your feedback! I was able to reproduce this and am currently trying to get it fixed.
I can see it happening in 7.0.1.
Still exists in 7.2.
@exekias I spent some time digging into this issue and there are multiple causes leading to this "problem". I see this error message every time a pod is stopped (not removed; e.g. when running a cronjob). I run filebeat from the master branch. First, for a good understanding, what this error message means and what its consequences are:
To get rid of the error message I see a few possibilities:
Option A: Make the kubernetes provider aware of all events it has sent to the autodiscover event bus and skip sending events on "kubernetes pod update" when nothing important changes. The error can still appear in logs, but should be less frequent.
Option B: Make reloading an Input an atomic, synchronized operation, which will require to:
All these changes may have a significant impact on the performance of normal filebeat operations. I just tried this approach and realized I may have gone too far.
Option C: Make an API for Input reconfiguration "on the fly" and send a "reload" event from the kubernetes provider on each pod update event. It should still fall back to the stop/start strategy when reload is not possible (e.g. a changed input type). This will probably affect all existing Input implementations.
Option D: Change the log level for this from Error to Warn and pretend that everything is fine ;)
I'm not able to reproduce this one. I'd appreciate someone here providing some info on what operational pattern I need to follow.
@odacremolbap You can try generating lots of pod update events: start pods with multiple containers and readiness/liveness checks, and eventually perform some manual actions on the pods (e.g. patch condition statuses, as readiness gates do). Or try running some short-lived pods (e.g. a cronjob that prints something to stdout and exits). I see it quite often in my kube cluster. The example below is such a cronjob, working as described above.
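A minimal sketch of such a short-lived CronJob (the name, image, and schedule are placeholders of mine, not the exact manifest referred to above):
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: echo-and-exit
spec:
  # Run every minute; each pod prints one line to stdout and exits immediately.
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: echo
            image: busybox
            command: ["sh", "-c", "date; echo short-lived pod says hello"]
          restartPolicy: OnFailure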
Thanks @marqc, I tried the cronjobs and patching pods ... no success so far.
@odacremolbap What version of Kubernetes are you running? Seeing the issue here on 1.12.7.
Seeing the issue in docker.elastic.co/beats/filebeat:7.1.1
Still exists in 7.2. :(
I am running into the same issue with filebeat 7.2 & 7.3 running as a standalone container on a swarm host.
@jsoriano thank you for your help.
But this does not seem to be a valid config...
@yogeek good catch, my configuration used
Instead of:
When you start having complex conditions it is a signal that you might benefit from using hints-based autodiscover. Among other things, it allows you to define different configurations (or disable them) per namespace in the namespace annotations. If you continue having problems with this configuration, please start a new topic in https://discuss.elastic.co/ so we don't mix that conversation with the problem in this issue 🙂
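For illustration, a minimal hints-based setup looks roughly like this (this mirrors the general pattern from the Filebeat docs rather than the exact configuration discussed above):
filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      # Used for any pod that has no co.elastic.logs/* annotations of its own.
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
Individual pods (or whole namespaces, via namespace annotations) can then opt out with co.elastic.logs/enabled: "false" instead of encoding those exceptions as conditions.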
Thank you @jsoriano! Seems to work without errors now 👍
My whole stack is on 7.9.0, using the Elastic operator for k8s, and the error messages still exist, but the logs do not seem to be lost. So does this mean we should just ignore this error?
Yes, in principle you can ignore this error. There is an open issue to improve logging in this case and discard unneeded error messages: #20568
@jsoriano I have a weird issue related to that error. Randomly, Filebeat stops collecting logs from pods after printing it. I'm running Filebeat 7.9.0; I've upgraded to the latest version, as that behavior has existed since 7.6.1 (the first time I've seen it). My environment:
This problem should be solved in 7.9.0, so I am closing this. Some errors are still being logged when they shouldn't be; we have created the following issues as follow-ups:
@jsoriano and @ChrsMark I'm still not seeing filebeat 7.9.3 ship any logs from my k8s clusters. I do see logs coming from my filebeat 7.9.3 docker collectors on other servers. All the filebeats are sending logs to an Elasticsearch 7.9.3 server. I'm using the recommended filebeat configuration above from @ChrsMark, and I also deployed the test logging pod. Filebeat seems to be finding the container/pod logs, but I get a strange error:
2020-10-27T13:02:09.145Z DEBUG [autodiscover] template/config.go:156 Configuration template cannot be resolved: field 'data.kubernetes.container.id' not available in event or environment accessing 'paths' (source:'/etc/filebeat.yml')
Configuration yaml:
@sgreszcz I cannot reproduce it locally. Also you are adding
Here is the manifest I'm using:
Can you try with the above one and share your result?
@ChrsMark thank you so much for sharing your manifest! I'm still not sure what exactly the diff is between yours and the one I had built from the filebeat GitHub example and the examples above in this issue. It was driving me crazy for a few days, so I really appreciate this, and I can confirm that if you just apply this manifest as-is and only change the elasticsearch hostname, everything works.
Weird, the only difference I can see in the new manifest is the addition of a volume and volumeMount (/var/lib/docker/containers), but we are not even referring to it in the filebeat.yaml ConfigMap.
The only config that was removed in the new manifest was this, so maybe these things were breaking the proper k8s log discovery:
If you are using Docker as the container engine, then /var/log/containers and /var/log/pods only contain symlinks to logs stored in /var/lib/docker, so it has to be mounted into your filebeat container as well.
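For reference, the relevant part of the Filebeat DaemonSet spec would look something like this (standard Docker paths; a sketch, adjust to your environment):
# In the filebeat container spec:
volumeMounts:
  - name: varlog
    mountPath: /var/log
    readOnly: true
  # Needed because /var/log/containers/*.log are symlinks into /var/lib/docker/containers
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
# In the pod spec:
volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers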
The same issue with the docker
Hello, I was getting the same error on Filebeat 7.9.3, with the following config:
I thought it was something with Filebeat. When I was testing stuff I changed my config to:
And the error changed to:
So I think the problem was the Elasticsearch resources and not the Filebeat config.
@jsoriano Using Filebeat 7.9.3, I am still losing logs with the following CronJob:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    app: test-log
    app.kubernetes.io/name: test-log
    app.kubernetes.io/version: "1.0"
  name: test-log
  namespace: default
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: test-log
            app.kubernetes.io/instance: test-log
            app.kubernetes.io/name: test-log
        spec:
          containers:
          - command:
            - /bin/sh
            - -c
            - |
              echo '{ "Date": "2020-11-19 14:42:23", "Level": "Info", "Message": "Test LOG" }' > /dev/stdout;
            image: alpine:latest
            imagePullPolicy: IfNotPresent
            name: test-log
          restartPolicy: OnFailure
  schedule: '*/1 * * * *'
  startingDeadlineSeconds: 100
  successfulJobsHistoryLimit: 3
  suspend: false
A workaround for me is to change the container's command to delay the exit:
          - command:
            - /bin/sh
            - -c
            - |
              echo '{ "Date": "2020-11-19 14:42:23", "Level": "Info", "Message": "Test LOG" }' > /dev/stdout;
              sleep 10;
@MrLuje what is your filebeat configuration? Autodiscover providers have a |
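Presumably this refers to the kubernetes provider's cleanup_timeout setting, which controls how long autodiscover keeps a pod's configuration running after the pod stops (that reading is an assumption on my part). A rough sketch of raising it for short-lived pods:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      # Assumption: keep the generated input alive for a while after the pod is
      # gone, so the last lines of short-lived pods (e.g. cronjobs) are still read.
      cleanup_timeout: 300s
      hints.enabled: true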
filebeatConfig:
filebeat.yml: |
prospectors:
# Mounted `filebeat-prospectors` configmap:
path: $${path.config}/prospectors.d/*.yml
# Reload prospectors configs as they change:
reload.enabled: false
modules:
path: $${path.config}/modules.d/*.yml
# Reload module configs as they change:
reload.enabled: false
fields:
tag: ${from}
filebeat.modules:
- module: nginx
filebeat.autodiscover:
providers:
- type: kubernetes
templates:
- condition.and:
- not.equals:
kubernetes.labels.stack: "dotnet"
- not.equals:
kubernetes.labels.stack: "js"
config:
- type: container
paths:
- /var/lib/docker/containers/$${data.kubernetes.container.id}/*-json.log
labels.dedot: true
annotations.dedot: true
in_cluster: true
include_annotations: ["*"]
hints.enabled: true
fields:
filebeat_config: default
fields_under_root: true
processors:
- add_cloud_metadata:
providers: ["gcp"]
- add_locale: ~
- drop_fields:
fields: ["agent.ephemeral_id", "agent.hostname", "agent.id", "agent.type", "agent.version", "agent.name", "ecs.version", "input.type", "log.offset", "stream"]
- drop_event:
when:
contains:
kubernetes.pod.name: 'oauth2-proxy'
output.logstash:
timeout: 120
hosts: ["${hosts}"] Not totally sure about the logs, the container id for one of the missing log is f9b726a9140eb60bdcc0a22a450a83999c76589785c7da5430e4536da4ccc502
I am going to lock this issue as it is starting to become a single point for reporting different issues with filebeat and autodiscover. If you find a problem with Filebeat and Autodiscover, please open a new topic in https://discuss.elastic.co/, and if a new problem is confirmed then open a new issue in GitHub.
Hi,
I am using filebeat version 6.6.2 with autodiscover and the kubernetes provider type. After upgrading from 6.2.4 to 6.6.2, I am facing this error for multiple docker containers.
ERROR [autodiscover] cfgfile/list.go:96 Error creating runner from config: Can only start an input when all related states are finished: {Id:3841919-66305 Finished:false Fileinfo:0xc42070c750 Source:/var/lib/docker/containers/a5330346622f0f10b4d85bac140b4bf69f3ead398a69ac0a66c1e3b742210393/a5330346622f0f10b4d85bac140b4bf69f3ead398a69ac0a66c1e3b742210393-json.log Offset:2860573 Timestamp:2019-04-15 19:28:25.567596091 +0000 UTC m=+557430.342740825 TTL:-1ns Type:docker Meta:map[] FileStateOS:3841919-66305}
And I see two entries in the registry file:
{"source":"/var/lib/docker/containers/a1824700c0568c120cd3b939c85ab75df696602f9741a215c74e3ce6b497e111/a1824700c0568c120cd3b939c85ab75df696602f9741a215c74e3ce6b497e111-json.log","offset":8655848,"timestamp":"2019-04-16T10:33:16.507862449Z","ttl":-1,"type":"docker","meta":null,"FileStateOS":{"inode":3841895,"device":66305}}
{"source":"/var/lib/docker/containers/a1824700c0568c120cd3b939c85ab75df696602f9741a215c74e3ce6b497e111/a1824700c0568c120cd3b939c85ab75df696602f9741a215c74e3ce6b497e111-json.log","offset":3423960,"timestamp":"2019-04-16T10:37:01.366386839Z","ttl":-1,"type":"docker","meta":null,"FileStateOS":{"inode":3841901,"device":66305}}]
I don't see any solution other than setting the Finished flag to true or updating the registry file. Is there a permanent solution? Thanks in advance.
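For context, the kind of configuration described here (kubernetes autodiscover feeding a docker input on 6.x) typically looks like the sketch below; this is the generic documented pattern, not the reporter's actual config:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - config:
            - type: docker
              # The container id comes from the autodiscover event
              containers.ids:
                - "${data.kubernetes.container.id}"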