Filebeat: container logs not skipped if json decoding fails #30191

Closed
Poweranimal opened this issue Feb 3, 2022 · 7 comments
Poweranimal commented Feb 3, 2022

Hi,

Apparently Filebeat doesn't skip logs collected from containers if parsing the JSON log line fails.

Instead, it retries parsing the same log line many times per second, indefinitely, and spams a large amount of log output in the process.

This blocks streaming logs to Elasticsearch entirely, and it can cause huge logging costs if, for example, AWS CloudWatch or a similar service is used to collect Filebeat's own logs.

Filebeat Version: 7.16.2

Filebeat Log
2022-02-03T16:21:25.592Z WARN [elasticsearch] elasticsearch/client.go:414
Cannot index event
publisher.Event{
    Content:beat.Event{
        Timestamp:time.Date(2022, time.February, 3, 16, 19, 4, 403066337, time.UTC),
        Meta:null,
        Fields:{
            "agent":{
                "ephemeral_id":"2a3cceae-5c78-4c39-89b4-19a4d20696cc",
                "hostname":"XXX",
                "id":"53ce1c8d-644b-41e4-a4b0-4a6bc085c6c3",
                "name":"XXX",
                "type":"filebeat",
                "version":"7.16.2"
            },
            "container":{
                "id":"c5f561919bb8c39625a11ffb70c85ea188e4f0315032b98728f51ea71b895d0d",
                "image":{
                    "name":"XXX"
                },
                "labels":{
                    "com_amazonaws_ecs_cluster":"XXX",
                    "com_amazonaws_ecs_container-name":"SessionGroupServiceContainer",
                    "com_amazonaws_ecs_task-arn":"XXX",
                    "com_amazonaws_ecs_task-definition-family":"session",
                    "com_amazonaws_ecs_task-definition-version":"425",
                    "metric":"prometheus"
                },
                "name":"ecs-session-425-SessionGroupServiceContainer-8ad280fbc7e0b2b8db01"
            },
            "ecs":{
                "version":"1.12.0"
            },
            "host":{
                "name":"XXX"
            },
            "input":{
                "type":"container"
            },
            "json":{
                "":"",
                "broker":"seed 0",
                "caller":"config/kafka.go:306",
                "level":"info",
                "msg":"KAFKA: [ERROR] unable to request api versions; broker: seed 0, err: invalid large response size 352518912 \u003e limit 104857600; the first three bytes recieved appear to be a tls alert record for TLS v1.2; is this a plaintext connection speaking to a tls endpoint?\n",
                "ts":""
            },
            "log":{
                "file":{
                    "path":"/var/lib/docker/containers/c5f561919bb8c39625a11ffb70c85ea188e4f0315032b98728f51ea71b895d0d/c5f561919bb8c39625a11ffb70c85ea188e4f0315032b98728f51ea71b895d0d-json.log"
                },
                "offset":2368461
            },
            "message":"{\"level\":\"info\",\"ts\":\"\",\"caller\":\"config/kafka.go:306\",\"msg\":\"KAFKA: [ERROR] unable to request api versions; broker: seed 0, err: invalid large response size 352518912 \u003e limit 104857600; the first three bytes recieved appear to be a tls alert record for TLS v1.2; is this a plaintext connection speaking to a tls endpoint?\\n\",\"broker\":\"seed 0\",\"\":\"\"}",
            "stream":"stderr"
        },
        Private:file.State{
            Id:"native::46221602-51713",
            PrevId:"",
            Finished:false,
            Fileinfo:(*os.fileStat)(0xc000950680),
            Source:"/var/lib/docker/containers/c5f561919bb8c39625a11ffb70c85ea188e4f0315032b98728f51ea71b895d0d/c5f561919bb8c39625a11ffb70c85ea188e4f0315032b98728f51ea71b895d0d-json.log",
            Offset:2368912,
            Timestamp:time.Date(2022, time.February, 3, 16, 21, 10, 801959876, time.Local),
            TTL:-1,
            Type:"container",
            Meta:map[string]string(nil),
            FileStateOS:file.StateOS{Inode:0x2c14922, Device:0xca01},
            IdentifierName:"native"
        },
        TimeSeries:false
    },
    Flags:0x1,
    Cache:publisher.EventCache{m:common.MapStr(nil)}
}
(status=400): {
    "type":"mapper_parsing_exception",
    "reason":"failed to parse",
    "caused_by":{
        "type":"illegal_argument_exception",
        "reason":"field name cannot be an empty string"
    }
}, dropping event!
Filebeat Config
logging.to_files: true
logging.to_syslog: false

setup:
  kibana:
    host: '${KIBANA_ADDR}'
    username: '${KIBANA_USERNAME}'
    password: '${KIBANA_PASSWORD}'
    ssl:
      enabled: true
      certificate: "/etc/filebeat/tls/kibana.monitoring/public.crt"
      key: "/etc/filebeat/tls/kibana.monitoring/private.key"
      certificate_authorities: ["/etc/filebeat/tls/kibana.monitoring/root.crt"]

  ilm:
    enabled: "auto"
    overwrite: true
    rollover_alias: "filebeat"
    pattern: "{now/d}-000001"
    policy_file: "/usr/share/filebeat/ilm_policy.json"

filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*-json.log'

output.elasticsearch:
  hosts: '${ELASTICSEARCH_HOSTS}'
  username: '${ELASTICSEARCH_USERNAME}'
  password: '${ELASTICSEARCH_PASSWORD}'
  ssl:
    certificate: "/etc/filebeat/tls/elasticsearch.monitoring/public.crt"
    key: "/etc/filebeat/tls/elasticsearch.monitoring/private.key"
    certificate_authorities: ["/etc/filebeat/tls/elasticsearch.monitoring/root.crt"]

processors:
  - add_docker_metadata:
      host: "unix:///var/run/docker.sock"
  - decode_json_fields:
      fields: "message"
      target: "json"
      max_depth: 10
      overwrite_keys: true
  - drop_event:
      when:
        regexp:
          container.name: "ecs-agent"
@elasticmachine (Collaborator)

Pinging @elastic/integrations (Team:Integrations)


aspacca commented Feb 7, 2022

Hi @Poweranimal,

As far as I can see, it is not an error in parsing the JSON log entry, but rather the fact that the JSON contains a field with an empty key ("":"" in the message, which ends up as "json":{"":""} in the event). Elasticsearch then returns an error because the field name cannot be an empty string.

As a workaround for your specific scenario, you can add the following to the processors list:

  - drop_fields:
      fields: ["json.json.", "json.message."]
      ignore_missing: true

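For reference, a sketch of how that workaround would slot into the processors section of the config posted above (fields is written as a list here, matching the documented form; the other values are taken unchanged from the config in the issue):

processors:
  - add_docker_metadata:
      host: "unix:///var/run/docker.sock"
  - decode_json_fields:
      fields: ["message"]
      target: "json"
      max_depth: 10
      overwrite_keys: true
  # Drop the fields whose trailing key name is empty (produced by the
  # malformed JSON line), so Elasticsearch no longer rejects the event
  # with "field name cannot be an empty string".
  - drop_fields:
      fields: ["json.json.", "json.message."]
      ignore_missing: true
  - drop_event:
      when:
        regexp:
          container.name: "ecs-agent"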
@elasticmachine (Collaborator)

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)


Poweranimal commented Feb 7, 2022

Hi @aspacca,
Thanks a lot for your response.
Yes, the log is malformed. However, I think the resulting behavior is quite questionable.

Things like this can happen, and currently they result in a situation in which Filebeat doesn't ship any logs to Elasticsearch and produces tons of junk logs that can lead to high logging costs.

Something very similar already happened to us in another way in the past.
At that time, we were collecting logs from the path /var/lib/docker/containers/*/*.log. One day AWS ECS started writing an additional log file to this location that was not in JSON format. This resulted in the very same issue and caused us additional logging costs in AWS CloudWatch.
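(A sketch of the narrower glob, which the config at the top of this issue already uses, so that only Docker's own *-json.log files are picked up:)

filebeat.inputs:
  - type: container
    paths:
      # Match only Docker's JSON log files, not other *.log files that
      # services may drop into the same directories.
      - '/var/lib/docker/containers/*/*-json.log'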

I think the default behavior shouldn't be to retry the same log line many times per second indefinitely, producing tons of logs in the process.

@tetianakravchenko (Contributor)

@Poweranimal could the json.ignore_decoding_error: true setting maybe be helpful in your case? See https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-container.html#filebeat-input-container-config-json. There seems to have been a similar issue, #6045, and ignore_decoding_error was introduced as its outcome.
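A minimal sketch of that setting on the container input from the config above (note that, if I read the linked docs right, configuring json.* options also enables JSON decoding at the input level; whether that fits depends on the rest of the pipeline):

filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*-json.log'
    json:
      # Suppress the decoding-error log entries for lines that cannot be
      # parsed as JSON.
      ignore_decoding_error: true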


philippwiesner commented Jul 15, 2022

Hi, we are experiencing the same issue, but with JSON documents sent via the TCP input and decoded via the decode_json_fields processor. According to the documentation, the option added in #6045 does not add an ignore_decoding_error directive to the processor.

It would be great if this option could also be added to the processor itself.
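For context, a rough sketch of the kind of setup described here (the listen address and field names are placeholders, not the actual configuration):

filebeat.inputs:
  - type: tcp
    host: "localhost:9000"  # placeholder listen address

processors:
  - decode_json_fields:
      fields: ["message"]
      target: "json"
      # add_error_key exists on the processor, but there is no documented
      # ignore_decoding_error option comparable to the input-level setting
      # introduced in #6045.
      add_error_key: true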


botelastic bot commented Jul 15, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale so it hits our filters and we make sure to get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough 👍
Thank you for your contribution!

@botelastic botelastic bot closed this as completed Jan 11, 2024