Hole in the CloudWatch logs #416

Closed
qdupuy opened this issue Sep 6, 2022 · 5 comments

@qdupuy

qdupuy commented Sep 6, 2022

Describe the question/issue

Since I deployed Fluent Bit on my EKS clusters, I have been having a problem with logs being sent to CloudWatch Logs.

Configuration

Application log file (application-log.conf):

[INPUT]
    Name                tail
    Tag                 application.*
    Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
    Path                /var/log/containers/*.log
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  container_firstline
    Parser              cri
    DB                  /var/fluent-bit/state/flb_container.db
    Mem_Buf_Limit       100MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Rotate_Wait         30
    storage.type        filesystem
    Read_from_Head      ${READ_FROM_HEAD}
    
[INPUT]
    Name                tail
    Tag                 application.*
    Path                /var/log/containers/fluent-bit*
    Parser              cri
    DB                  /var/fluent-bit/state/flb_log.db
    Mem_Buf_Limit       50MB
    Skip_Long_Lines     On
    Refresh_Interval    60
    Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
    Name                tail
    Tag                 application.*
    Path                /var/log/containers/cloudwatch-agent*
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  cwagent_firstline
    Parser              cri
    DB                  /var/fluent-bit/state/flb_cwagent.db
    Mem_Buf_Limit       50MB
    Skip_Long_Lines     On
    Refresh_Interval    20
    Read_from_Head      ${READ_FROM_HEAD}

[FILTER]
    Name                kubernetes
    Match               application.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_Tag_Prefix     application.var.log.containers.
    Merge_Log           On
    Merge_Log_Key       log_processed
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On
    Labels              On
    Annotations         On

[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
    log_stream_prefix   ${HOST_NAME}-
    auto_create_group   true
    extra_user_agent    container-insights
    log_retention_days  ${LOG_RETENTION_DAYS}
    Retry_Limit         5

Dataplane log file (dataplane-log.conf):

[INPUT]
    Name                systemd
    Tag                 dataplane.systemd.*
    Systemd_Filter      _SYSTEMD_UNIT=docker.service
    Systemd_Filter      _SYSTEMD_UNIT=kubelet.service
    DB                  /var/fluent-bit/state/systemd.db
    Path                /var/log/journal
    Read_From_Tail      ${READ_FROM_TAIL}

[INPUT]
    Name                tail
    Tag                 dataplane.tail.*
    Path                /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  container_firstline
    Parser              cri
    DB                  /var/fluent-bit/state/flb_dataplane_tail.db
    Mem_Buf_Limit       100MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Rotate_Wait         30
    storage.type        filesystem
    Read_from_Head      ${READ_FROM_HEAD}

[FILTER]
    Name                modify
    Match               dataplane.systemd.*
    Rename              _HOSTNAME                   hostname
    Rename              _SYSTEMD_UNIT               systemd_unit
    Rename              MESSAGE                     message
    Remove_regex        ^((?!hostname|systemd_unit|message).)*$

[FILTER]
    Name                aws
    Match               dataplane.*
    imds_version        v1
    
[OUTPUT]
    Name                cloudwatch_logs
    Match               dataplane.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/dataplane
    log_stream_prefix   ${HOST_NAME}-
    auto_create_group   true
    extra_user_agent    container-insights
    log_retention_days  ${LOG_RETENTION_DAYS}
    Retry_Limit         5

Main Fluent Bit configuration:

[SERVICE]
    Flush                     5
    Log_Level                 info
    Daemon                    off
    Parsers_File              parsers.conf
    HTTP_Server               ${HTTP_SERVER}
    HTTP_Listen               0.0.0.0
    HTTP_Port                 ${HTTP_PORT}
    storage.path              /var/fluent-bit/state/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 50M
    storage.metrics           on
    Health_Check              On
    HC_Errors_Count           5
    HC_Retry_Failure_Count    5
    HC_Period                 20

@INCLUDE application-log.conf
@INCLUDE dataplane-log.conf
@INCLUDE host-log.conf

Host log file (host-log.conf):

[INPUT]
    Name                tail
    Tag                 host.dmesg
    Path                /var/log/dmesg
    Parser              syslog
    DB                  /var/fluent-bit/state/flb_dmesg.db
    Mem_Buf_Limit       50MB
    Skip_Long_Lines     On
    Refresh_Interval    20
    Read_from_Head      ${READ_FROM_HEAD}
    
[INPUT]
    Name                tail
    Tag                 host.messages
    Path                /var/log/messages
    Parser              syslog
    DB                  /var/fluent-bit/state/flb_messages.db
    Mem_Buf_Limit       50MB
    Skip_Long_Lines     On
    Refresh_Interval    20
    Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
    Name                tail
    Tag                 host.secure
    Path                /var/log/secure
    Parser              syslog
    DB                  /var/fluent-bit/state/flb_secure.db
    Mem_Buf_Limit       50MB
    Skip_Long_Lines     On
    Refresh_Interval    20
    Read_from_Head      ${READ_FROM_HEAD}

[FILTER]
    Name                aws
    Match               host.*
    imds_version        v1

[OUTPUT]
    Name                cloudwatch_logs
    Match               host.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/host
    log_stream_prefix   ${HOST_NAME}.
    auto_create_group   true
    extra_user_agent    container-insights
    log_retention_days  ${LOG_RETENTION_DAYS}
    Retry_Limit         5

Parsers file (parsers.conf):

[PARSER]
    Name                docker
    Format              json
    Time_Key            time
    Time_Format         %Y-%m-%dT%H:%M:%S.%L
    
[PARSER]
    Name                syslog
    Format              regex
    Regex               ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
    Time_Key            time
    Time_Format         %b %d %H:%M:%S

[PARSER]
    Name                container_firstline
    Format              regex
    Regex               (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
    Time_Key            time
    Time_Format         %Y-%m-%dT%H:%M:%S.%L

[PARSER]
    Name                cwagent_firstline
    Format              regex
    Regex               (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
    Time_Key            time
    Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

[PARSER]
    Name                cri
    Format              regex
    Regex               ^(?<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{9}Z)\s(?<stream>stdout|stderr)\s(?<logtag>[A-Z])\s(?<message>.*)$
    Time_Key            time
    Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

Fluent Bit Log Output

Nothing related to the problem. Fluent Bit's own logs also have a gap that matches the CloudWatch gap, with no information about what happened.

Fluent Bit Version Info

amazon/aws-for-fluent-bit:2.23.0

Cluster Details

EKS version : v1.22.10-eks-7dc61e8

Fluent-bit is deployed as DaemonSet

Application Details

I have several configuration files that together send between 4,000 and 10,000 logs per second.

Steps to reproduce issue

Deploy the same image with the same configuration as mine.

Problem

In CloudWatch, there are fairly large gaps in the logs, but if I restart the DaemonSet, the logs are sent again.

Here is a screenshot showing the gap:

[Screenshot: CloudWatch Logs]

@matthewfala
Contributor

If possible, would you please enable debug logging (https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/enable-debug-logging) and share your Fluent Bit debug logs? It seems like Fluent Bit is somehow freezing up.
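A minimal sketch of that change for this DaemonSet, assuming the `[SERVICE]` section posted in this issue is otherwise kept as-is: only `Log_Level` changes, from `info` to `debug`. At this log volume debug output is very verbose, so it is probably worth reverting once a gap has been captured.

```ini
[SERVICE]
    Flush                     5
    # Temporarily raised from "info" to "debug" while reproducing the gap.
    # All other [SERVICE] keys from the existing configuration stay unchanged.
    Log_Level                 debug
```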

@qdupuy
Author

qdupuy commented Sep 7, 2022

Hello,

I've already done that, but there are so many entries. How can I share the relevant information?

@qdupuy
Author

qdupuy commented Sep 9, 2022

New screenshot:

[Screenshot taken 2022-09-09 at 12:28:33]

The gap always occurs between 9 AM and 11 AM.

@matthewfala
Contributor

Have you resolved your problem? If not, we have recently seen some related issues, and this one may be caused by the CloudWatch hang issue that was fixed in aws-for-fluent-bit 2.31.2.

Please see the notes about the CloudWatch hang issue in #542 and #525.

Please consider trying aws-for-fluent-bit version 2.31.2.

@qdupuy
Author

qdupuy commented Feb 27, 2023

Yes, it's fixed. The cause was the application itself, whose log output was poorly implemented.

@qdupuy qdupuy closed this as completed Feb 27, 2023