
[tls] error: error:00000005:lib(0):func(0):DH lib #5705

Closed
DrewZhang13 opened this issue Jul 11, 2022 · 6 comments
Comments

@DrewZhang13
Contributor

DrewZhang13 commented Jul 11, 2022

Bug Report

Describe the bug

When running load testing at up to 500 TPS against Fluent Bit v1.9.3, after about 30 minutes no logs reach the CloudWatch log group. The main errors are:

[tls] error: error:00000005:lib(0):func(0):DH lib
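For context on why this error string mentions "DH lib" even though no Diffie-Hellman operation is involved: OpenSSL 1.x packs error codes into bit fields, and a small decoder (my reading of the OpenSSL layout, not something confirmed in this thread) shows that the value 5 is most likely a raw `SSL_get_error()` return code (`SSL_ERROR_SYSCALL`) being printed through `ERR_error_string()`:

```python
# OpenSSL 1.x packs error codes as: lib (8 bits) | func (12 bits) | reason (12 bits).
# If Fluent Bit passes the raw SSL_get_error() value (5 == SSL_ERROR_SYSCALL) to
# ERR_error_string(), reason code 5 happens to map to the global "DH lib" string,
# so the Diffie-Hellman mention is a red herring. This is an assumption on my
# part based on the OpenSSL error-code layout, not a confirmed diagnosis.

def decode_openssl_error(code: int) -> dict:
    """Split a packed OpenSSL (1.x) error code into its component fields."""
    return {
        "lib": (code >> 24) & 0xFF,    # ERR_GET_LIB
        "func": (code >> 12) & 0xFFF,  # ERR_GET_FUNC (dropped in OpenSSL 3)
        "reason": code & 0xFFF,        # ERR_GET_REASON
    }

print(decode_openssl_error(0x00000005))
# lib=0 and func=0 match the "lib(0):func(0)" in the log; reason=5 is the same
# numeric value as SSL_ERROR_SYSCALL, pointing at a syscall-level failure
# (e.g. a dropped connection) rather than a DH problem.
```

This reading is consistent with the accompanying `errno=25` / `errno=32` lines, which are plain socket errors rather than crypto failures.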

To Reproduce

* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2022/07/07 12:02:12] [ info] [fluent bit] version=1.9.3, commit=eb4e2e770f, pid=1
[2022/07/07 12:02:12] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/07/07 12:02:12] [ info] [cmetrics] version=0.3.1
[2022/07/07 12:02:12] [ info] [input:tcp:tcp.0] listening on 127.0.0.1:8877
[2022/07/07 12:02:12] [ info] [input:forward:forward.1] listening on unix:///var/run/fluent.sock
[2022/07/07 12:02:12] [ info] [input:forward:forward.2] listening on 127.0.0.1:24224
[2022/07/07 12:02:12] [ info] [input:tcp:ApplicationLogs.tcp] listening on 0.0.0.0:5170
[2022/07/07 12:02:12] [ info] [input:tcp:RequestLogs.tcp] listening on 0.0.0.0:5171
[2022/07/07 12:02:12] [ info] [input:tcp:GlobLog.tcp] listening on 0.0.0.0:5172
[2022/07/07 12:02:12] [error] [input:tail:ServiceMetrics.tail] read error, check permissions: /apollo/env/IhmPrimsDecouplerService/var/output/logs/service_log*
[2022/07/07 12:02:12] [ warn] [input:tail:ServiceMetrics.tail] error scanning path: /apollo/env/IhmPrimsDecouplerService/var/output/logs/service_log*
[2022/07/07 12:02:12] [ info] [output:cloudwatch_logs:glob.log.cloudwatch_logs] worker #0 started
[2022/07/07 12:02:12] [ info] [output:kinesis_streams:glob.log.kinesis_streams] worker #0 started
[2022/07/07 12:02:12] [ info] [output:cloudwatch_logs:firelens.cloudwatch_logs] worker #0 started
[2022/07/07 12:02:12] [ info] [output:cloudwatch_logs:ApplicationLogs.cloudwatch_logs] worker #0 started
[2022/07/07 12:02:12] [ info] [output:cloudwatch_logs:RequestLogs.cloudwatch_logs] worker #0 started
[2022/07/07 12:02:12] [ info] [output:null:null.7] worker #0 started
[2022/07/07 12:02:12] [ info] [output:cloudwatch_logs:ServiceMetrics.cloudwatch_logs] worker #0 started
[2022/07/07 12:02:12] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/07/07 12:02:12] [ info] [sp] stream processor started
[2022/07/07 12:02:12] [ info] [output:cloudwatch_logs:firelens.cloudwatch_logs] Creating log stream STDOUT-ip-10-0-163-118.us-west-2.compute.internalApplication-firelens-fe1ebd2a2d8641abb332242c740303b5 in log group IhmPrimsDecouplerService-AppContainer-STDOUT
[2022/07/07 12:02:13] [ info] [output:cloudwatch_logs:firelens.cloudwatch_logs] Created log stream STDOUT-ip-10-0-163-118.us-west-2.compute.internalApplication-firelens-fe1ebd2a2d8641abb332242c740303b5
[2022/07/07 12:02:22] [ info] [output:cloudwatch_logs:ApplicationLogs.cloudwatch_logs] Creating log stream ApplicationLogs-ip-10-0-163-118.us-west-2.compute.internalApplicationLogs in log group IhmPrimsDecouplerService-ApplicationLogs
[2022/07/07 12:02:23] [ info] [output:cloudwatch_logs:ApplicationLogs.cloudwatch_logs] Created log stream ApplicationLogs-ip-10-0-163-118.us-west-2.compute.internalApplicationLogs
[2022/07/07 12:03:12] [ info] [input:tail:ServiceMetrics.tail] inotify_fs_add(): inode=2104788 watch_fd=1 name=/apollo/env/IhmPrimsDecouplerService/var/output/logs/service_log.2022-07-07-12
[2022/07/07 12:03:12] [ info] [output:cloudwatch_logs:ServiceMetrics.cloudwatch_logs] Creating log stream ServiceMetrics-ip-10-0-163-118.us-west-2.compute.internalServiceMetrics in log group IhmPrimsDecouplerService-ServiceMetrics
[2022/07/07 12:03:12] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log stream ip-10-0-163-118.us-west-2.compute.internal-fb-metrics-prom-format in log group IhmPrimsDecouplerService-FluentBitInternalMetrics
[2022/07/07 12:03:13] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Created log stream ip-10-0-163-118.us-west-2.compute.internal-fb-metrics-prom-format
[2022/07/07 12:03:13] [ info] [output:cloudwatch_logs:ServiceMetrics.cloudwatch_logs] Created log stream ServiceMetrics-ip-10-0-163-118.us-west-2.compute.internalServiceMetrics
[2022/07/07 12:04:40] [ info] [output:cloudwatch_logs:glob.log.cloudwatch_logs] Creating log stream GlobLog-ip-10-0-163-118.us-west-2.compute.internalGlobLog in log group IhmPrimsDecouplerService-AppContainer-STDOUT
[2022/07/07 12:04:41] [ info] [output:cloudwatch_logs:glob.log.cloudwatch_logs] Created log stream GlobLog-ip-10-0-163-118.us-west-2.compute.internalGlobLog
[2022/07/07 12:18:25] [error] [tls] error: error:00000005:lib(0):func(0):DH lib
[2022/07/07 12:18:25] [error] [src/flb_http_client.c:1165 errno=25] Inappropriate ioctl for device
[2022/07/07 12:18:25] [error] [tls] error: error:00000005:lib(0):func(0):DH lib
[2022/07/07 12:18:25] [error] [src/flb_http_client.c:1165 errno=25] Inappropriate ioctl for device
[2022/07/07 12:19:03] [error] [tls] error: error:00000005:lib(0):func(0):DH lib
[2022/07/07 12:19:03] [error] [src/flb_http_client.c:1175 errno=25] Inappropriate ioctl for device
[2022/07/07 12:19:19] [error] [http_client] broken connection to logs.us-west-2.amazonaws.com:443 ?
[2022/07/07 12:19:19] [ info] [output:cloudwatch_logs:ServiceMetrics.cloudwatch_logs] Got DataAlreadyAcceptedException, a previous retry must have succeeded asychronously
[2022/07/07 12:19:22] [error] [http_client] broken connection to logs.us-west-2.amazonaws.com:443 ?
[2022/07/07 12:19:22] [ info] [output:cloudwatch_logs:ServiceMetrics.cloudwatch_logs] Got DataAlreadyAcceptedException, a previous retry must have succeeded asychronously
[2022/07/07 12:19:22] [error] [http_client] broken connection to logs.us-west-2.amazonaws.com:443 ?
[2022/07/07 12:19:22] [ info] [output:cloudwatch_logs:ApplicationLogs.cloudwatch_logs] Got DataAlreadyAcceptedException, a previous retry must have succeeded asychronously
[2022/07/07 12:20:33] [error] [tls] error: error:00000005:lib(0):func(0):DH lib
[2022/07/07 12:20:33] [error] [src/flb_http_client.c:1175 errno=25] Inappropriate ioctl for device
[2022/07/07 12:21:13] [error] [http_client] broken connection to logs.us-west-2.amazonaws.com:443 ?
[2022/07/07 12:21:13] [ info] [output:cloudwatch_logs:ApplicationLogs.cloudwatch_logs] Got DataAlreadyAcceptedException, a previous retry must have succeeded asychronously
[2022/07/07 12:21:38] [error] [tls] error: error:00000005:lib(0):func(0):DH lib
[2022/07/07 12:21:38] [error] [src/flb_http_client.c:1165 errno=25] Inappropriate ioctl for device
[2022/07/07 12:22:25] [error] [http_client] broken connection to logs.us-west-2.amazonaws.com:443 ?
[2022/07/07 12:22:30] [ info] [output:cloudwatch_logs:ServiceMetrics.cloudwatch_logs] Got DataAlreadyAcceptedException, a previous retry must have succeeded asychronously
[2022/07/07 12:22:30] [error] [http_client] broken connection to logs.us-west-2.amazonaws.com:443 ?
  • Steps to reproduce the problem:

Expected behavior

The plugin should successfully send data to CloudWatch without TLS errors.


Your Environment

  • Version used: v1.9.3
  • Configuration:
[SERVICE]
  HTTP_Server  On
  HTTP_Listen  0.0.0.0
  HTTP_PORT    2020
  Flush                       1
  Log_Level                   debug
  Storage.path                /var/log/flb-storage
  Storage.sync                normal
  Storage.checksum            off
  Storage.backlog.mem_limit   128MB
  Storage.max_chunks_up       128
  Storage.metrics             on
  Grace                       30
  Parsers_File                /config/fluent-parser.conf
[INPUT]
  Name        tcp
  Tag         ApplicationLogs
  Listen      0.0.0.0
  Port        5170
  Format      none
  Alias       ApplicationLogs.tcp
   
[INPUT]
  Name        tcp
  Tag         RequestLogs
  Listen      0.0.0.0
  Port        5171
  Format      none
  Alias       RequestLogs.tcp
[INPUT]
  Name             tail
  Tag              ServiceMetrics
  Path             /
  Exclude_Path     *.gz
  Rotate_Wait      15
  Multiline        On
  Parser_Firstline QueryLogSeparator
  Parser_1         QueryLog
  Alias            ServiceMetrics.tail
   

[OUTPUT]
  Name              cloudwatch_logs
  Match             Application-firelens-*
  region            ${LOG_REGION}
  log_group_name    myLogGroup
  log_stream_prefix STDOUT-${HOSTNAME}
  auto_create_group false
  Alias             firelens.cloudwatch_logs

  • Environment name and version (e.g. Kubernetes? What version?): ECS/Fargate
  • Server type and version:
  • Operating System and version:
  • Filters and plugins:
Input: tail, TCP
Output: cloudwatch_logs

Additional context

@kaustubh1994

kaustubh1994 commented Aug 22, 2022

I am facing similar issues: I am unable to send logs to CloudWatch after a certain period of time, and we start observing the exact error messages mentioned above. I am already using the recommended CloudWatch configuration mentioned here: aws/aws-for-fluent-bit#340.
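For readers landing here, the mitigation settings that guidance of this kind typically points at look roughly like the sketch below. This is an illustrative assumption on my part (option values included), not a confirmed fix from this thread; `auto_retry_requests` and the generic `net.*` options are documented Fluent Bit settings, but the values shown are placeholders:

```
[OUTPUT]
    Name                        cloudwatch_logs
    Match                       *
    region                      us-west-2
    log_group_name              myLogGroup
    log_stream_prefix           app-
    # retry a request once on transient network errors (AWS outputs)
    auto_retry_requests         true
    Retry_Limit                 2
    # recycle idle keepalive connections before the remote side drops them
    net.keepalive               On
    net.keepalive_idle_timeout  10s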

Fluent-bit version: v1.9.7

@nikhilgrover

I am seeing this issue as well. It occurs at an exactly hourly cadence, along with the error mentioned in #1724: Inappropriate ioctl for device. I am pulling the latest AWS for Fluent Bit version.

@matthewfala
Contributor

This is the Fluent Bit network hang issue affecting CloudWatch. It should be resolved now by this PR:
#6339

@github-actions
Contributor

github-actions bot commented Mar 9, 2023

This issue is stale because it has been open for 90 days with no activity. Remove the stale label or comment, or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Mar 9, 2023
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Mar 15, 2023
@markusthoemmes
Contributor

markusthoemmes commented Feb 23, 2024

FWIW, I'm still seeing this sort of issue on the Datadog output in Fluent Bit 2.2.2.

[2024/02/22 23:57:47] [error] [/src/fluent-bit/src/tls/openssl.c:433 errno=0] Success
[2024/02/22 23:57:47] [error] [tls] syscall error: error:00000005:lib(0):func(0):DH lib
[2024/02/22 23:57:47] [error] [http_client] broken connection to http-intake.logs.datadoghq.eu:443 ?
[2024/02/22 23:57:47] [error] [output:datadog:<redacted>] could not flush records to http-intake.logs.datadoghq.eu:443 (http_do=-1)
[2024/02/22 23:57:48] [ warn] [engine] failed to flush chunk '<redacted>', retry in 6 seconds: task_id=0, input=<redacted> > output=<redacted> (out_id=78)
[2024/02/23 01:04:48] [error] [/src/fluent-bit/src/tls/openssl.c:495 errno=32] Broken pipe
[2024/02/23 01:04:48] [error] [tls] syscall error: error:00000005:lib(0):func(0):DH lib
[2024/02/23 01:04:48] [error] [/src/fluent-bit/src/flb_http_client.c:1241 errno=32] Broken pipe
[2024/02/23 01:04:48] [error] [output:datadog:<redacted>] could not flush records to http-intake.logs.datadoghq.eu:443 (http_do=-1)
[2024/02/23 01:04:48] [ warn] [engine] failed to flush chunk '<redacted>', retry in 11 seconds: task_id=0, input=<redacted> > output=<redacted> (out_id=78)

Any clue whether this needs a similar treatment, @matthewfala?
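One possibly relevant knob while waiting for an answer (an assumption on my part, not a confirmed fix): the `net.*` networking options are generic across outputs, so shortening the keepalive idle timeout on the Datadog output might avoid writing on connections the intake has already closed, which is what the `errno=32` Broken pipe lines suggest. The values below are illustrative:

```
[OUTPUT]
    Name                        datadog
    Match                       *
    apikey                      <redacted>
    # generic net.* options apply to any output; values here are placeholders
    net.keepalive               On
    net.keepalive_idle_timeout  10s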
