Recommended Cloudwatch_Logs Configuration #340

matthewfala · 2022-05-03T21:02:32Z

NOTICE: please see the main tracking ticket for multiple recently reported high impact issues in AWS for Fluent Bit: #542

Recommended Cloudwatch_Logs Configuration

Recently our team has received many inquiries about tuning the cloudwatch_logs output plugin via its configuration.

Customers using out-of-tune cloudwatch_logs configurations may experience log loss due to:

Broken connection / network errors
Lack of retries on batch failures
Lack of immediate network retries on network failure

These issues can be resolved via appropriate configuration.

If you are configuring FireLens via a Fluent Bit config file, use the following cloudwatch_logs configuration:

[OUTPUT]
    # general cloudwatch_logs configuration (nothing special here, customize to fit your use case)
    Name                cloudwatch_logs
    Match               ApplicationLogs
    region              ${LOG_REGION}
    log_group_name      ${SERVICE_NAME}-ApplicationLogs
    log_stream_prefix   ApplicationLogs--${HOSTNAME}
    auto_create_group   On

    # if you want to only write the log string without container metadata fields
    log_key             log

    # from aws-for-fluent-bit v2.32.0 and on, to support higher throughput logging,
    # set workers to a high value such as 5 or the number of cores on your host
    workers             1

    # optimized cloudwatch_logs output configuration
    # delayed retries on error
    retry_limit         5
    # On is the default
    net.keepalive       On
    # CW uses a 6s idle timeout, and FLB checks connections on a 1.5s timer.
    # 4s ensures FLB always closes the connection itself (4s + 1.5s < 6s), which
    # we found significantly reduces the rate of network error messages it outputs
    net.keepalive_idle_timeout 4s

If you are configuring FireLens via task definition logDriver configuration options, use the following (the // comments are explanatory only; JSON does not allow comments, so remove them before use):

"logConfiguration": {
	"logDriver": "awsfirelens",
	"options": {
		// general cloudwatch_logs configuration (nothing special here, customize to fit your use case)
		"Name": "cloudwatch_logs",
		"region": "${LOG_REGION}",
		"log_group_name": "${SERVICE_NAME}-ApplicationLogs",
		"log_stream_prefix": "ApplicationLogs--${HOSTNAME}",
		"auto_create_group": "On",
		"log_key": "log",

		// optimized cloudwatch_logs output configuration
		"workers": "1",
		"auto_retry_requests": "On",
		"retry_limit": "5"
	}
}
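
For orientation, FireLens turns the options map above into a Fluent Bit [OUTPUT] section, with each option passed through as a setting of the same name. A rough sketch of what the generated section might look like, assuming the application container is named app (a hypothetical name) and that FireLens builds the Match pattern from the container's tag prefix; the exact generated text may differ:

```
[OUTPUT]
    # Sketch only: "app" is a hypothetical container name; FireLens derives
    # the Match pattern from the container name, so yours will differ.
    Name                cloudwatch_logs
    Match               app-firelens*
    region              ${LOG_REGION}
    log_group_name      ${SERVICE_NAME}-ApplicationLogs
    log_stream_prefix   ApplicationLogs--${HOSTNAME}
    auto_create_group   On
    log_key             log
    workers             1
    auto_retry_requests On
    retry_limit         5
```

Reading the options this way makes it easier to compare a task definition against the config-file form, since the setting names are the same in both.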

We may update the above configuration from time to time to reflect the cloudwatch_logs configuration that provides the best performance.


PettitWesley · 2023-06-14T21:26:12Z

These settings used to be in the example but were removed because they match the defaults as of the upstream Fluent Bit 1.9.x series:

    # create a separate thread for each cloudwatch_output (does not work with more than one worker per log stream due to cloudwatch_logs API concurrency limitations)
    # as of Fluent Bit 1.9, 1 worker is the default
    workers             1   
    # retry network requests immediately on failure
    # this setting also defaults to "On" in the 1.9 series. 
    auto_retry_requests On

Duplo-Yashwant · 2024-01-18T13:31:54Z

Are dual outputs supported under logConfiguration? For example, sending logs to CloudWatch as well as OpenSearch.
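
One way to approach this, sketched under assumptions: in a Fluent Bit config file, every [OUTPUT] section whose Match pattern fits a record's tag receives a copy of that record, so two sections with the same Match send the same logs to two destinations. The sketch below assumes a custom Fluent Bit config file rather than logConfiguration options, and assumes an Amazon OpenSearch Service domain that uses AWS request signing. The ${OPENSEARCH_HOST} variable and the index name are hypothetical placeholders.

```
# Sketch only: ${OPENSEARCH_HOST} and the index name are hypothetical placeholders.
[OUTPUT]
    Name                cloudwatch_logs
    Match               ApplicationLogs
    region              ${LOG_REGION}
    log_group_name      ${SERVICE_NAME}-ApplicationLogs
    log_stream_prefix   ApplicationLogs--${HOSTNAME}
    auto_create_group   On
    retry_limit         5
    net.keepalive_idle_timeout 4s

[OUTPUT]
    # same Match as the section above, so both outputs receive each record
    Name                opensearch
    Match               ApplicationLogs
    Host                ${OPENSEARCH_HOST}
    Port                443
    Index               application-logs
    AWS_Auth            On
    AWS_Region          ${LOG_REGION}
    tls                 On
```

Each output retries and buffers independently, so a failure sending to one destination does not by itself stop delivery to the other.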

#### Motivation

Fluent Bit is experiencing a lot of network errors connecting to `logs.ap-southeast-2.amazonaws.com`. This amount of errors does increase the log storage cost, see #374. This is a known issue for which Fluent Bit team made [some recommendations to reduce it](aws/aws-for-fluent-bit#340). This PR is applying one of these recommendations and has been tested with success on non prod.

#### Modification

- Remove [the patch](#374) that stops sending Fluent Bit application logs to CloudWatch
- Set the Fluent Bit `keepalive idle timeout` to 4s (default is 1.5s) following [the recommendations made here](aws/aws-for-fluent-bit#340).

#### Checklist

- [ ] Tests updated - N/A
- [x] Docs updated
- [x] Issue linked in Title

---------

Co-authored-by: Victor Engmark <vengmark@linz.govt.nz>

matthewfala added the guidance Customer is seeking guidance from us/the community label May 3, 2022

matthewfala closed this as completed May 3, 2022

matthewfala reopened this May 3, 2022

DrewZhang13 mentioned this issue May 26, 2022

[http_client] broken connection to firehose.eu-west-1.amazonaws.com:443 #354

Open

kaustubh1994 mentioned this issue Aug 22, 2022

[tls] error: error:00000005:lib(0):func(0):DH lib fluent/fluent-bit#5705

Closed

PettitWesley mentioned this issue Dec 12, 2022

[Datadog output] Version 2.29.0 Causing Task to stop #491

Closed

matthewfala mentioned this issue Jan 26, 2023

Fluent Bit Hang Affecting CloudWatch C Plugin aws-for-fluent-bit v2.28.4 and Prior #525

Closed

vkadi mentioned this issue Apr 12, 2023

Frequent errors when using output plugin cloudwatch_logs #274

Open

matthewfala mentioned this issue Sep 29, 2023

2.32.0 release #733

Merged

paulfouquet mentioned this issue Jan 5, 2024

fix: reduce Fluent Bit network errors TDE-1016 linz/topo-workflows#378

Merged
