out_prometheus_remote_write: Refresh expired AWS credentials#9765
out_prometheus_remote_write: Refresh expired AWS credentials#9765Tradunsky wants to merge 1 commit intofluent:masterfrom
Conversation
68a4fd7 to
e14c40d
Compare
|
Hi team, Hope you had a great holiday season! 🤗 Please let me know if I can do anything else. |
|
Hi folks, Please let me know if I can do anything about this PR to get it merged. 🙏🏻 |
|
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
|
Pending replies |
e14c40d to
d77c06d
Compare
WalkthroughAdds explicit HTTP 403 handling in Prometheus Remote Write: on 403, trigger AWS credentials refresh when SigV4 is enabled and mark request for retry. Adjusts the generic non-2xx path to exclude 403. Fixes typos in status-handling comments. No API signature changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor FB as Fluent Bit PW Output
participant S as Remote Endpoint
participant AWS as AWS Provider (optional)
FB->>S: HTTP POST /remote_write (metrics)
S-->>FB: HTTP Response (status)
alt 200–205
FB->>FB: Mark success (OK)
else 400
FB->>FB: Log error and do not retry
else 403 (new path)
opt SigV4 enabled
FB->>AWS: Refresh credentials
Note right of AWS: Refresh via provider vtable
end
FB->>FB: Set outcome = RETRY
else Other non-OK
FB->>FB: Set outcome = RETRY (optional payload log)
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
plugins/out_prometheus_remote_write/remote_write.c (3)
186-186: Use the standard HTTP reason phrase for 203.Replace with “Non-Authoritative Information” for clarity.
- * - 203: no authoritative resp + * - 203: Non-Authoritative Information
206-207: Fix grammar in comment.“an error” instead of “a error”.
- /* Returned 400 status means unrecoverable. Immediately - * returning as a error. */ + /* Returned 400 status means unrecoverable. Immediately + * returning as an error. */
220-226: Optional: log payload on 403 and consider 401 as a refresh trigger.
- When log_response_payload is enabled, emit the response body at info/debug to aid diagnosis (e.g., expired token vs. access denied).
- If you see 401 from some backends, optionally treat it like 403 for AWS-auth cases.
Would you like me to generate a small follow-up patch for payload logging and an opt-in 401 handling flag?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/out_prometheus_remote_write/remote_write.c(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (58)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-centos-7
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: PR - fuzzing test
🔇 Additional comments (2)
plugins/out_prometheus_remote_write/remote_write.c (2)
191-193: Good exclusion of 403 from the generic retry path.This cleanly routes 403s to the specialized handler below.
220-226: Resolved — provider refresh is already synchronizedThe AWS provider exposes a per-provider mutex and refresh implementations use try_lock_provider()/unlock_provider() (see include/fluent-bit/flb_aws_credentials.h and examples in src/aws/flb_aws_credentials_ec2.c and src/aws/flb_aws_credentials_profile.c); provider creation initializes the mutex. Calling ctx->aws_provider->provider_vtable->refresh(ctx->aws_provider) is safe — no extra mutex/rate-limit needed in this plugin.
|
Followed the code style. Please review. |
This fix handles case of fluent-bit stuck retying with expired AWS credentials. Signed-off-by: Tradunsky <tradunskih@gmail.com>
|
Ping. |
|
ping |
Handle 403 http error code when credentials expired, credentials refreshed from
~/.aws/credentials.Fixes: #9670
Similar implementation already exists for
kinesis_streams:fluent-bit/src/aws/flb_aws_util.c
Line 205 in 3178e6e
Steps to reproduce:
~/.aws/credentialswith min session duration time 900 (anything to reproduce quickly). Put the credentials form the command output to~/.aws/credentialsunder default profile.#1to refresh credentials in~/.aws/credentialswith much fresh credentials (usually done by an automation):Before the PR fix: Fluent-bit keeps failing with 403 as it is using old expired credentials that are cached in memory
After the PR fix: Fluent-bit picks up fresh credentials without downtime.
Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit