Problem
I know there are a lot of other tickets talking about this, but they either accept the noisy log messages/metrics as-is, or are very old.
We are getting a lot of InvalidSequenceTokenException errors in our logs, and our Prometheus metrics fill up with these errors as well. I believe the writes recover fine with the retries that are in place, but I'm curious whether there is a better way to work around this?
We use flush_thread_count 4 in our buffer configuration and then run two replicas of this Fluentd configuration. It seems the only way to fix this is to run a single replica with a single thread, but then it wouldn't be redundant 😞
Lastly, I am curious what the difference is between the plugin's concurrency parameter and the buffer's flush_thread_count parameter. They sound like they do the same thing, but maybe I should be using one over the other?
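To make that question concrete, here is a minimal sketch of where each knob would sit. I am assuming (not verified) that concurrency is a top-level option of the cloudwatch_logs output plugin controlling how many API calls it makes in parallel, while flush_thread_count is the generic Fluentd buffer option controlling how many threads flush chunks; the match pattern, group, and stream names below are placeholders only:

<match example.**>
  @type cloudwatch_logs
  log_group_name /example/group
  log_stream_name example-stream
  region us-east-1
  # plugin-level option of fluent-plugin-cloudwatch-logs
  # (assumed: how many PutLogEvents calls run in parallel)
  concurrency 1
  <buffer>
    @type memory
    flush_interval 5s
    # generic Fluentd buffer option:
    # number of threads that pick up and flush buffered chunks
    flush_thread_count 1
  </buffer>
</match>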
Steps to replicate
# The input data is a Kubernetes Pod's logs.
# These are aggregated by a fluent-bit running on each
# Kubernetes node, and then forwarded to central processing,
# which includes this configuration snippet.
# NOTE: I have excluded Prometheus and other non-essential pieces of the config
<source>
@type forward
port 24284
bind 0.0.0.0
tag pod.source
@label @POD_SOURCE
</source>
<label @POD_SOURCE>
<filter **>
@type record_transformer
enable_ruby true
<record>
namespace ${record["kubernetes"]["namespace_name"]}
pod ${record["kubernetes"]["pod_name"]}
</record>
</filter>
<match **>
@type rewrite_tag_filter
<rule>
key namespace
pattern /(.+)/
tag $1
</rule>
@label @POD_STEP2
</match>
</label>
<label @POD_STEP2>
<match **>
@type rewrite_tag_filter
<rule>
key pod
pattern /(.+)/
tag ${tag}_$1
</rule>
@label @POD_OUTPUT
</match>
</label>
<label @POD_OUTPUT>
<match **>
@type copy
<store>
@type s3
s3_bucket foobar
s3_region us-east-1
s3_object_key_format "#{ENV['ENVIRONMENT']}/eks-pod-logs/%Y-%m-%d/${tag}/%H_%{index}_%{uuid_flush}.%{file_extension}"
<format>
@type json
</format>
<buffer tag,time>
timekey 1h
@type memory
flush_mode interval
retry_type exponential_backoff
flush_thread_count 4
flush_interval 5s
retry_forever false
retry_max_interval 30
chunk_limit_size 8MB
chunk_full_threshold 0.90
overflow_action throw_exception
compress gzip
</buffer>
</store>
<store>
@type cloudwatch_logs
log_group_name /infra/logs/eks/pods/stage
log_stream_name %Y-%m-%d-%H-${tag}
auto_create_stream true
region us-east-1
<buffer tag,time>
timekey 1m
@type memory
flush_mode interval
retry_type exponential_backoff
flush_thread_count 4
flush_interval 5s
retry_forever false
retry_max_interval 30
chunk_limit_size 8MB
chunk_full_threshold 0.90
overflow_action throw_exception
compress gzip
</buffer>
</store>
</match>
</label>
Expected Behavior or What you need to ask
The s3 logging works just fine, but the cloudwatch output produces a lot of errors about the token being out of sequence. This might come down to how AWS implements the service, but I feel like there must be a better way than retrying repeatedly and filling logs and metrics with errors; then again, maybe fixing it properly is more work than it is worth? 🤷 Any suggestions are welcome, too.
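One idea I am considering, sketched below as an assumption rather than a verified fix: InvalidSequenceTokenException generally means two writers are calling PutLogEvents against the same log stream with stale sequence tokens, so giving each writer its own stream should remove the contention. Assuming the pod name is available in a HOSTNAME environment variable (hypothetical here), the stream name could be made unique per replica the same way the s3 key already embeds ENV['ENVIRONMENT']:

<store>
  @type cloudwatch_logs
  log_group_name /infra/logs/eks/pods/stage
  # Hypothetical: append the replica's hostname so the two
  # Fluentd pods never write to the same log stream.
  log_stream_name "%Y-%m-%d-%H-${tag}-#{ENV['HOSTNAME']}"
  auto_create_stream true
  region us-east-1
  <buffer tag,time>
    timekey 1m
    @type memory
    flush_mode interval
    # A single flush thread per store would also keep the threads
    # inside one replica from racing on the sequence token.
    flush_thread_count 1
    flush_interval 5s
  </buffer>
</store>

Within a single replica, dropping flush_thread_count to 1 for just the cloudwatch store (while keeping 4 for s3) might already quiet things down, but it would not help across the two replicas if they still share a stream.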
Using Fluentd and CloudWatchLogs plugin versions
OS version: Docker image fluentd:v1.9.1-1.0
Bare Metal or within Docker or Kubernetes or others? Within Kubernetes in AWS EKS
Fluentd v0.12 or v0.14/v1.0: fluentd --version => fluentd 1.9.1
Dependent gem versions: not provided (no boot log or fluent-gem list output)