It seems that the backoff/retry configuration only applies to connection errors (e.g. no network), in which case libbeat's backoff policy kicks in and retries. However, since connect and close don't really make sense for Kinesis (and their implementation in the client is just a stub), this results in immediate retry.
Here are a few examples of cases where the Kinesis stream client enters an infinite CPU loop:
When the stream's throughput is exceeded, the events are retried immediately and in most cases result in more throughput errors (because there is no backoff).
When the stream's IAM permissions are missing, the error from Kinesis is permission denied, and the records are then retried immediately and indefinitely, resulting in AWS API rate limiting.
When Kinesis is hit too frequently (e.g. because of the above), the error is a rate limit error, in which case the client simply retries immediately, exacerbating the problem.
It is unclear at which level this problem should be solved:
The output's level (add some retrying with backoff to put_records)
The AWS SDK level (it knows how to back off in certain circumstances)
The publisher
To reproduce this problem, create a Kinesis stream and deny the PutRecords permission on it to everyone, then feed a single input event to Filebeat and watch the CPU go to 100% and stay there.