MillisBehindLatest metric across _all_ shards #249

usrenmae · 2017-10-19T08:03:29Z

Currently several metrics, including MillisBehindLatest, are reported to CloudWatch with the shard ID as one of the dimensions. We find it very convenient to set CloudWatch alarms on this metric so that we can react if any shard starts to lag behind. However, it is not possible to set up an alarm without specifying the exact shard ID. This is a limiting factor: when shards are added and removed constantly, the shard IDs are very dynamic, and each time they change the alarms have to be changed accordingly, which is frustrating. Since one generally wants to react to any shard lagging behind, it would be very nice to have a global MillisBehindLatest that has no shard in its dimensions. This could be the maximum across all shards, for example MaxMillisBehindLatest.
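To make the request concrete, here is a minimal Python sketch of the aggregation being asked for. It is not KCL code: the function name `max_millis_behind_latest` and the sample readings are hypothetical, and it only illustrates collapsing per-shard values into a single value that no longer depends on shard IDs.

```python
from typing import Mapping, Optional


def max_millis_behind_latest(per_shard: Mapping[str, int]) -> Optional[int]:
    """Return the largest MillisBehindLatest across shards, or None if no shard reported."""
    if not per_shard:
        return None
    return max(per_shard.values())


# Hypothetical per-shard readings; shard IDs come and go, the aggregate does not.
readings = {
    "shardId-000000000000": 120,
    "shardId-000000000001": 45000,
    "shardId-000000000002": 0,
}
print(max_millis_behind_latest(readings))
```

An alarm on such an aggregate would fire when the worst shard lags, whichever shard that happens to be.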


sahilpalvia · 2017-10-19T20:22:36Z

Kinesis does emit a stream-level metric for iterator age, called GetRecords.IteratorAgeMilliseconds. You should be able to set up an alarm on that metric. It can be found under the Kinesis namespace in CloudWatch. If you set the statistic for that metric to Maximum, it will reflect the maximum millisBehindLatest across all the shards for the given period. Please feel free to reopen the issue if you still have questions.
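As a rough sketch of that suggestion, the Python snippet below builds the parameters for a CloudWatch alarm on the stream-level metric with the Maximum statistic. The stream name, alarm name, threshold, period and evaluation count are placeholder assumptions; the keys follow the CloudWatch PutMetricAlarm parameters, and nothing here calls AWS.

```python
from typing import Any, Dict


def stream_iterator_age_alarm(stream_name: str, threshold_ms: int) -> Dict[str, Any]:
    """Build PutMetricAlarm parameters for the stream-level iterator age metric."""
    return {
        "AlarmName": f"{stream_name}-iterator-age-high",  # hypothetical naming scheme
        "Namespace": "AWS/Kinesis",
        "MetricName": "GetRecords.IteratorAgeMilliseconds",
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "Statistic": "Maximum",  # worst value seen in the period
        "Period": 60,  # seconds; assumed value
        "EvaluationPeriods": 5,  # assumed value
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = stream_iterator_age_alarm("my-stream", 60_000)
print(params["MetricName"], params["Statistic"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. The only dimension is the stream name, so the alarm does not need to change when shards are added or removed.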

usrenmae · 2017-10-20T12:52:51Z

Thanks for pointing out the GetRecords.IteratorAgeMilliseconds metric; I wasn't aware of it. After a closer look, I figured out it's a global per-stream metric of the Kinesis service. What I'm interested in is a per-consumer metric. We have multiple consumers running on the same stream; some of them may keep up with the event feed perfectly while others lag. My idea was to have a metric that tells you which particular consumer is lagging behind. It's not possible to get this information from the GetRecords.IteratorAgeMilliseconds metric of the Kinesis stream itself, but KCL could provide such a metric in a similar way to how it provides MillisBehindLatest, just without the shardId dimension.
Actually, it is not convenient at all to build automation around shard-specific metrics: shards are very dynamic and may change over time, and it is not possible to have an alarm on a metric with dimensions without specifying the dimension value. Monitoring built on a per-consumer basis is much more useful: one can set up permanent alarms on it, and only in case of an incident trace back to the particular shard using the existing shard-specific metrics.
Please reopen the issue, as suggested above.
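For illustration only, a per-consumer datum of the kind described could be shaped like the Python sketch below. It is a workaround-style sketch, not something KCL provides: the metric name `MaxMillisBehindLatest`, the `Application` dimension and the helper `consumer_lag_datum` are all hypothetical. The keys follow the CloudWatch PutMetricData `MetricData` entry format.

```python
from typing import Any, Dict, Mapping


def consumer_lag_datum(application_name: str, per_shard: Mapping[str, int]) -> Dict[str, Any]:
    """Build one PutMetricData entry keyed by consumer application, not by shard."""
    if not per_shard:
        raise ValueError("no per-shard readings to aggregate")
    return {
        "MetricName": "MaxMillisBehindLatest",  # hypothetical metric name
        # Only the consumer application identifies the metric; no ShardId dimension.
        "Dimensions": [{"Name": "Application", "Value": application_name}],
        "Value": float(max(per_shard.values())),
        "Unit": "Milliseconds",
    }


datum = consumer_lag_datum(
    "billing-consumer",  # hypothetical KCL application name
    {"shardId-000000000000": 300, "shardId-000000000001": 45000},
)
print(datum)
```

An alarm on this metric would need only the application name as a dimension value, so it could stay in place across resharding; the per-shard metrics would still be there for tracing an incident back to a specific shard.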

sahilpalvia · 2017-10-20T17:46:29Z

Thank you for the feedback. We agree with the change you have suggested, and will prioritize it accordingly against the other customer requests we receive.

StevenYCChou · 2018-02-06T06:40:42Z

@sahilpalvia I have the same use case: we want to scale up/down based on how fast the KCL application consumes. This metric would be helpful.

ghost · 2018-03-15T15:45:07Z

We have a similar use case and would like this metric as well. We have two KCL consumers on the same Kinesis stream. One has a low latency threshold, while the other has a much higher one.

We've set the alarm at the lower threshold on the stream, but it fires once or twice a day because of the higher-latency KCL consumer. We have to treat it as an alarm situation each time, which obviously wastes a lot of time.

We've considered using the shard-level metric; however, since we are at the limit of maximum alarms allowed and have a 60-shard stream, that is not currently possible.

akumariiit · 2018-10-07T18:45:35Z

@sahilpalvia we also have the exact same use case. Can you provide any update on this?

pfifer · 2018-10-08T19:05:20Z

We don't have an update at this time. This is a feature we are interested in adding, and we will prioritize it along with all customer requests.

For all of those interested, please post a reaction on the parent post; this will help us prioritize customer requests.

waffleshop · 2018-10-08T19:36:16Z

+1

vinujan59 · 2018-10-09T04:23:38Z

+1

vik7 · 2018-10-09T04:31:02Z

+1

akumariiit · 2018-10-31T07:30:52Z

+1

rkass · 2018-11-28T21:37:18Z

+1

winty56 · 2019-03-21T08:13:54Z

+1
We have more than 500 shards in Kinesis and more than 4 KCL applications using the same Kinesis stream. In the AWS CloudWatch console, we cannot search all shards because the console search result limit is 500, so we do not use KCL metrics. In addition, the number of metrics we can graph at one time in the console is limited to 100. This feature is essential for me to check the lag of each KCL application.

kaisermario · 2021-06-14T07:51:29Z

+1

kaisermario · 2021-06-14T07:51:52Z

@pfifer Any update?

MeisterMasi · 2021-06-14T08:41:40Z

+1

CCBow-501 · 2021-06-14T08:56:34Z

+1

yasemin-amzn · 2021-06-14T17:22:45Z

Hello,

There are service-side metrics emitted for monitoring stream-level lag. For consumers using GetRecords, the "GetRecords.IteratorAgeMilliseconds" metric is emitted, and all consumer applications contribute to it. Consumer applications using enhanced fan-out emit the "SubscribeToShardEvent.MillisBehindLatest" metric along with the consumer name, so the status of each consumer can be monitored individually.

Consider using these metrics as an alternative to client-side metrics for monitoring application health.

For more details please refer to: https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-cloudwatch.html
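As a hedged sketch of per-consumer monitoring with the enhanced fan-out metric, the Python snippet below builds one set of alarm parameters per registered consumer, each with its own threshold. The consumer names, thresholds, period and alarm naming are placeholder assumptions, and the `StreamName` plus `ConsumerName` dimension pair is my reading of the linked monitoring page, so please verify it there. Nothing here calls AWS.

```python
from typing import Any, Dict, List, Mapping


def per_consumer_lag_alarms(
    stream_name: str, thresholds_ms: Mapping[str, int]
) -> List[Dict[str, Any]]:
    """Build PutMetricAlarm parameters, one alarm per enhanced fan-out consumer."""
    alarms = []
    for consumer_name, threshold_ms in sorted(thresholds_ms.items()):
        alarms.append({
            "AlarmName": f"{stream_name}-{consumer_name}-behind-latest",  # hypothetical
            "Namespace": "AWS/Kinesis",
            "MetricName": "SubscribeToShardEvent.MillisBehindLatest",
            "Dimensions": [
                {"Name": "StreamName", "Value": stream_name},
                {"Name": "ConsumerName", "Value": consumer_name},
            ],
            "Statistic": "Maximum",
            "Period": 60,  # seconds; assumed value
            "EvaluationPeriods": 5,  # assumed value
            "Threshold": float(threshold_ms),
            "ComparisonOperator": "GreaterThanThreshold",
        })
    return alarms


# Hypothetical consumers with different latency tolerances on the same stream.
alarms = per_consumer_lag_alarms(
    "my-stream", {"realtime-consumer": 5_000, "batch-consumer": 600_000}
)
for alarm in alarms:
    print(alarm["AlarmName"], alarm["Threshold"])
```

Each dictionary could be passed to `boto3.client("cloudwatch").put_metric_alarm(**alarm)`. This only applies to consumers registered for enhanced fan-out; consumers polling with GetRecords still share the single stream-level iterator age metric.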

kaisermario · 2021-06-22T07:24:10Z

Hello @yasemin-amzn,
"SubscribeToShardEvent.MillisBehindLatest" is a basic (stream-level) metric according to: https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-cloudwatch.html

Stream-level data is sent automatically every minute at no charge.

Unfortunately we can't see this metric in our account.

leifbladt · 2021-06-24T07:08:43Z

+1

QwertV2 · 2021-11-30T06:59:43Z

+1

sahilpalvia added the question label Oct 19, 2017

sahilpalvia closed this as completed Oct 19, 2017

sahilpalvia added the enhancement label Oct 20, 2017

sahilpalvia reopened this Oct 20, 2017

ghost mentioned this issue Oct 9, 2018

Publish lag for each kinesis consumer #437

Closed

QAQJ mentioned this issue Nov 23, 2021

Adding a new metric: Application-level MillisBehindLatest #868

Merged
