DynamoDB Streams lag monitoring #5

Open
ivanarkhipov opened this issue Oct 14, 2016 · 13 comments

@ivanarkhipov

Hello!
We're using DynamoDB Streams + Kinesis Client Library (KCL).
How can we measure the latency between the moment an event is created in a stream and the moment it is processed on the KCL side?

As far as I know, KCL's MillisBehindLatest metric is specific to Kinesis Streams.
The approximateCreationDateTime record attribute is only accurate to the minute, which is not acceptable for monitoring systems with sub-second latency.
Could you please suggest some useful metrics for monitoring DynamoDB Streams latency?

Thank you!

Ivan

@pfifer

pfifer commented Oct 26, 2016

This feature is currently on DynamoDB's road map, but they don't currently have an ETA.

@amcp
Contributor

amcp commented Apr 24, 2017

Put System.currentTimeMillis() in an item attribute yourself when you put and update items. As long as your stream view type is NEW_IMAGE or NEW_AND_OLD_IMAGES and your item updates contain this timestamp, you can get a better approximation.
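
A minimal sketch of this idea, under stated assumptions: the attribute name `writtenAtMillis` is hypothetical, and a plain `Map<String, String>` stands in for the SDK's attribute map so the example runs without any AWS dependency. It measures the delay between the write and the moment the consumer handles the record.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

/** Sketch: stamp items on write, then measure write-to-process delay on the consumer. */
final class ItemTimestampDelay {
    // Hypothetical attribute name; use whatever fits your schema.
    static final String WRITTEN_AT = "writtenAtMillis";

    /** Producer side: returns a copy of the item with the current epoch millis added. */
    static Map<String, String> stamp(Map<String, String> item, Clock clock) {
        Map<String, String> stamped = new HashMap<>(item);
        stamped.put(WRITTEN_AT, Long.toString(clock.millis()));
        return stamped;
    }

    /**
     * Consumer side: delay between the write and now, read from the record's new image.
     * Returns -1 when there is no image or no timestamp attribute to read.
     */
    static long delayMillis(Map<String, String> newImage, Clock clock) {
        if (newImage == null || newImage.get(WRITTEN_AT) == null) {
            return -1L;
        }
        return clock.millis() - Long.parseLong(newImage.get(WRITTEN_AT));
    }

    public static void main(String[] args) {
        // Fixed clocks make the example deterministic; production code would use Clock.systemUTC().
        Clock writeTime = Clock.fixed(Instant.ofEpochMilli(1_000L), ZoneOffset.UTC);
        Clock processTime = Clock.fixed(Instant.ofEpochMilli(1_250L), ZoneOffset.UTC);
        Map<String, String> item = new HashMap<>();
        item.put("id", "42");
        Map<String, String> image = stamp(item, writeTime);
        System.out.println(delayMillis(image, processTime)); // prints 250
    }
}
```

The result is only as good as the clock agreement between the writer and the consumer, so treat small values with some caution.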

@amcp amcp closed this as completed Apr 24, 2017
@joelittlejohn

@amcp I'm afraid adding an item attribute does not solve this issue. I think this one should be reopened.

The requirement here is for a lag metric. This means the time (in millis) between the current item and the latest item that was added to the stream. From the docs for MillisBehindLatest:

The number of milliseconds the GetRecords response is from the tip of the stream, indicating how far behind current time the consumer is. A value of zero indicates record processing is caught up, and there are no new records to process at this moment.

If no new items are added, the client is not lagging (even if the time attribute on the item is old). This is very different to checking a time attribute on the item.

@amcp amcp reopened this Apr 24, 2017
@amcp
Contributor

amcp commented Apr 24, 2017

Seems to me you are interested in the age of stream records relative to the tip of each shard. Each processor works on a shard forward in time. Each time you do a GetRecords call on your usual shard iterator, you could also get a shard iterator for that shard of type LATEST and compute the lag you seek in that manner. Note that shards can roll over for size and age or split for throughput reasons so you might have to do a few calls to get to the latest child shard. By sampling the tip of each shard lineage, you could keep a pretty good estimate of how much you lag.
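
The "few calls to get to the latest child shard" step could look roughly like the sketch below. It assumes you have already collected each shard's ID and parent shard ID (both are reported by DescribeStream) into a child-to-parent map; the class and method names are made up for illustration. Treating any shard without children as a tip is a simplification, since a closed shard may briefly have no child listed yet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: walk a shard lineage down to the shards that currently have no children. */
final class ShardLineage {
    /**
     * @param startShardId the shard your processor is reading
     * @param parentOf     maps each shard ID to its parent shard ID (null for root shards)
     * @return the childless descendants of startShardId, sorted by ID
     */
    static List<String> tipShards(String startShardId, Map<String, String> parentOf) {
        // Invert child -> parent into parent -> children.
        Map<String, List<String>> childrenOf = new HashMap<>();
        for (Map.Entry<String, String> e : parentOf.entrySet()) {
            if (e.getValue() != null) {
                childrenOf.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Depth-first walk from the starting shard.
        List<String> tips = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(startShardId);
        while (!pending.isEmpty()) {
            String shard = pending.pop();
            List<String> children = childrenOf.get(shard);
            if (children == null) {
                tips.add(shard); // no children: treat this shard as a current tip
            } else {
                children.forEach(pending::push);
            }
        }
        Collections.sort(tips);
        return tips;
    }
}
```

Each returned shard ID is then a candidate for a GetShardIterator call with iterator type LATEST.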

@amcp
Contributor

amcp commented Apr 24, 2017

Here is some good related reading (also includes links to prior articles).
https://noise.getoto.net/2016/08/19/monitor-your-application-for-processing-dynamodb-streams/

@joelittlejohn

Another measure of lag is the number of records between the current set of records and the tip. It would be good if this library implemented some help with either kind of lag monitoring.

@amcp
Contributor

amcp commented Apr 25, 2017

Together with the lag estimates above, you could also use 1-minute CloudWatch consumed write capacity metrics (ConsumedWriteCapacityUnits) on the table to estimate the number of writes accepted per second, allowing you to backtrack the number of records between your Stream Worker and the heads of offspring shard lineages.
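
The arithmetic is simple enough to sketch. This assumes you already have a time-lag estimate and the number of writes accepted in the last one-minute window (for example, the Sum of a 1-minute datapoint); the names are hypothetical. Consumed write capacity only equals the write count when items are at most 1 KB and writes are non-transactional, and the figure is table-wide rather than per shard, so read the result as a rough estimate.

```java
/** Sketch: convert a time lag into an approximate record backlog using the recent write rate. */
final class BacklogEstimate {
    /**
     * @param lagMillis          estimated time lag behind the shard tip, in milliseconds
     * @param writesInLastMinute writes accepted by the table during the last one-minute window
     * @return approximate number of records still ahead of the consumer
     */
    static long recordsBehind(long lagMillis, double writesInLastMinute) {
        if (lagMillis <= 0 || writesInLastMinute <= 0) {
            return 0L; // caught up, or nothing was written
        }
        double writesPerSecond = writesInLastMinute / 60.0;
        return Math.round(writesPerSecond * lagMillis / 1000.0);
    }

    public static void main(String[] args) {
        // 1200 writes per minute is 20 per second; a 5 second lag is about 100 records.
        System.out.println(recordsBehind(5_000L, 1_200.0)); // prints 100
    }
}
```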

@amcp
Contributor

amcp commented Apr 25, 2017

Another thing you could do is feed the DynamoDB Stream into a Kinesis stream with a Lambda, and use the MillisBehindLatest metric from Kinesis records. Seems a bit over the top though.

@Mentis

Mentis commented Dec 21, 2017

Any updates on this? Is there any other way to identify how long a particular event sits in the stream?

@aggarwal
Contributor

The value of ApproximateCreationDateTime is precise to the second as of January 2019.

We're currently working on emitting a MillisBehindLatest metric from the adapter package that will emit the difference between ApproximateCreationDateTime from the GetRecords result and System.currentTimeMillis() on the client. Emitting this metric will allow a large majority of customers to get some basic monitoring out of the box. This will allow you to track how far behind you are in processing your stream. We expect to release this change in the next few weeks.
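
For anyone who wants the same number before upgrading, the calculation described above can be approximated on the client. This is a sketch, not the adapter's code: which record in the batch to use, the zero for an empty batch, and the clamp at zero are all assumptions made here, and the class name is made up. `java.util.Date` is used for the creation times and the current time is passed in so the logic can be tested.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;

/** Sketch: client-side lag from the creation time of the last record in a batch. */
final class ApproximateLag {
    /**
     * @param creationTimes ApproximateCreationDateTime of each record in the batch, in batch order
     * @param nowMillis     current time on the client, e.g. System.currentTimeMillis()
     * @return milliseconds between the last record's creation time and now, never negative
     */
    static long millisBehind(List<Date> creationTimes, long nowMillis) {
        if (creationTimes.isEmpty()) {
            return 0L; // assumption: an empty batch is reported as caught up
        }
        Date last = creationTimes.get(creationTimes.size() - 1);
        // Clamp at zero so clock skew between service and client cannot produce negative lag.
        return Math.max(0L, nowMillis - last.getTime());
    }

    public static void main(String[] args) {
        List<Date> batch = Arrays.asList(new Date(10_000L), new Date(12_000L));
        System.out.println(millisBehind(batch, 12_750L)); // prints 750
    }
}
```

Because the creation time is precise to the second, a value computed this way cannot resolve lag much finer than about one second.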

The DynamoDB Streams GetRecords API does not currently expose any data about the amount or age of records that were written after the records returned in a batch. Making this data available is a large project that requires architectural changes in the service. We'll consider this in our 6-12 month roadmap.

@pietropra

pietropra commented Dec 7, 2020

@aggarwal this is great to know; I was just looking for this information.
Perhaps it is worth making it explicit in the documentation?

https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_StreamRecord.html

@aggarwal
Contributor

Version 1.5.3 now includes the implementation of MillisBehindLatest as described above. Due to limitations in how the metric object is scoped in KCL, this metric is emitted at the stream-shard level and not at the application level.

https://github.com/awslabs/dynamodb-streams-kinesis-adapter/releases/tag/1.5.3

@jeet23

jeet23 commented Aug 18, 2022

Hi @aggarwal,
Is the MillisBehindLatest metric available for the DynamoDB Streams Kinesis Adapter as well?

Or is it only for the Kinesis streams?
