Is your feature request related to a problem? Please describe.
We recently had a multi-hour outage on a Vault cluster using the DynamoDB storage backend, and the investigation was challenging due to the limited information available in logs and metrics. Vault was reporting a RATE(dynamodb_get_total_count) close to zero, while the AWS console was reporting ~25k qps of DynamoDB read requests. core_in_flight_requests spiked to 200k, and we saw a large number of goroutines and high memory usage. We suspect the DynamoDB backend was returning errors and the AWS client was retrying indefinitely (we did not have AWS_DYNAMODB_MAX_RETRIES set, but that has since been fixed). If that was the case, then most likely we hit the limit of the DynamoDB PermitPool, causing all other requests to wait on that lock. As far as we could tell, however, nothing in the logs or metrics indicated that the PermitPool limit had been reached.
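For reference, capping client-side retries is roughly what we expect AWS_DYNAMODB_MAX_RETRIES / max_retries to do. The sketch below uses aws-sdk-go v1's MaxRetries option; the region and retry count are placeholders and this is not the backend's actual wiring, just an illustration of bounding retries so a failing table surfaces errors instead of piling up in-flight requests:

```go
// Hypothetical sketch: build a DynamoDB client with a bounded retry count.
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func newDynamoClient() *dynamodb.DynamoDB {
	sess := session.Must(session.NewSession(&aws.Config{
		Region:     aws.String("us-east-1"), // placeholder region
		MaxRetries: aws.Int(5),              // bound retries instead of retrying indefinitely
	}))
	return dynamodb.New(sess)
}
```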
Describe the solution you'd like
Metrics covering the DynamoDB PermitPool would be very useful. If we had gauge metrics for Active Permits, Pool Size, and Permits Waiting, we could quickly determine that the DynamoDB layer is the cause of requests backing up.
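A minimal sketch of the kind of instrumentation we have in mind, using the go-metrics library Vault already emits telemetry through. The pool implementation and metric names here are illustrative assumptions, not Vault's actual PermitPool code:

```go
// Illustrative permit pool that emits gauges for pool size, active permits,
// and callers waiting on a permit. Metric names are placeholders.
package main

import (
	"sync/atomic"

	metrics "github.com/armon/go-metrics"
)

type InstrumentedPermitPool struct {
	sem     chan struct{}
	waiting int64
}

func NewInstrumentedPermitPool(size int) *InstrumentedPermitPool {
	metrics.SetGauge([]string{"dynamodb", "permit_pool", "size"}, float32(size))
	return &InstrumentedPermitPool{sem: make(chan struct{}, size)}
}

func (p *InstrumentedPermitPool) Acquire() {
	// Track how many callers are blocked waiting for a permit.
	metrics.SetGauge([]string{"dynamodb", "permit_pool", "waiting"},
		float32(atomic.AddInt64(&p.waiting, 1)))
	p.sem <- struct{}{} // blocks when the pool is exhausted
	metrics.SetGauge([]string{"dynamodb", "permit_pool", "waiting"},
		float32(atomic.AddInt64(&p.waiting, -1)))
	metrics.SetGauge([]string{"dynamodb", "permit_pool", "active"}, float32(len(p.sem)))
}

func (p *InstrumentedPermitPool) Release() {
	<-p.sem
	metrics.SetGauge([]string{"dynamodb", "permit_pool", "active"}, float32(len(p.sem)))
}
```

With something like this in place, a sustained "waiting" gauge at a high value while "active" sits at the pool size would have pointed us at the PermitPool immediately during the outage described above.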
Describe alternatives you've considered
Explain any additional use-cases
Additional context