Health lambdas still time out occasionally #6097
Comments
@hannes-ucsc: "We should consider raising the error threshold."
Assignee to determine a reasonable threshold based on historic alarm data.
Excluding events prior to the merge of #5467 into develop on Feb 6 (which added a gateway endpoint to S3 and DynamoDB that significantly reduced execution failures), and the isolated event of #5927, which caused a single, transient execution failure, only the […] The only two failures were on the […] So based on this data, it seems appropriate to set the retry limit for these lambdas to one, given that the occurrence rate is low at only two per month, and is only evident in the […]
@hannes-ucsc: "Changed my mind, the retry increase for log forwarder lambdas should occur in PR #6217 for #5622 which is all about log forwarder lambdas. The fix for this issue would just involve explicitly setting the retry for the health check lambdas to 0 and increasing the error alarm threshold to one per day."
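For illustration only, here is a rough Python sketch of the two settings proposed in the comment above. It only builds the keyword arguments one might pass to boto3's `put_function_event_invoke_config` (Lambda) and `put_metric_alarm` (CloudWatch); it makes no AWS calls. The function name `example-health-check-lambda`, the alarm name, and the helpers `retry_config`, `error_alarm_config` and `would_alarm` are all hypothetical. It also assumes the health check lambdas are invoked asynchronously, since `MaximumRetryAttempts` only applies to asynchronous invocations. The actual fix may well live in the project's infrastructure configuration rather than in code like this.

```python
from typing import Any

SECONDS_PER_DAY = 24 * 60 * 60


def retry_config(function_name: str) -> dict[str, Any]:
    """
    Keyword arguments for Lambda's PutFunctionEventInvokeConfig that
    disable retries of failed asynchronous invocations.
    """
    return {
        'FunctionName': function_name,
        'MaximumRetryAttempts': 0,
    }


def error_alarm_config(function_name: str,
                       errors_tolerated_per_day: int = 1
                       ) -> dict[str, Any]:
    """
    Keyword arguments for CloudWatch's PutMetricAlarm describing an alarm
    that fires only when the daily error count exceeds the tolerated number.
    """
    return {
        # Hypothetical alarm name, not taken from the project
        'AlarmName': f'{function_name}-errors',
        'Namespace': 'AWS/Lambda',
        'MetricName': 'Errors',
        'Dimensions': [{'Name': 'FunctionName', 'Value': function_name}],
        'Statistic': 'Sum',
        'Period': SECONDS_PER_DAY,
        'EvaluationPeriods': 1,
        'Threshold': errors_tolerated_per_day,
        # Strictly greater, so a single error per day stays below the alarm
        'ComparisonOperator': 'GreaterThanThreshold',
        'TreatMissingData': 'notBreaching',
    }


def would_alarm(errors_in_day: int, alarm: dict[str, Any]) -> bool:
    """Local check mirroring the GreaterThanThreshold comparison."""
    return errors_in_day > alarm['Threshold']


if __name__ == '__main__':
    name = 'example-health-check-lambda'  # hypothetical function name
    print(retry_config(name))
    alarm = error_alarm_config(name)
    print(would_alarm(1, alarm), would_alarm(2, alarm))
```

With these values, a day with exactly one timeout would not trip the alarm, while a second error on the same day would.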
For demo, show that the health check lambdas aren't retried, and that there were no alarms during a day in which either or both of the health check lambdas timed out exactly once.
Task timed out after the request to the bundles endpoint took too long.