Fix: Health lambas still time out occasionally (#6097) #6118
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #6118   +/-   ##
========================================
  Coverage    84.89%   84.90%
========================================
  Files          156      156
  Lines        20690    20692    +2
========================================
+ Hits         17565    17568    +3
+ Misses        3125     3124    -1
☔ View full report in Codecov by Sentry.
Approved
I think this is becoming unwieldy and a little brittle.
Can you try something like this, which is based on how we handle the thresholds:
Subject: [PATCH] [A 2/2] Derive manifest config for AnVIL from field mapping (#6110)
Name special fields in camel case on AnVIL. This repairs the IT.
---
Index: src/azul/chalice.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/chalice.py b/src/azul/chalice.py
--- a/src/azul/chalice.py (revision 3b2831b32aa4c463218ddff66830a3df41567f14)
+++ b/src/azul/chalice.py (date 1712209742788)
@@ -108,6 +108,13 @@
value: int
+@attr.s(auto_attribs=True, frozen=True, kw_only=True)
+class Retry:
+ lambda_name: str
+ handler_name: Optional[str] = attr.ib(default=None)
+ value: int
+
+
C = TypeVar('C', bound='AppController')
@@ -526,6 +533,24 @@
value=threshold))
return thresholds
+ def retry(self, retries: int):
+ def wrapper(f):
+ assert isinstance(f, chalice.app.EventSourceHandler), f
+ f.retries = retries
+ return f
+
+ return wrapper
+
+ def retries(self) -> list[Retry]:
+ lambda_name, _ = config.unqualified_resource_name(self.app_name)
+ retries = []
+ for handler_name, handler in self.handler_map.items():
+ if isinstance(handler, chalice.app.EventSourceHandler):
+ value = getattr(handler, 'retries', 0)
+ retries.append(Retry(lambda_name=lambda_name,
+ handler_name=handler_name,
+ value=value))
+ return retries
@attr.s(auto_attribs=True, frozen=True, kw_only=True)
class AppController:
Index: lambdas/indexer/app.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/lambdas/indexer/app.py b/lambdas/indexer/app.py
--- a/lambdas/indexer/app.py (revision 3b2831b32aa4c463218ddff66830a3df41567f14)
+++ b/lambdas/indexer/app.py (date 1712209286051)
@@ -337,6 +337,7 @@
app.index_controller.contribute(event, retry=True)
+@app.retry(2)
@app.log_forwarder(
config.alb_access_log_path_prefix(deployment=None)
)
Index: terraform/api_gateway.tf.json.template.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/terraform/api_gateway.tf.json.template.py b/terraform/api_gateway.tf.json.template.py
--- a/terraform/api_gateway.tf.json.template.py (revision 3b2831b32aa4c463218ddff66830a3df41567f14)
+++ b/terraform/api_gateway.tf.json.template.py (date 1712208689183)
@@ -5,11 +5,14 @@
import json
from azul import (
+ cached_property,
config,
)
+from azul.chalice import AzulChaliceApp
from azul.deployment import (
aws,
)
+from azul.modules import load_app_module
from azul.objects import (
InternMeta,
)
@@ -40,6 +43,10 @@
*config.api_lambda_domain_aliases(name)
],
policy=json.dumps(getattr(policy_module, 'policy')))
+
+ @cached_property
+ def chalice(self) -> AzulChaliceApp:
+ return load_app_module(self.name).app
apps = [
@@ -372,16 +379,10 @@
'aws_lambda_function_event_invoke_config': {
function_name: {
'function_name': '${aws_lambda_function.%s.function_name}' % function_name,
- 'maximum_retry_attempts': retry_attempts
+ 'maximum_retry_attempts': retry.value
}
- for function_name, retry_attempts in
- [
- (f'indexer_{lm}', 0)
- for lm in ['forward_alb_logs', 'forward_s3_logs']
- if config.enable_log_forwarding
- ] + [
- (f'{lm}_{lm}cachehealth', 1) for lm in ['indexer', 'service']
- ]
+ for app in apps
+ for retry in app.chalice.retries()
}
},
*(
This has not been tested and may not handle config.enable_log_forwarding
being False.
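To make the idea in the patch easier to follow, here is a minimal, self-contained sketch of the same pattern without Chalice or Terraform: a decorator records the retry count on the handler, and the config is derived by walking the registered handlers. `SketchApp`, `handler` and `invoke_config` are hypothetical stand-ins, not Azul or Chalice APIs. Two assumptions differ from the patch above: the resource key is assumed to be `<lambda_name>_<handler_name>`, and handlers without an explicit retry count are skipped rather than given a value of 0.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Retry:
    lambda_name: str
    handler_name: str
    value: int


class SketchApp:
    """Hypothetical stand-in for the app class; not the real AzulChaliceApp."""

    def __init__(self, lambda_name: str) -> None:
        self.lambda_name = lambda_name
        self.handler_map: dict[str, Callable] = {}

    def handler(self, f: Callable) -> Callable:
        # Register the handler under its function name
        self.handler_map[f.__name__] = f
        return f

    def retry(self, retries: int) -> Callable[[Callable], Callable]:
        def wrapper(f: Callable) -> Callable:
            # Record the argument that was passed in, not a constant
            f.retries = retries
            return f

        return wrapper

    def retries(self) -> list[Retry]:
        result = []
        for handler_name, handler in self.handler_map.items():
            value: Optional[int] = getattr(handler, 'retries', None)
            # Assumption: undecorated handlers keep the platform default
            if value is not None:
                result.append(Retry(self.lambda_name, handler_name, value))
        return result


def invoke_config(apps: list[SketchApp]) -> dict[str, dict]:
    # Assumed naming scheme: <lambda_name>_<handler_name>
    return {
        f'{r.lambda_name}_{r.handler_name}': {
            'function_name': '${aws_lambda_function.%s.function_name}'
                             % f'{r.lambda_name}_{r.handler_name}',
            'maximum_retry_attempts': r.value
        }
        for app in apps
        for r in app.retries()
    }


indexer = SketchApp('indexer')


@indexer.retry(0)
@indexer.handler
def forward_alb_logs(event):
    pass


@indexer.handler
def contribute(event):
    pass
```

With these definitions, `invoke_config([indexer])` yields a single entry for `indexer_forward_alb_logs`, and `contribute` is left out because it carries no explicit retry count.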
@hannes-ucsc, I've dropped the patch you suggested during PL (I'm adding it to the end of this comment). The default retry limit is two, which is also confirmed in the conversation from when the retry restrictions on the forwarder lambdas were introduced (#4920 (comment)). There is no longer a need for these lambdas to be part of the
Subject: [PATCH] Increase retry for the log forwarder Lambdas
---
Index: lambdas/indexer/app.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/lambdas/indexer/app.py b/lambdas/indexer/app.py
--- a/lambdas/indexer/app.py
+++ b/lambdas/indexer/app.py
@@ -338,6 +338,7 @@
app.index_controller.contribute(event, retry=True)
+@app.retry(2 if config.enable_log_forwarding else None)
@app.log_forwarder(
config.alb_access_log_path_prefix(deployment=None)
)
@@ -345,6 +346,7 @@
app.log_controller.forward_alb_logs(event)
+@app.retry(2 if config.enable_log_forwarding else None)
@app.log_forwarder(
config.s3_access_log_path_prefix(deployment=None)
)
Index: src/azul/chalice.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/chalice.py b/src/azul/chalice.py
--- a/src/azul/chalice.py
+++ b/src/azul/chalice.py
@@ -533,7 +533,7 @@
value=threshold))
return thresholds
- def retry(self, retries: int):
+ def retry(self, retries: int | None):
"""
Use this decorator to specify a custom number of retries that should be
different from the default (which is two) for any of the Azul async
@@ -541,8 +541,9 @@
https://docs.aws.amazon.com/lambda/latest/dg/invocation-retries.html
"""
def wrapper(f):
- assert isinstance(f, chalice.app.EventSourceHandler), f
- f.retries = retries
+ if retries is not None:
+ assert isinstance(f, chalice.app.EventSourceHandler), f
+ f.retries = retries
return f
return wrapper
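For reference, the behaviour of the optional variant of the decorator can be sketched in isolation as follows. This is only an illustration under stated assumptions: `enable_log_forwarding` is a hypothetical module-level stand-in for `config.enable_log_forwarding`, and the handlers are plain functions rather than `chalice.app.EventSourceHandler` instances, so the `isinstance` assertion from the patch is omitted.

```python
from typing import Callable, Optional


def retry(retries: Optional[int]) -> Callable[[Callable], Callable]:
    def wrapper(f: Callable) -> Callable:
        # Compare against None explicitly so that 0 remains a valid value
        if retries is not None:
            f.retries = retries
        return f

    return wrapper


# Hypothetical stand-in for config.enable_log_forwarding
enable_log_forwarding = False


@retry(2 if enable_log_forwarding else None)
def forward_alb_logs(event):
    pass


@retry(0)
def cache_health(event):
    pass
```

With log forwarding disabled, `forward_alb_logs` carries no `retries` attribute at all, so code that collects retries with `getattr(handler, 'retries', ...)` falls back to its default, while `cache_health` keeps its explicit value of 0.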
Security design review
Connected issues: #6097
Checklist
Author
develop
issues/<GitHub handle of author>/<issue#>-<slug>
1 when the issue title describes a problem, the corresponding PR title is Fix: followed by the issue title
Author (partiality)
p tag to titles of partial commits
partial or completely resolves all connected issues
partial label
Author (chains)
base or this PR is not chained to another PR
chained or is not chained to another PR
Author (reindex, API changes)
r tag to commit title or the changes introduced by this PR will not require reindexing of any deployment
reindex:dev or the changes introduced by it will not require reindexing of dev
reindex:anvildev or the changes introduced by it will not require reindexing of anvildev
reindex:anvilprod or the changes introduced by it will not require reindexing of anvilprod
reindex:prod or the changes introduced by it will not require reindexing of prod
reindex:partial and its description documents the specific reindexing procedure for dev, anvildev, anvilprod and prod or requires a full reindex or carries none of the labels reindex:dev, reindex:anvildev, reindex:anvilprod and reindex:prod
API or this PR does not modify a REST API
a (A) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST API
app.py or this PR does not modify a REST API
Author (upgrading deployments)
make image_manifests.json and committed the resulting changes or this PR does not modify azul_docker_images, or any other variables referenced in the definition of that variable
u tag to commit title or this PR does not require upgrading deployments
upgrade or does not require upgrading deployments
deploy:shared or does not modify image_manifests.json, and does not require deploying the shared component for any other reason
deploy:gitlab or does not require deploying the gitlab component
deploy:runner or does not require deploying the runner image
Author (hotfixes)
F tag to main commit title or this PR does not include permanent fix for a temporary hotfix
anvilprod and prod) have temporary hotfixes for any of the issues connected to this PR
Author (before every review)
develop, squashed old fixups
make requirements_update or this PR does not modify requirements*.txt, common.mk, Makefile and Dockerfile
R tag to commit title or this PR does not modify requirements*.txt
reqs or does not modify requirements*.txt
make integration_test passes in personal deployment or this PR does not modify functionality that could affect the IT outcome
Peer reviewer (after approval)
System administrator (after approval)
demo or no demo
no demo
no sandbox
N reviews label is accurate
Operator (before pushing merge the commit)
reindex:… labels and r commit title tag
no demo
develop
_select dev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select dev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
_select anvildev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select anvildev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
deploy:gitlab
deploy:gitlab
System administrator
dev.gitlab are complete or this PR is not labeled deploy:gitlab
anvildev.gitlab are complete or this PR is not labeled deploy:gitlab
Operator (before pushing merge the commit)
_select dev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
_select anvildev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
sandbox label or PR is labeled no sandbox
dev or PR is labeled no sandbox
anvildev or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox or this PR does not remove catalogs or otherwise causes unreferenced indices in dev
anvilbox or this PR does not remove catalogs or otherwise causes unreferenced indices in anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
p if the PR is also labeled partial
Operator (chain shortening)
develop or this PR is not labeled base
chained label from the blocked PR or this PR is not labeled base
base
base label from this PR or this PR is not labeled base
Operator (after pushing the merge commit)
dev
anvildev
dev
dev
anvildev
anvildev
_select dev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
_select anvildev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
dev
anvildev
Operator (reindex)
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
Operator
deploy:shared, deploy:gitlab, deploy:runner, reindex:partial, reindex:anvilprod and reindex:prod labels to the next promotion PRs or this PR carries none of these labels
deploy:shared, deploy:gitlab, deploy:runner, reindex:partial, reindex:anvilprod and reindex:prod labels, from the description of this PR to that of the next promotion PRs or this PR carries none of these labels
Shorthand for review comments
L line is too long
W line wrapping is wrong
Q bad quotes
F other formatting problem