Description
Overview
I've followed the guide described in graceful-shutdown-with-aws-lambda. Unfortunately, my signal handler is never invoked.
I spent time investigating and it looks like the native code in awslambdaric
is not bubbling up the signal preventing the signal handler registered from ever running. I don't have experience with writing native code python extensions, so I don't know what needs to change in runtime_client.cpp
so that this works.
The graceful shutdown guide actually has a note at the end of the readme which says:
Please be aware that this feature does not work on Python 3.8 and 3.9 runtimes.
I think this is exactly because of the issue I am describing here (and mentioned here aws-samples/graceful-shutdown-with-aws-lambda#2).
Steps to reproduce.
I took a simple lambda function handler.py
:
import signal
from types import FrameType
def shutdown(_signum: int, _frame: FrameType | None) -> None:
print("Shutdown hook invoked")
signal.signal(signal.SIGTERM, shutdown)
def lambda_handler(event, context):
print(f"Handled event {event}")
return "SUCCESS"
and created a lambda in eu-central-1
with this as a zip, adding the layer arn:aws:lambda:eu-central-1:580247275435:layer:LambdaInsightsExtension:38
and invoked it once. I waited until the instance was spun down (10-15 minutes), but there was no log from the shutdown hook.
I also deployed it as a docker image with this Dockerfile:
FROM public.ecr.aws/lambda/python:3.10.2023.07.13.16
# download manually before building the image:
# curl -f -s -o lambda-insights-amd64-38.zip "$(aws lambda get-layer-version-by-arn --arn "arn:aws:lambda:eu-central-1:580247275435:layer:LambdaInsightsExtension:38" --query 'Content.Location' --output text)"
COPY lambda-insights-amd64-38.zip /tmp
# hadolint ignore=DL3033
RUN yum install -y unzip && unzip -q -d /opt /tmp/lambda-insights-amd64-38.zip && rm /tmp/*.zip && \
yum history undo -y last && yum clean all && rm -rf /var/cache/yum
COPY handler.py "$LAMBDA_TASK_ROOT/handler.py"
CMD ["handler.lambda_handler"]
I pushed this into ECR and then created a container-based lambda with this image again in eu-central-1
and invoked it once. Again, I waited until the instance was spun down (10-15 minutes), but there was no log from the shutdown hook.
To prove that SIGTERM was being sent to the runtime, I altered the previous image to just be a simple shell script and updated the container-based lambda from the previous step:
startup.sh
#!/usr/bin/env bash
set -e
set -o pipefail
printf 'Starting lambda handler. Running as pid %s\n' "$BASHPID"
shutdown() {
printf 'Shutting down gracefully\n'
exit
}
trap 'shutdown' SIGTERM
while true ; do
HEADERS="$(mktemp)"
BODY="$(mktemp)"
curl -f -sS -LD "$HEADERS" -X GET "http://$AWS_LAMBDA_RUNTIME_API/2018-06-01/runtime/invocation/next" -o "$BODY"
REQUEST_ID="$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)"
rm -rf "$HEADERS"
printf 'Processing request %s\n' "$REQUEST_ID"
curl -f -sS -X POST "http://$AWS_LAMBDA_RUNTIME_API/2018-06-01/runtime/invocation/$REQUEST_ID/response" -d '"SUCCESS"'
rm -rf "$BODY"
done
with this Dockerfile
:
FROM public.ecr.aws/lambda/python:3.10.2023.07.13.16
# download manually before building the image:
# curl -f -s -o lambda-insights-amd64-38.zip "$(aws lambda get-layer-version-by-arn --arn "arn:aws:lambda:eu-central-1:580247275435:layer:LambdaInsightsExtension:38" --query 'Content.Location' --output text)"
COPY lambda-insights-amd64-38.zip /tmp
# hadolint ignore=DL3033
RUN yum install -y unzip && unzip -q -d /opt /tmp/lambda-insights-amd64-38.zip && rm /tmp/*.zip && \
yum history undo -y last && yum clean all && rm -rf /var/cache/yum
COPY startup.sh "$LAMBDA_TASK_ROOT/startup.sh"
ENTRYPOINT ["/var/runtime/startup.sh"]
I invoked the lambda and got the "SUCCESS"
response. I waited for it to spin down and checked the cloudwatch logs again:
LOGS Name: cloudwatch_lambda_agent State: Subscribed Types: [Platform]
Starting lambda handler. Running as pid 12
Name: cloudwatch_lambda_agent State: Ready Events: [INVOKE, SHUTDOWN]
START RequestId: 1127e4c8-97e8-4ef9-9c52-e24673eb40a7 Version: $LATEST
Processing request 1127e4c8-97e8-4ef9-9c52-e24673eb40a7
{""status"":""OK""}
END RequestId: 1127e4c8-97e8-4ef9-9c52-e24673eb40a7
REPORT RequestId: 1127e4c8-97e8-4ef9-9c52-e24673eb40a7 Duration: 304.42 ms Billed Duration: 590 ms Memory Size: 128 MB Max Memory Used: 35 MB Init Duration: 285.31 ms
Terminated
Shutting down gracefully
Success, so I knew that SIGTERM was being sent. After this, I suspected that it had to be somewhere in the runtime handling for python. While reading the source code for awslambdaric
I realised part of it was written in cpp and I wondered if this might be the issue.
To test whether this was the problem, I replaced the runtime_client
written in cpp, with one written in python.
runtime_client_py.py
# Reimplementation of runtime_client.cpp in python
import http
import http.client
import os
lambda_runtime_address = os.environ["AWS_LAMBDA_RUNTIME_API"]
user_agent = "awslambdaric"
class RuntimeClient:
@staticmethod
def initialize_client(user_agent: str) -> None:
user_agent = user_agent
@staticmethod
def next() -> tuple[bytes, dict]:
endpoint = "/2018-06-01/runtime/invocation/next"
runtime_connection = http.client.HTTPConnection(lambda_runtime_address)
runtime_connection.connect()
runtime_connection.request("GET", endpoint, headers={"User-Agent": user_agent})
response = runtime_connection.getresponse()
response_body = response.read()
raw_headers = response.headers
headers = {
"Lambda-Runtime-Aws-Request-Id": raw_headers.get(
"Lambda-Runtime-Aws-Request-Id"
),
"Lambda-Runtime-Trace-Id": raw_headers.get("Lambda-Runtime-Trace-Id")
or None,
"Lambda-Runtime-Invoked-Function-Arn": raw_headers.get(
"Lambda-Runtime-Invoked-Function-Arn"
),
"Lambda-Runtime-Deadline-Ms": int(
raw_headers.get("Lambda-Runtime-Deadline-Ms") or "0"
),
"Lambda-Runtime-Client-Context": raw_headers.get(
"Lambda-Runtime-Client-Context"
)
or None,
"Lambda-Runtime-Cognito-Identity": raw_headers.get(
"Lambda-Runtime-Cognito-Identity"
)
or None,
"Content-Type": raw_headers.get("Content-Type") or None,
}
return response_body, headers
@staticmethod
def post_invocation_result(
invoke_id: str, result_data: bytes, content_type: str
) -> None:
from awslambdaric.lambda_runtime_client import LambdaRuntimeClientError
endpoint = f"/2018-06-01/runtime/invocation/{invoke_id}/response"
runtime_connection = http.client.HTTPConnection(lambda_runtime_address)
runtime_connection.connect()
runtime_connection.request(
"POST",
endpoint,
body=result_data,
headers={"User-Agent": user_agent, "Content-Type": content_type},
)
response = runtime_connection.getresponse()
if response.status != http.HTTPStatus.ACCEPTED:
raise LambdaRuntimeClientError(endpoint, response.status, None)
@staticmethod
def post_error(invoke_id: str, error_response_data: bytes, xray_fault: str) -> None:
endpoint = f"/2018-06-01/runtime/invocation/{invoke_id}/error"
runtime_connection = http.client.HTTPConnection(lambda_runtime_address)
runtime_connection.connect()
runtime_connection.request(
"POST",
endpoint,
body=error_response_data,
headers={
"User-Agent": user_agent,
"Content-Type": "application/json",
"Lambda-Runtime-Function-Error-Type": "UnhandledContent",
"Lambda-Runtime-Function-xray-error-cause": xray_fault,
},
)
runtime_client = RuntimeClient()
and I put this in an image with this Dockerfile
:
FROM public.ecr.aws/lambda/python:3.10.2023.07.13.16
# download manually before building the image:
# curl -f -s -o lambda-insights-amd64-38.zip "$(aws lambda get-layer-version-by-arn --arn "arn:aws:lambda:eu-central-1:580247275435:layer:LambdaInsightsExtension:38" --query 'Content.Location' --output text)"
COPY lambda-insights-amd64-38.zip /tmp
# hadolint ignore=DL3033
RUN yum install -y unzip && unzip -q -d /opt /tmp/lambda-insights-amd64-38.zip && rm /tmp/*.zip && \
yum history undo -y last && yum clean all && rm -rf /var/cache/yum
RUN rm "$LAMBDA_RUNTIME_DIR/runtime_client.cpython"*.so && sed -i 's/import runtime_client/from awslambdaric.runtime_client import runtime_client/' "$LAMBDA_RUNTIME_DIR/awslambdaric/lambda_runtime_client.py"
COPY runtime_client_py.py "$LAMBDA_RUNTIME_DIR/awslambdaric/runtime_client.py"
COPY handler.py "$LAMBDA_TASK_ROOT/handler.py"
CMD ["handler.lambda_handler"]
I updated the image used by the lambda again and invoked it and waited for it to spin back down. I checked the cloudwatch logs again:
LOGS Name: cloudwatch_lambda_agent State: Subscribed Types: [Platform]
EXTENSION Name: cloudwatch_lambda_agent State: Ready Events: [INVOKE, SHUTDOWN]
START RequestId: faab07eb-60f6-47c6-b5ba-0dced4097ec9 Version: $LATEST
Handled event {'test': 'value'}
END RequestId: faab07eb-60f6-47c6-b5ba-0dced4097ec9
REPORT RequestId: faab07eb-60f6-47c6-b5ba-0dced4097ec9 Duration: 48.67 ms Billed Duration: 383 ms Memory Size: 128 MB Max Memory Used: 45 MB Init Duration: 333.96 ms
Shutdown hook invoked
Success! So indeed it looks like the native code needs some adjustment to handle the signal.
Unfortunately, I can't offer a pull request to fix this since I don't know enough about how this works. Some searching points out that perhaps the code needs to be using PyErr_CheckSignals
. awslambdaric
is also using aws-lambda-runtime
internally, so there might be some interaction there too that needs even more handling.
Anyway, I hope these reproduction steps can help someone with the skills find the correct solution.
Thanks.