diff --git a/content/en/blog/2025/observing-lambdas/diagram-execution-timing.svg b/content/en/blog/2025/observing-lambdas/diagram-execution-timing.svg
new file mode 100644
index 000000000000..9d246d26cae0
--- /dev/null
+++ b/content/en/blog/2025/observing-lambdas/diagram-execution-timing.svg
@@ -0,0 +1,544 @@
diff --git a/content/en/blog/2025/observing-lambdas/index.md b/content/en/blog/2025/observing-lambdas/index.md
new file mode 100644
index 000000000000..a1836e2f4904
--- /dev/null
+++ b/content/en/blog/2025/observing-lambdas/index.md
@@ -0,0 +1,124 @@
+---
+title: Observing Lambdas using the OpenTelemetry Collector Extension Layer
+author: '[Dominik Süß](https://github.com/theSuess) (Grafana)'
+linkTitle: Observing Lambdas
+date: 2025-02-05
+sig: FaaS
+issue: 5961
+cSpell:ignore: Dominik
+---
+
+Getting telemetry data out of modern applications is very straightforward (or at
+least it should be). You set up a collector, which either receives data from
+your application or asks it to provide an up-to-date state of its various
+counters. This happens every minute or so, and if it’s a second late or early,
+no one really bats an eye. But what if the application isn’t around for long?
+What if every second spent waiting for the data to be collected is billed? Then
+you’re most likely thinking of Function-as-a-Service (FaaS) environments, the
+most well-known being AWS Lambda.
+
+In this execution model, functions are called directly, and the environment is
+frozen afterward. You’re only billed for actual execution time and no longer
+need a server waiting for incoming requests. This is also where the term
+serverless comes from. Keeping the function alive until metrics can be collected
+isn’t really an option, and even if you were willing to pay for that, different
+invocations have completely separate contexts and don’t necessarily know about
+all the other executions happening simultaneously. You might now be saying:
+"I'll just push all the data at the end of my execution, no issues here!", but
+that doesn’t solve the problem either. You’ll still have to pay for the time it
+takes to send the data, and with many invocations, this adds up.
+
+But there is another way! Lambda extension layers allow you to run any process
+alongside your code, sharing the execution runtime and providing additional
+services. With the
+[opentelemetry-lambda](https://github.com/open-telemetry/opentelemetry-lambda/blob/main/collector/README.md)
+extension layer, you get a local endpoint to send data to, while the layer keeps
+track of the Lambda lifecycle and ensures your telemetry gets to the storage
+layer.
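+
+If your function code is already instrumented with an OpenTelemetry SDK, you can
+point its OTLP exporter at that local endpoint. The snippet below is only a
+sketch: it assumes the layer's default OTLP/HTTP receiver on `localhost:4318`
+(check the collector README for the defaults of your layer version), and the
+surrounding deployment template keys are hypothetical.
+
+```yaml
+# Hypothetical excerpt from a deployment template: environment variables for a
+# function whose SDK should send OTLP data to the Collector in the extension
+# layer instead of a remote endpoint.
+Environment:
+  Variables:
+    # Assumption: the layer's default OTLP/HTTP receiver listens on port 4318.
+    OTEL_EXPORTER_OTLP_ENDPOINT: http://localhost:4318
+```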
+
+## How does it work?
+
+When your function is called for the first time, the extension layer starts an
+instance of the OpenTelemetry Collector. The Collector build is a stripped-down
+version, providing only the components necessary in the context of Lambda. It
+registers with the Lambda
+[Extensions API](https://docs.aws.amazon.com/lambda/latest/dg/runtimes-extensions-api.html)
+and
+[Telemetry API](https://docs.aws.amazon.com/lambda/latest/dg/telemetry-api.html).
+Through these, it receives notifications whenever your function is invoked,
+whenever it emits a log line, and when the execution context is about to be shut
+down.
+
+### This is where the magic happens
+
+Up until now, this just seems like extra work for nothing. You'll still have to
+wait for the Collector to export the data, right? This is where the special
+`decouple` processor comes in. It separates the receiving and exporting
+components while interfacing with the Lambda lifecycle. This allows the Lambda
+to return even if not all data has been sent. At the next invocation (or on
+shutdown), the Collector continues exporting the data while your function does
+its thing.
+
+{{< figure src="diagram-execution-timing.svg" caption="Diagram showcasing how execution timing differs with and without a Collector">}}
+
+## How can I use it?
+
+As of November 2024, the opentelemetry-lambda project publishes
+[releases of the Collector extension layer](https://github.com/open-telemetry/opentelemetry-lambda/releases/tag/layer-collector%2F0.12.0).
+The layer can be configured through a configuration file hosted either in an S3
+bucket or on an arbitrary HTTP server. It is also possible to bundle the
+configuration file with your Lambda code. In both cases, there are tradeoffs to
+consider: remote configuration files add to the cold start duration, as an
+additional request needs to be made, while bundling the configuration increases
+the management overhead when trying to control the configuration for multiple
+Lambdas.
+
+The simplest way to get started is with an embedded configuration. For this, add
+a file called `collector.yaml` to your function. This is a regular Collector
+configuration file. To take advantage of the Lambda-specific components, they
+need to be configured explicitly. As an example, the following configuration
+receives traces and logs from the Telemetry API and sends them to another
+endpoint:
+
+```yaml
+receivers:
+  telemetryapi:
+exporters:
+  otlphttp/external:
+    endpoint: 'external-collector:4318'
+processors:
+  batch:
+  decouple:
+service:
+  pipelines:
+    traces:
+      receivers: [telemetryapi]
+      processors: [batch, decouple]
+      exporters: [otlphttp/external]
+    logs:
+      receivers: [telemetryapi]
+      processors: [batch, decouple]
+      exporters: [otlphttp/external]
+```
+
+The `decouple` processor is added automatically if it is omitted; it is included
+explicitly in this example to illustrate the entire pipeline. For more
+information, see
+[Autoconfiguration](https://github.com/open-telemetry/opentelemetry-lambda/tree/main/collector#auto-configuration).
+
+Afterward, set the `OPENTELEMETRY_COLLECTOR_CONFIG_URI` environment variable to
+`/var/task/collector.yaml`. Once the function is redeployed, you’ll see your
+function logs appear! You can see this in action in the video below.
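+
+If you deploy with AWS SAM, for example, attaching the extension layer and
+setting the variable might look roughly like the following sketch. The function
+name, handler, runtime, and especially the layer ARN are placeholders, so
+substitute the layer ARN for your region and architecture as published by the
+opentelemetry-lambda project.
+
+```yaml
+# Hypothetical AWS SAM template excerpt, not part of the original setup.
+Resources:
+  MyFunction:
+    Type: AWS::Serverless::Function
+    Properties:
+      Handler: index.handler
+      Runtime: nodejs20.x
+      Layers:
+        # Placeholder: the Collector extension layer ARN for your region,
+        # architecture, and version.
+        - arn:aws:lambda:<region>:<account-id>:layer:<collector-layer>:<version>
+      Environment:
+        Variables:
+          # Points the extension at the collector.yaml bundled with the code.
+          OPENTELEMETRY_COLLECTOR_CONFIG_URI: /var/task/collector.yaml
+```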
+
+<video controls>
+  <source src="video-lambda-real-time.webm" type="video/webm" />
+</video>
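+
+The `external-collector:4318` endpoint in the configuration above stands in for
+whatever OTLP endpoint you run outside of Lambda. As a rough sketch (the
+hostname and the backend endpoint are assumptions, not part of this setup), such
+a Collector only needs an OTLP receiver and an exporter pointing at your
+backend:
+
+```yaml
+# Hypothetical configuration for the receiving "external-collector" instance.
+receivers:
+  otlp:
+    protocols:
+      http:
+        endpoint: 0.0.0.0:4318
+exporters:
+  otlphttp:
+    # Placeholder backend; replace with your vendor or self-hosted endpoint.
+    endpoint: https://otel-backend.example.com:4318
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      exporters: [otlphttp]
+    logs:
+      receivers: [otlp]
+      exporters: [otlphttp]
+```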
+
+Every log line your Lambda produces will be sent to the specified
+`external-collector` endpoint. You don't need to modify the code at all! From
+there, telemetry data flows to your backend as usual. Since the transmission of
+telemetry data might be frozen when the Lambda is not active, logs can arrive
+with a delay. They'll either arrive during the next execution or during the
+shutdown interval.
+
+If you want further insight into your applications, also see the
+[language-specific auto-instrumentation layers](https://github.com/open-telemetry/opentelemetry-lambda/?tab=readme-ov-file#extension-layer-language-support).
diff --git a/content/en/blog/2025/observing-lambdas/video-lambda-real-time.webm b/content/en/blog/2025/observing-lambdas/video-lambda-real-time.webm
new file mode 100644
index 000000000000..2e1a36da9eda
Binary files /dev/null and b/content/en/blog/2025/observing-lambdas/video-lambda-real-time.webm differ
diff --git a/static/refcache.json b/static/refcache.json
index d9637fbc6341..151517855854 100644
--- a/static/refcache.json
+++ b/static/refcache.json
@@ -999,10 +999,18 @@ "StatusCode": 206,
     "LastSeen": "2025-02-01T07:13:03.593647-05:00"
   },
+  "https://docs.aws.amazon.com/lambda/latest/dg/runtimes-extensions-api.html": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-24T22:43:02.419614644Z"
+  },
   "https://docs.aws.amazon.com/lambda/latest/dg/services-xray.html": {
     "StatusCode": 206,
     "LastSeen": "2025-01-30T16:54:20.250322-05:00"
   },
+  "https://docs.aws.amazon.com/lambda/latest/dg/telemetry-api.html": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-24T22:43:04.492478898Z"
+  },
   "https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html": {
     "StatusCode": 206,
     "LastSeen": "2025-02-01T07:12:11.606851-05:00"
   },
@@ -8363,10 +8371,26 @@ "StatusCode": 206,
     "LastSeen": "2025-01-13T11:43:25.504793-05:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-lambda/": {
+    "StatusCode": 206,
+    "LastSeen": "2025-02-04T12:41:09.851793718Z"
+  },
+  "https://github.com/open-telemetry/opentelemetry-lambda/blob/main/collector/README.md": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-24T22:43:01.001644641Z"
+  },
   "https://github.com/open-telemetry/opentelemetry-lambda/releases": {
     "StatusCode": 206,
     "LastSeen": "2025-02-02T10:58:45.265975-05:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-lambda/releases/tag/layer-collector%2F0.12.0": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-24T22:43:07.062401954Z"
+  },
+  "https://github.com/open-telemetry/opentelemetry-lambda/tree/main/collector#auto-configuration": {
+    "StatusCode": 206,
+    "LastSeen": "2025-02-04T12:41:05.846247381Z"
+  },
   "https://github.com/open-telemetry/opentelemetry-operator": {
     "StatusCode": 206,
     "LastSeen": "2025-01-07T10:31:41.081498-05:00"
   },
@@ -12083,6 +12107,10 @@ "StatusCode": 206,
     "LastSeen": "2025-02-02T10:24:50.885838-05:00"
   },
+  "https://github.com/theSuess": {
+    "StatusCode": 206,
+    "LastSeen": "2025-02-04T12:41:01.062302518Z"
+  },
   "https://github.com/theletterf": {
     "StatusCode": 206,
     "LastSeen": "2025-02-01T07:10:28.628158-05:00"