Define FaaS Metric Semantics #1052

Closed
wants to merge 7 commits into from
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -130,6 +130,8 @@ Updates:
([#993](https://github.com/open-telemetry/opentelemetry-specification/pull/993))
- Trace API: Clarifications for `Span.End`, e.g. IsRecording becomes false after End
([#1011](https://github.com/open-telemetry/opentelemetry-specification/pull/1011))
- Add metric semantic conventions for faas
([#1052](https://github.com/open-telemetry/opentelemetry-specification/pull/1052))

## v0.6.0 (07-01-2020)

20 changes: 20 additions & 0 deletions semantic_conventions/metrics/faas.yaml
@@ -0,0 +1,20 @@
groups:
  - id: faas-metrics
    prefix: faas
    brief: >
      This document defines the attributes used in
      faas (function as a service) metrics.
    attributes:
      - ref: faas.trigger
        required: always
      - ref: faas.invoked_name
        required: always
      - ref: faas.invoked_provider
        required: always
      - ref: faas.invoked_region

Is it mandatory to have a region? What is the target of this convention: public-cloud FaaS only, or also FaaS running on-premises or elsewhere (such as OpenFaaS or Knative, which would probably also fall into that realm even though they are "serverless" rather than strictly "FaaS")? To be honest, scoping these metrics to FaaS feels too narrow, because the same metrics could also cover serverless deployments that are not FaaS (for example, deployments not centered on functions as a programming model, such as containers operated in a serverless manner).

        required: always
      - ref: faas.coldstart
        required: always
Comment on lines +10 to +17
Member

@thisthat Nov 5, 2020

coldstart is for an incoming lambda and invoked_* are for an outgoing lambda. I don't think it is correct to require all of them.

Contributor Author

Agree about coldstarts. Can add a condition there.

For invoked_*, doesn't this apply for incoming as well? Maybe I misunderstand the trace spec here.

Member

To my understanding, the invoked_* attributes specify which function is invoked from a client and not the execution of such a function. For a function that is being executed, the same attributes are available as resource attributes. Namely, the mapping is:
invoked_name -> faas.name
invoked_provider -> cloud.provider
invoked_region -> cloud.region
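
The mapping in the comment above can be sketched as a small lookup. This is only an illustration: the helper name `resource_equivalent` and the dictionary are hypothetical and are not part of this PR or of any OpenTelemetry API, and the `faas.` prefix on the keys is assumed from the label names used elsewhere in this PR.

```python
# Hypothetical lookup mirroring the mapping listed in the review comment:
# outgoing-call (client-side) faas.invoked_* labels and the resource
# attributes that describe the same function when it is being executed.
INVOKED_TO_RESOURCE = {
    "faas.invoked_name": "faas.name",
    "faas.invoked_provider": "cloud.provider",
    "faas.invoked_region": "cloud.region",
}


def resource_equivalent(labels):
    """Translate outgoing-call label names into resource-attribute names.

    Labels without an entry in the mapping are passed through unchanged.
    """
    return {INVOKED_TO_RESOURCE.get(key, key): value
            for key, value in labels.items()}
```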

      - id: error
        type: boolean
        brief: 'Whether or not the function resulted in an error.'
53 changes: 53 additions & 0 deletions specification/metrics/semantic_conventions/faas-metrics.md
@@ -0,0 +1,53 @@
# General

The conventions described in this section are FaaS (Function as a Service) specific. When FaaS operations occur,
metric events about those operations will be generated and reported to provide insight into the
operations. Adding FaaS labels to metric events allows for finely tuned filtering.

**Disclaimer:** These are initial FaaS metric instruments and labels but more may be added in the future.

## Metric Instruments

The following metric instruments MUST be used to describe FaaS operations. They MUST be of the specified
type and units.

Naming conventions follow [FaaS Trace Semantics](/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/faas.md) wherever possible.

### FaaS Invocations

Below is a table of FaaS invocation metric instruments.

| Name | Instrument | Units | Description |
|------|------------|-------|-------------|
| `faas.execution_duration` | ValueRecorder | milliseconds | Measures the duration of the invocation, the time the function spent processing an event. |
| `faas.init_duration` | ValueRecorder | milliseconds | Measures the duration of the function's initialization, such as a cold start. |
Member

I'm not really an expert in FaaS, so maybe this is obvious to some of you, but I don't know the relationship between invoke_duration and init_duration. Is init duration included in invoke duration?
Or does total duration = invoke_duration + init_duration?

Contributor Author

@justinfoote Good question. AWS Lambda considers an invocation's duration inclusive of any initialization (cold starts), so a faas.invoke_duration could be considerably longer during a cold start than for subsequent invocations post-initialization. Cold starts do have a real effect in a FaaS context, so I wanted to make sure faas.invoke_duration reflected this reality. I would welcome opinions on whether duration metrics should be exclusive or inclusive.
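
To make the inclusive versus exclusive question concrete, here is a minimal sketch of the two candidate conventions. The function name `invocation_durations` and the millisecond figures are made up for illustration; nothing here is taken from a provider's API or from the PR itself.

```python
def invocation_durations(init_ms, handler_ms):
    """Return the invocation duration under both candidate conventions.

    init_ms is the initialization (cold start) time and is 0 for a warm
    invocation; handler_ms is the time spent processing the event.
    Both values are in milliseconds.
    """
    return {
        # Initialization is counted as part of the invocation.
        "inclusive": init_ms + handler_ms,
        # Initialization is reported only through the init duration metric.
        "exclusive": handler_ms,
    }


# Illustrative values only.
cold = invocation_durations(init_ms=400.0, handler_ms=50.0)
warm = invocation_durations(init_ms=0.0, handler_ms=50.0)
```

Under the inclusive convention the cold invocation reports a much larger duration than the warm one; under the exclusive convention the two are identical and the difference shows up only in the init duration.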

Member

This is a tough question, and I'd love for someone with more experience with serverless architecture in general to weigh in.

I'm curious about how this is represented in tracing. I know that there's a top-level SERVER span defined for FaaS invocations. Is there a nested span within it that represents the initialization? If so, how is that span identified as initialization time?

| `faas.timeouts` | Counter | number of timeouts | Number of invocation timeouts. A timeout is an execution that reaches or exceeds configured execution time limits. |
| `faas.throttles` | Counter | number of throttles | Number of invocation throttles. A throttle is an invocation rejected when concurrency limits are reached or exceeded. |
| `faas.concurrent_executions` | UpDownCounter | number of concurrent executions | The current number of function instances that are processing events. |
Comment on lines +25 to +26
Member

I am not sure about these two. I don't think an instrumented function can report these values. I see these as metrics that a backend can compute by aggregating the data it receives.

Contributor Author

Are these metrics limited to values that can be collected from within a function? Every FaaS platform that I researched has a way to extract these metrics; however, in most cases it is via an API external to the function itself.

Member

That makes sense to me, but if the metrics are collected using an API, I think they'll need to use an asynchronous instrument. I think maybe this means it should be an UpDownSumObserver.

Member

I think you are right. We should not specify only data that can be collected within a function! :)
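
The distinction discussed in this thread, a synchronous instrument updated from inside the function versus an asynchronous one that reads a value from an external API at collection time, can be sketched roughly as follows. Both classes and `fetch_concurrent_executions` are hypothetical stand-ins written for this illustration; they are not the OpenTelemetry API, and the returned value is invented.

```python
class UpDownCounterSketch:
    """Synchronous: the instrumented code reports each change as it happens."""

    def __init__(self):
        self.value = 0

    def add(self, delta):
        self.value += delta


class UpDownSumObserverSketch:
    """Asynchronous: a callback supplies the current total when collected."""

    def __init__(self, callback):
        self._callback = callback

    def collect(self):
        return self._callback()


def fetch_concurrent_executions():
    # Stand-in for a call to a provider's monitoring API (made-up value).
    return 7


# Reported from within the function: one delta per start or finish.
in_function = UpDownCounterSketch()
in_function.add(1)   # an invocation started
in_function.add(1)   # another invocation started
in_function.add(-1)  # one invocation finished

# Reported from outside the function: the platform already knows the total.
from_platform = UpDownSumObserverSketch(fetch_concurrent_executions)
```

The design point is that the observer never sees individual starts and finishes; it only asks for the current sum, which matches how an external monitoring API typically exposes this kind of value.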


## Labels

Below is a table of the labels that SHOULD be included on FaaS metric events.

Naming conventions follow [FaaS Trace Semantics](/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/faas.md) wherever possible.

| Name | Recommended | Notes and examples |
Member

You can generate this table with:

<!-- semconv faas-metrics -->
<!-- endsemconv -->

and then you can use the semantic convention generator :)

Member

...not just yet. I'll have a PR to add metric semantic convention generation soon, and then we can update all the metric semantic conventions with generated tables in a single PR later.

Member

This table looks identical to a semantic convention table. How do you plan to change how metrics are rendered? But I guess this discussion does not belong in this PR. I will wait for your PR to update the tool :)

|------|-------------|--------------------|
| `faas.trigger` | Yes | Type of the trigger on which the function is invoked. SHOULD be one of: `datasource`, `http`, `pubsub`, `timer`, `other`. See: [Function Trigger Types](/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/faas.md) |
| `faas.invoked_name` | Yes | Name of the invoked function. Example: `my-function` |
| `faas.invoked_provider` | Yes | Cloud provider of the invoked function. Corresponds to the resource `cloud.provider`. Example: `aws` |
| `faas.invoked_region` | Yes | Cloud provider region of invoked function. Corresponds to resource `cloud.region`. Example: `us-east-1` |
| `faas.coldstart` | Yes | Whether or not the invocation was a cold start. |
| `faas.error` | Yes | Whether or not the invocation resulted in an error. |
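
As a rough illustration of the labels table above, the following sketch builds a label set and checks `faas.trigger` against the listed values. The helper `faas_labels` is hypothetical and the example values are the ones given in the table; it treats every label as present, although the review discussion on this PR questions whether `faas.coldstart` and the `faas.invoked_*` labels apply to the same kind of invocation.

```python
# Trigger values listed in the labels table.
TRIGGER_TYPES = {"datasource", "http", "pubsub", "timer", "other"}


def faas_labels(trigger, invoked_name, invoked_provider, invoked_region,
                coldstart, error):
    """Build a label set using the label names from the table (hypothetical helper)."""
    if trigger not in TRIGGER_TYPES:
        raise ValueError(f"unknown faas.trigger: {trigger!r}")
    return {
        "faas.trigger": trigger,
        "faas.invoked_name": invoked_name,
        "faas.invoked_provider": invoked_provider,
        "faas.invoked_region": invoked_region,
        "faas.coldstart": bool(coldstart),
        "faas.error": bool(error),
    }


labels = faas_labels("http", "my-function", "aws", "us-east-1",
                     coldstart=True, error=False)
```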

## References

### Metric Reference

Below are links to documentation regarding metrics that are available with different
FaaS providers. This list is not exhaustive.

* [AWS Lambda Metrics](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html)
* [Azure Functions Metrics](https://docs.microsoft.com/en-us/azure/azure-monitor/platform/metrics-supported)
* [Google Cloud Functions Metrics](https://cloud.google.com/monitoring/api/metrics_gcp#gcp-cloudfunctions)
* [OpenFaaS Metrics](https://docs.openfaas.com/architecture/metrics/)