-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define OTEL_EXPERIMENTAL_CONFIG_FILE to ignore other env vars, add env var substitution default syntax #3948
Define OTEL_EXPERIMENTAL_CONFIG_FILE to ignore other env vars, add env var substitution default syntax #3948
Conversation
…y-specification into define-otel-config-file
…y-specification into define-otel-config-file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described in this #3752 (comment) and other comments on the discussion:
This would break the mechanism AWS lambda, otel-operator and potentially other providers/environments outside of otel repositories already use to specify default service name, resource attributes, propagators, and maybe other settings (like exporters).
The alternative suggested in the discussion (asking users to keep boilerplate for env vars in the file) is fragile. It also removes incentive from cloud providers to integrate with otel by providing common env vars. They would rather ask users to handle their existing env vars resulting in degraded user experience.
I can think of two ways to accommodate the requirement for platforms / distributions to customize configuration:
|
Ability to customize SDK behavior does apply to cloud providers. Azure Functions don't know which distro users wants to use and can't inject the distro for them. I see some problems with the second approach as well:
|
Here's an example of merging a custom resource attribute into a config file using an open source command line YAML processor called yq (i.e. YAML equivalent of the popular jq). Given
To add a new resource attribute
Its a trade off. A bit more complex than setting
I propose writing to a new file and reassigning
You've said that azure functions doesn't know or control which distro a user wants to use. Its entirely possible for a user to choose a distribution that ignores I'll refer to the arguments I previously made about breaking changes for env vars: #3752 (comment) We should not consider a change in behavior resulting from a user opting into something new to be a breaking change. Consider another example: A vendor provides a distribution of otel which sets env vars to export OTLP to their endpoint, i.e. This convo is reminiscent of the debate we had over whether extending the span interface to include a new |
there is a problem of conflict resolution. By pushing it away to cloud providers, we don't solve it, but make it worse. Azure will decide to do it differently in every SaaS, PaaS, FaaS, AWS will do it in yet another way. I do think it makes user experience worse.
Can you elaborate? User specifies
I don't agree. I can use key-vaults to retrieve my secrets from, I can dynamically format my config file in the startup scripts/dockerfile to retrieve data from configuration services. Env vars are not a secure way to do secret management at all.
Virtually all OTel SDKs support OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES env vars. Distro that choses to ignore them would not have a great experience on Azure functions (AWS lambda or in otel-operator). there are problems now, but this proposal would make things worse. From supported virtually everywhere, env vars would be degraded to "supported until user accidentally opts opt"
The subject of back-compat is cloud providers, not users here. The breaking change affects them. Users would opt-in without realizing that it will break their cloud provider integration with otel. I want to emphasize, that I don't understand why this is necessary - why we can't make critical env vars work. |
The key is that you're making an argument that this is a breaking change to I think its more correct to frame your argument as: merging the set of env vars which can be cleanly merged is a better user experience and that this improved user experience should be given high priority as an evaluation criteria. Framing it as "not doing this is breaking" while cherry-picking which env vars it applies to is self-contradicting. But as I stated here, the ideas you're discussing are an entirely new proposal, not a modification of the existing proposal. There's virtually nothing left of this recommendation once we adjust it. And that's fine. I don't agree, but its fine for us to disagree and use the escalation path available. |
I'm glad you've heard me. Let's frame it as overarching "better user experience" if it makes my point more understandable. |
Hello @jack-berg , I'm part of the Azure Functions team, and we're currently adding support for OpenTelemetry. Our architecture consists of a host process and a worker process, with customer code running in the worker process. We handle the host process, while customers have full control over the worker process. In the host process, we are developing providers for logs, traces, and metrics that will send telemetry data to the OTLP endpoint. This setup is influenced by environment variables that signal us to connect the OTLP exporter and resources. We expect customers to set up something similar in the worker process. The configuration seems promising, aiming to offer customers more flexibility in setting up providers in the host process. I appreciate the convenience of using both configuration files and environment variables simultaneously. For instance, customers can use configuration files to establish providers and environment variables like OTEL_EXPORTER_OTLP_ENDPOINT for setting the endpoint. Unlike configurations, environment variables are accessible in both host and worker processes. However, the proposed changes could significantly affect our implementation. If customers switch to using configurations by setting OTEL_CONFIG_FILE, it might disrupt the host process. We will have to come-up with a solution to resolve the conflict between config and env variables. I understand the complexities and reasons behind your proposal, but it could significantly complicate our operations. Ideally, I'd like to see a solution where both environment variables and configuration files can exist together. |
Could you provide more details on this ? Once the |
Hi @marcalff , environment variables are accessible by both the host and worker processes. If the customer chooses to use OTEL_CONFIG_FILE, the OTel SDK operating within the host process will disregard all OTEL related environment variables. |
@RohitRanjanMS I don't understand the problem. I'll summarize my understanding, and leave you to correct anything I misinterpret:
Is that right? |
BTW I just realized that And also I realized that the Do you expect all users to keep it?! file_format: "0.1"
disabled: ${OTEL_SDK_DISABLED:-false}
resource:
attributes:
service.name: ${OTEL_SERVICE_NAME:-unknown_service}
attribute_limits:
attribute_value_length_limit: ${OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT}
attribute_count_limit: ${OTEL_ATTRIBUTE_COUNT_LIMIT:-128}
tracer_provider:
processors:
- batch:
schedule_delay: ${OTEL_BSP_SCHEDULE_DELAY:-5000}
export_timeout: ${OTEL_BSP_EXPORT_TIMEOUT:-30000}
max_queue_size: ${OTEL_BSP_MAX_QUEUE_SIZE:-2048}
max_export_batch_size: ${OTEL_BSP_MAX_EXPORT_BATCH_SIZE:-512}
exporter:
otlp:
protocol: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http/protobuf}
endpoint: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http://localhost:-4318}
certificate: ${OTEL_EXPORTER_OTLP_CERTIFICATE}
client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}
client_certificate: ${OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE}
compression: ${OTEL_EXPORTER_OTLP_COMPRESSION:-gzip}
timeout: ${OTEL_EXPORTER_OTLP_TIMEOUT:-10000}
limits:
attribute_value_length_limit: ${OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT}
attribute_count_limit: ${OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT:-128}
event_count_limit: ${OTEL_SPAN_EVENT_COUNT_LIMIT:-128}
link_count_limit: ${OTEL_SPAN_LINK_COUNT_LIMIT:-128}
event_attribute_count_limit: ${OTEL_EVENT_ATTRIBUTE_COUNT_LIMIT:-128}
link_attribute_count_limit: ${OTEL_LINK_ATTRIBUTE_COUNT_LIMIT:-128}
meter_provider:
readers:
- periodic:
interval: ${OTEL_METRIC_EXPORT_INTERVAL:-60000}
timeout: ${OTEL_METRIC_EXPORT_TIMEOUT:-30000}
exporter:
otlp:
protocol: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http/protobuf}
endpoint: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http://localhost:-4318}
certificate: ${OTEL_EXPORTER_OTLP_CERTIFICATE}
client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}
client_certificate: ${OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE}
compression: ${OTEL_EXPORTER_OTLP_COMPRESSION:-gzip}
timeout: ${OTEL_EXPORTER_OTLP_TIMEOUT:-10000}
temporality_preference: ${OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE:-cumulative}
default_histogram_aggregation: ${OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION:-explicit_bucket_histogram}
logger_provider:
processors:
- batch:
schedule_delay: ${OTEL_BLRP_SCHEDULE_DELAY:-1000}
export_timeout: ${OTEL_BLRP_EXPORT_TIMEOUT:-30000}
max_queue_size: ${OTEL_BLRP_MAX_QUEUE_SIZE:-2048}
max_export_batch_size: ${OTEL_BLRP_MAX_EXPORT_BATCH_SIZE:-512}
exporter:
otlp:
protocol: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http/protobuf}
endpoint: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http://localhost:-4318}
certificate: ${OTEL_EXPORTER_OTLP_CERTIFICATE}
client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}
client_certificate: ${OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE}
compression: ${OTEL_EXPORTER_OTLP_COMPRESSION:-gzip}
timeout: ${OTEL_EXPORTER_OTLP_TIMEOUT:-10000}
limits:
attribute_value_length_limit: ${OTEL_LOGRECORD_ATTRIBUTE_VALUE_LENGTH_LIMIT}
attribute_count_limit: ${OTEL_LOGRECORD_ATTRIBUTE_COUNT_LIMIT:-128} |
Its not even experimental! It doesn't exist at all yet! 😁 But yes, file configuration as a whole is still experimental. That's part of the motivation behind the "wait and see" approach of this PR. Since its experimental, we'd be free to evolve the spec if there is strong user feedback.
Not full backwards compatibility - see this list of env vars not referenced. Its true that not all users will keep it. The idea is that by providing a starter template complete with comments describing the env var behavior, and funneling users to it by referencing prominently in docs and bundling it with distributions, that it will be harder for users to misinterpret. I.e. most users should intuit that explicitly deleting a reference to |
Yeah and I'd be totally ok with "wait and see" approach if we didn't have existing specific cases where it would be problematic.
So I'm a user. I look into these 70 lines of config and keep only the things I care about, which could be something like file_format: "0.1"
disabled: ${OTEL_SDK_DISABLED}
resource:
attributes:
service.name: ${MY_SERVICE_NAME}
tracer_provider:
processors:
- batch:
exporter:
otlp:
protocol: protobuf
endpoint: http://localhost:-4318
client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}
meter_provider:
readers:
- periodic:
interval: 5000
exporter:
otlp:
protocol: protobuf
endpoint: http://localhost:-4318
client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}
default_histogram_aggregation: explicit_bucket_histogram
logger_provider:
processors:
- batch:
exporter:
otlp:
protocol: protobuf
endpoint: http://localhost
client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY} Now I have an incident, and I realize I need to specify
Isn't it enough evidence that env var support is kinda necessary? |
Back-compat path forward I can think of is the following:
|
AFAIK yes, but users can override/specify env vars as well
Host process is doing work on behalf of worker process. E.g. worker can return a byte array, but Function is configured to store this data in an database or object store. Or worker can be triggered by queue message and host process is the one that interacts with the queue and passes the content of the message to worker. These things are called 'bindings' - they are part of user app, but done on behalf of user. Host collects telemetry from these bindings using OTel instrumentation libraries and OTel SDK
Resource attributes describe Function name (as service name), and the cloud resource. They are the same for both processes. |
I think this is the hardest limitation, even if we can overcome it in one place, there are other services, service meshes, instrumentations for web servers and whatnot, there could be others. We can't assume config is accessible or modifiable. And even if it is, I argue that softer limitations are even more severe: reading/modifying user data without explicit consent is a red flag for me and I'd have to bring it up with Microsoft legal and privacy folks to guide us if it's possible. |
Per @tedsuo’s request, we discussed this issue in the 3/24/27 TC meeting and have made a decision: Generally, we will follow @trask’s comment, proceeding with this PR with a few changes:
I've updated this PR to reflect this decision. @open-telemetry/technical-committee, @open-telemetry/configuration-maintainers, and participants in this conversation, please review. |
@jack-berg nit: the PR text still contains a reference to |
@open-telemetry/configuration-maintainers PTAL. I'll plan on merging once I get a couple of approvals from you all. |
Will merge tomorrow if there are no other comments. |
configuration model. Implementations MAY provide a mechanism to customize the | ||
configuration model parsed from `OTEL_EXPERIMENTAL_CONFIG_FILE`. | ||
|
||
Users are encouraged to use the `sdk-config.yaml` (TODO: Add link when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be otel-config.yaml
, there can be plenty of SDKs.
…v var substitution default syntax (open-telemetry#3948) Fixes open-telemetry#3752. Implementation of the configuration working group recommendation as [described here](open-telemetry#3752 (comment)). - Adds definition for `OTEL_EXPERIMENTAL_CONFIG_FILE `, which ignores other env vars when evaluating, except env var substitution references - Adds new env var substitution default syntax `${ENVVAR:-defaultValue}` - Paired with open-telemetry/opentelemetry-configuration#76, which defines a new [sdk-config.yaml](https://github.com/jack-berg/opentelemetry-configuration/blob/starter-template/examples/sdk-config.yaml) starter template referencing all env vars that map cleanly.
Fixes #3752.
Implementation of the configuration working group recommendation as described here.
OTEL_EXPERIMENTAL_CONFIG_FILE
, which ignores other env vars when evaluating, except env var substitution references${ENVVAR:-defaultValue}