@xBis7
Contributor

@xBis7 xBis7 commented Sep 26, 2025

OTel environment variables

The OpenTelemetry SDK for metrics and traces can be configured in two ways:

  1. Adding the values directly in the code via method parameters
  2. Using the standard OTel environment variables

There are multiple OTel env variables that the SDK picks up automatically if they are exported.

https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/

https://opentelemetry.io/docs/languages/sdk-configuration/general/

https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/

Currently, there is no need to support all of them.

Airflow provides the code configuration based on:

  • Values that have been added to custom Airflow config properties
  • Hard-coded values

How it works

On the OTel side, the environment variables are only checked if there is no code configuration. If a value is provided in the code, it takes priority and the environment variable is ignored.

Airflow has its own version of the OTel configs. It reads the Airflow-OTel properties and based on the values, it uses code to configure the OTel SDK.

If we were to export the regular environment variables and then just call the SDK methods without any parameters, the env values would be automatically used.

The OTel priorities are

  1. Code configuration
  2. OTel checks env vars

The current Airflow priorities are

  1. Airflow-OTel configs
  2. Code configuration
  3. SDK checks the env vars

Because Airflow always provides code configuration, the OTel env vars are, as mentioned above, always ignored.
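To make the precedence concrete, here is a simplified, pure-Python model of it. It does not call the OTel SDK; `resolve_setting` is a hypothetical helper that only mimics the rule that an explicit (code) value shadows the matching environment variable.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    code_value: Optional[str],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the value the SDK would effectively use (simplified model).

    An explicit argument passed in code always wins; the environment
    variable is only consulted when no code value was given.
    """
    if code_value is not None:
        return code_value
    return env.get(env_name)


env = {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4318"}

# Airflow today: a value is always passed in code, so the env var is ignored.
print(resolve_setting("http://localhost:4318", "OTEL_EXPORTER_OTLP_ENDPOINT", env))

# SDK called without parameters: the exported env var is picked up.
print(resolve_setting(None, "OTEL_EXPORTER_OTLP_ENDPOINT", env))
```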

Changes

This patch removes all OTel-related configs from the Airflow configuration except the flags that enable OTel metrics and traces.

The values that we used to get from the Airflow config are now read from the following OTel environment variables.

- Common
    OTEL_EXPORTER_OTLP_PROTOCOL
    OTEL_EXPORTER_OTLP_ENDPOINT
    OTEL_SERVICE_NAME
    OTEL_RESOURCE_ATTRIBUTES
    OTEL_EXPORTER_OTLP_HEADERS

- Traces-specific
    OTEL_TRACES_EXPORTER
    OTEL_EXPORTER_OTLP_TRACES_ENDPOINT

- Metrics-specific
    OTEL_METRICS_EXPORTER
    OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
    OTEL_METRIC_EXPORT_INTERVAL
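The endpoint variables interact: a signal-specific endpoint overrides the common one. The sketch below models that fallback as I understand it from the OTLP exporter specification: over HTTP, `/v1/traces` or `/v1/metrics` is appended to the common endpoint; over gRPC the common endpoint is used as-is; `http/protobuf` is assumed when no protocol is set. `resolve_otlp_endpoint` is a hypothetical helper; the real resolution happens inside the SDK exporters.

```python
from typing import Mapping

# Defaults assumed from the OTLP exporter spec (4318 for HTTP, 4317 for gRPC).
_DEFAULT_HTTP = "http://localhost:4318"
_DEFAULT_GRPC = "http://localhost:4317"


def resolve_otlp_endpoint(signal: str, env: Mapping[str, str]) -> str:
    """Pick the OTLP endpoint for 'traces' or 'metrics' (simplified model)."""
    signal_var = f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT"
    # 1. A signal-specific endpoint is used exactly as given.
    if env.get(signal_var):
        return env[signal_var]
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf")
    is_http = protocol.startswith("http")
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or (
        _DEFAULT_HTTP if is_http else _DEFAULT_GRPC
    )
    # 2. Over HTTP the common endpoint gets the per-signal path appended;
    #    over gRPC it is used unchanged.
    if is_http:
        return f"{base.rstrip('/')}/v1/{signal}"
    return base


env = {
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4318",
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": "http://metrics-gw:4318/v1/metrics",
}
print(resolve_otlp_endpoint("traces", env))   # http://otel-collector:4318/v1/traces
print(resolve_otlp_endpoint("metrics", env))  # http://metrics-gw:4318/v1/metrics
```

This is why a cluster can export a single OTEL_EXPORTER_OTLP_ENDPOINT and only override one signal when needed.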

For info regarding the values of each property, check the updated docs.

  • OTEL_TRACES_EXPORTER with the value console replaces otel_debugging_on

  • OTEL_EXPORTER_OTLP_ENDPOINT (common)
    OTEL_EXPORTER_OTLP_TRACES_ENDPOINT (traces)
    OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (metrics)

    replace

    otel_host
    otel_port
    otel_ssl_active

  • OTEL_METRIC_EXPORT_INTERVAL replaces otel_interval_milliseconds
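A hypothetical migration helper illustrating the mapping above: the three connection options collapse into one endpoint URL, and the interval carries over unchanged because OTEL_METRIC_EXPORT_INTERVAL is also expressed in milliseconds. The function name and signature are assumptions for illustration only, not part of this patch.

```python
def legacy_to_otel_env(
    host: str,
    port: int,
    ssl_active: bool,
    interval_milliseconds: int,
    debugging_on: bool,
) -> dict[str, str]:
    """Translate the removed Airflow OTel options into OTel env vars.

    Hypothetical helper; the parameters mirror otel_host, otel_port,
    otel_ssl_active, otel_interval_milliseconds and otel_debugging_on.
    """
    scheme = "https" if ssl_active else "http"
    env = {
        # host + port + SSL flag collapse into a single endpoint URL.
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"{scheme}://{host}:{port}",
        # Both settings are in milliseconds, so the value carries over as-is.
        "OTEL_METRIC_EXPORT_INTERVAL": str(interval_milliseconds),
    }
    if debugging_on:
        # Debug output to stdout corresponds to the "console" traces exporter.
        env["OTEL_TRACES_EXPORTER"] = "console"
    return env


print(legacy_to_otel_env("localhost", 4318, False, 30000, True))
```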

Why

When you have a cluster where multiple applications are running and all of them are using OTel, then it’s common to configure the regular OTel environment variables and export them. That way, you won’t have to configure each project separately to work with your shared otel-collector service.



@xBis7
Contributor Author

xBis7 commented Sep 26, 2025

@potiuk @ferruzzi Can you please take a look at this PR?

@jason810496 jason810496 self-requested a review September 27, 2025 09:40
Member

@jason810496 jason810496 left a comment

Nice! Thanks for the PR!

Would it be better to retain the compatibility layer for Airflow-OTel environment variables and configuration, while also raising a deprecation warning? This PR introduces a breaking change for users currently relying on OTel.

log = logging.getLogger(__name__)


def _parse_kv_str_to_dict(str_var: str) -> dict[str, str]:
Member

Is it possible to reuse the util here?

def parse_tracestate(tracestate_str: str | None = None) -> dict:
    """Parse tracestate string: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE."""

Contributor Author

TBH, I would like to remove that file (traces/utils.py) entirely. The code that was using it isn't called anymore.

Generating and assigning a trace ID and a span ID are internal operations of the OTel SDK.

The traceparent and the tracestate belong to the span context which is handled by the propagators API.

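from opentelemetry.context import Context
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Methods of the tracer class (the enclosing class is not shown in this excerpt).
def inject(self) -> dict:
    """Inject the current span context into a carrier and return it."""
    carrier: dict[str, str] = {}
    TraceContextTextMapPropagator().inject(carrier)
    return carrier

def extract(self, carrier: dict) -> Context:
    """Extract the span context from a provided carrier."""
    return TraceContextTextMapPropagator().extract(carrier)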

The inject and extract methods take care of the context processing.

The helpers in traces/utils.py are a hacky way of handling spans, much like the Airflow config values that get hard-coded into the calls to the OTel SDK.

Contributor

I'm actually shocked we don't already have a string-to-dict helper in airflow/utils. I think we used to have one for env_var parsing, but can't find it.
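For reference, OTEL_RESOURCE_ATTRIBUTES and OTEL_EXPORTER_OTLP_HEADERS both use a comma-separated `key=value` list with percent-encoded values, the same overall shape as the tracestate example above. A minimal sketch of such a helper using only the standard library; `parse_kv_str` is a hypothetical name, not an existing Airflow utility.

```python
from typing import Optional
from urllib.parse import unquote


def parse_kv_str(value: Optional[str]) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, percent-decoding the values."""
    result: dict[str, str] = {}
    if not value:
        return result
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled commas
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got {item!r}")
        result[key.strip()] = unquote(val.strip())
    return result


print(parse_kv_str("service.namespace=airflow, deployment.environment=prod%20eu"))
```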

@xBis7
Contributor Author

xBis7 commented Sep 27, 2025

@jason810496 Thank you for the review!

Would it be better to retain the compatibility layer for Airflow-OTel environment variables and configuration, while also raising a deprecation warning?

Sure, I'll do that.

This patch validates the provided configs. I think the priorities should be

  1. Airflow configs
  2. OTel env variables

If the Airflow configs are empty, then it will load the OTel env vars and do the validation. In the future, we will just remove the 1st step and it will go straight to loading the env vars.

The Airflow configs have a fallback value. As a result, they will never be empty and the env vars won't be checked. I think the default should be -. That way we can check if the user added a value or not.

What do you think?
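A sketch of that ordering, assuming `-` becomes the fallback value for the Airflow options so that an unset option can be told apart from a user-supplied one. `UNSET` and `resolve_airflow_first` are hypothetical names.

```python
import os
from typing import Mapping, Optional

UNSET = "-"  # hypothetical sentinel used as the Airflow config fallback


def resolve_airflow_first(
    airflow_value: str,
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Prefer an explicitly set Airflow option, else fall back to the OTel env var."""
    if airflow_value != UNSET:
        # The user set the Airflow option, so it takes priority.
        return airflow_value
    # Option left at its fallback: read the standard OTel env var instead.
    return env.get(env_name)
```

Dropping the first branch later leaves only the env var lookup, which matches the plan to remove the Airflow step in the future.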

@xBis7
Contributor Author

xBis7 commented Sep 27, 2025

https://github.com/apache/airflow/actions/runs/18044157889/job/51351358057?pr=56150#step:6:3347

The spell check is complaining about two consecutive uppercase letters, but OTel is the correct spelling.

https://opentelemetry.io/docs/

I'll update it.

Member

@jason810496 jason810496 left a comment

This patch validates the provided configs. I think the priorities should be

  1. Airflow configs
  2. OTel env variables

If the Airflow configs are empty, then it will load the OTel env vars and do the validation. In the future, we will just remove the 1st step and it will go straight to loading the env vars.

Yes, that is exactly what I'm thinking about. Thanks!

@xBis7
Contributor Author

xBis7 commented Oct 2, 2025

@jason810496 Is there a standard way for marking a config as deprecated? I have found what to do with configs that will change in the future (moved or renamed) but I'm not sure about the ones that will be removed.

Contributor

@ferruzzi ferruzzi left a comment

I think all of my concerns and nitpicks have been addressed. Thanks!

Member

@jason810496 jason810496 left a comment

It seems this PR depends on #56187.
I will review again after #56187 gets merged.

LGTM overall, only small nits for the docs so far. Thanks!

@xBis7 xBis7 force-pushed the support-otel-env-vars branch from 2910670 to f72c75d Compare October 16, 2025 08:57
@potiuk
Member

potiuk commented Oct 19, 2025

conflicts :(

@xBis7
Contributor Author

xBis7 commented Oct 20, 2025

@potiuk This PR isn't ready because it depends on #56187. Once that gets merged then this one will be adjusted accordingly.

I like the simplicity of #56634. There is the question of whether to follow the same approach. The difference between #56634 and the current patch is that this PR validates the configs.

@potiuk
Member

potiuk commented Oct 21, 2025

Good question. The simpler the better, and I think we should not really validate something that OTel validates (or should validate).

@xBis7
Contributor Author

xBis7 commented Oct 21, 2025

The simpler the better, and I think we should not really validate something that OTel validates (or should validate).

That's true. However, OTel doesn't validate the configs; it just uses the values and then throws errors.

I added the validation because SDK initialization fails on invalid values, and I wanted to tell users in advance what is wrong.

@jason810496
Member

Is there a standard way for marking a config as deprecated? I have found what to do with configs that will change in the future (moved or renamed) but I'm not sure about the ones that will be removed.

Sorry, I overlooked this message.
This is where we deprecate a config section-option pair, but it seems to work only for the (new section, new option) -> (old section, old option, since_version) case.

# A mapping of (new section, new option) -> (old section, old option, since_version).
# When reading the new option, the old option will be checked to see if it exists. If it does, a
# DeprecationWarning will be issued and the old option will be used instead.
deprecated_options: dict[tuple[str, str], tuple[str, str, str]] = {
    ("dag_processor", "refresh_interval"): ("scheduler", "dag_dir_list_interval", "3.0"),
    # ... remaining entries omitted in this excerpt
}

I'm not sure how we should deal with removing a config pair.
@potiuk, could you advise on the correct, standard way to remove a config pair?
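To illustrate why that mapping only fits renames, here is a simplified model of how such a lookup could work (not Airflow's actual config parser). Every key is a (new section, new option) pair, so an option that is removed outright, with no successor, has nothing to key on. `get_option` is a hypothetical helper.

```python
import warnings
from typing import Mapping, Optional

# (new section, new option) -> (old section, old option, since_version)
deprecated_options: dict[tuple[str, str], tuple[str, str, str]] = {
    ("dag_processor", "refresh_interval"): ("scheduler", "dag_dir_list_interval", "3.0"),
}


def get_option(
    config: Mapping[tuple[str, str], str], section: str, option: str
) -> Optional[str]:
    """Read (section, option), honoring a renamed predecessor if it is set."""
    old = deprecated_options.get((section, option))
    if old is not None:
        old_section, old_option, since = old
        if (old_section, old_option) in config:
            warnings.warn(
                f"[{old_section}] {old_option} is deprecated since {since}; "
                f"use [{section}] {option} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # The old option is still honored, as the quoted comment describes.
            return config[(old_section, old_option)]
    return config.get((section, option))
```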

@xBis7 xBis7 force-pushed the support-otel-env-vars branch from 356657b to b198691 Compare November 10, 2025 12:11
@potiuk potiuk force-pushed the support-otel-env-vars branch from b198691 to 4ddb14b Compare November 20, 2025 16:12
@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 16, 2026
Member

@jason810496 jason810496 left a comment

It seems this PR depends on #56187.
I will review again after #56187 gets merged.

Hi @xBis7
Would you still like to continue with this PR? #56187 was merged in Dec 2025. Thanks!

Is there a standard way for marking a config as deprecated? I have found what to do with configs that will change in the future (moved or renamed) but I'm not sure about the ones that will be removed.

I just traced the relevant context again and found that we can leverage version_deprecated and deprecation_reason in airflow-core/src/airflow/config_templates/config.yml:

"version_deprecated": {
    "type": [
        "string",
        "null"
    ],
    "description": "When set to a version string, this option is deprecated as of this version, and will be removed in the future."
},
"deprecation_reason": {
    "type": [
        "string",
        "null"
    ],
    "description": "The reason why this option is deprecated."
},

Here is a good PR example of how to deprecate config properly:
https://github.com/apache/airflow/pull/33136/files
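A rough sketch of how those two fields could drive a warning at startup. The template dictionary below is a hypothetical, trimmed-down stand-in shaped like config.yml (section, then options, then option metadata); `warn_deprecated_options` is not an Airflow function, and `X.Y.Z` is a placeholder version.

```python
import warnings
from typing import Any, Mapping

# Hypothetical excerpt shaped like config.yml. Only the version_deprecated /
# deprecation_reason keys come from the schema; "X.Y.Z" is a placeholder.
CONFIG_TEMPLATE: dict[str, Any] = {
    "metrics": {
        "options": {
            "otel_on": {"version_deprecated": None, "deprecation_reason": None},
            "otel_host": {
                "version_deprecated": "X.Y.Z",
                "deprecation_reason": "Use OTEL_EXPORTER_OTLP_ENDPOINT instead.",
            },
        },
    },
}


def warn_deprecated_options(
    user_config: Mapping[str, Mapping[str, str]],
    template: Mapping[str, Any] = CONFIG_TEMPLATE,
) -> list[str]:
    """Warn for every option the user set that the template marks as deprecated."""
    flagged: list[str] = []
    for section, options in user_config.items():
        for option in options:
            meta = template.get(section, {}).get("options", {}).get(option, {})
            version = meta.get("version_deprecated")
            if not version:
                continue  # not deprecated, or unknown to the template
            reason = meta.get("deprecation_reason") or ""
            warnings.warn(
                f"[{section}] {option} is deprecated since {version}. {reason}".strip(),
                DeprecationWarning,
                stacklevel=2,
            )
            flagged.append(f"{section}.{option}")
    return flagged
```

Unlike the rename mapping, this only needs metadata on the old option itself, which suits options that are removed without a replacement in the Airflow config.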

@jason810496 jason810496 removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 16, 2026
@xBis7
Contributor Author

xBis7 commented Jan 16, 2026

Hi @jason810496, thank you for your continuous help on this. This PR was left behind due to other work priorities. I resumed looking into it a couple of days ago.

The branch has diverged so much from main that it's painful to try to rebase it and resolve the conflicts. So, I started a new clean branch and once the changes are done, I'm going to push directly to this one, essentially replacing the branches.

I'm going to remove the validations on the environment variables. We will check the env vars first, and if they are not set, we will fall back to the Airflow config.

I had forgotten about adding the deprecation warning. Thanks for reminding me!
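The revised ordering (standard OTel env var first, deprecated Airflow option as a fallback that triggers a DeprecationWarning) could look roughly like this. `resolve_env_first` and its parameters are hypothetical; the actual patch may structure this differently.

```python
import os
import warnings
from typing import Mapping, Optional


def resolve_env_first(
    env_name: str,
    legacy_option: str,
    legacy_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Prefer the standard OTel env var; fall back to a deprecated Airflow option."""
    value = env.get(env_name)
    if value is not None:
        return value
    if legacy_value is not None:
        # Only warn when the deprecated option is actually being relied on.
        warnings.warn(
            f"{legacy_option} is deprecated; set {env_name} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return legacy_value
```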

@jason810496
Member

The branch has diverged so much from main that it's painful to try to rebase it and resolve the conflicts. So, I started a new clean branch and once the changes are done, I'm going to push directly to this one, essentially replacing the branches.

Same thought here; rebasing through large conflicts is too painful.
Sure! Feel free to tag me on the new PR, thanks for the quick reply.

4 participants