[AIP-49] OpenTelemetry Traces for Apache Airflow #37948

Merged 7 commits on Jun 11, 2024

howardyoo (Contributor) commented Mar 6, 2024

closes #37752


This PR implements AIP-49, OpenTelemetry support for Airflow. Last year, a group of contributors shipped the first phase of Airflow's commitment to OpenTelemetry by adding OTEL metrics support. This PR addresses the second phase of the OTEL implementation for Airflow: emitting traces.

boring-cyborg bot added the labels area:dev-tools, area:Executors-core, area:Scheduler, area:Triggerer, area:UI, area:webserver, kind:documentation on Mar 6, 2024

ferruzzi (Contributor) commented Mar 7, 2024

This is huge, I'm going to have to take it in multiple chunks. To start with, your static checks are failing. You can run breeze static-checks --all-files locally to see what that is grumpy about.

ferruzzi (Contributor) left a comment

First chunk of comments.

airflow/traces/otel_tracer.py (outdated, resolved)
airflow/config_templates/config.yml (resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
scripts/ci/docker-compose/integration-otel.yml (outdated, resolved)

ferruzzi (Contributor) left a comment

Another small chunk

Overall it looks impressive. Most of my comments are around formatting and consistency so far. Great work. I'll get more done later and tomorrow.

tests/core/test_otel_tracer.py (outdated, resolved)
tests/core/test_otel_tracer.py (outdated, resolved)
airflow/traces/utils.py (outdated, resolved)
airflow/traces/utils.py (resolved)
airflow/traces/utils.py (outdated, resolved)
airflow/traces/utils.py (resolved)
airflow/traces/utils.py (outdated, resolved)
airflow/traces/utils.py (resolved)
airflow/traces/tracer.py (outdated, resolved)
airflow/traces/tracer.py (outdated, resolved)
airflow/traces/tracer.py (outdated, resolved)
airflow/traces/tracer.py (outdated, resolved)

ferruzzi (Contributor) left a comment

Almost done with the first pass.

airflow/executors/local_executor.py (outdated, resolved)
airflow/jobs/job.py (outdated, resolved)
airflow/jobs/job.py (outdated, resolved)
airflow/dag_processing/manager.py (outdated, resolved)
airflow/executors/base_executor.py (outdated, resolved)
airflow/executors/base_executor.py (outdated, resolved)
airflow/executors/base_executor.py (outdated, resolved)
airflow/executors/base_executor.py (outdated, resolved)
airflow/jobs/scheduler_job_runner.py (outdated, resolved)
airflow/jobs/scheduler_job_runner.py (outdated, resolved)

ferruzzi (Contributor) left a comment

Done with my first pass! LOOOTS of comments, but nothing terribly major. Great work.

airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (resolved)
airflow/traces/otel_tracer.py (resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)

ferruzzi (Contributor) left a comment

Looking better. Made another pass.

airflow/executors/base_executor.py (outdated, resolved)
airflow/executors/base_executor.py (outdated, resolved)
airflow/jobs/job.py (outdated, resolved)
airflow/jobs/job.py (outdated, resolved)
airflow/jobs/job.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (outdated, resolved)
airflow/traces/otel_tracer.py (resolved)
airflow/traces/utils.py (outdated, resolved)
airflow/traces/utils.py (resolved)

ferruzzi (Contributor) commented Mar 14, 2024

We need to get these static checks passing and it looks like it's an issue that breeze should be able to autofix. When you get time, please run breeze static-checks --all-files. It will fail but should automatically fix the issue, then run it again and see if it passes. If it fails the second time, then it's something you'll need to fix manually.

Note, each run is going to take a while... depending on your computer, something like 15 minutes, so maybe run it while you are away from your desk or in a meeting or something.

potiuk (Member) commented Mar 14, 2024

Actually, breeze static-checks --only-my-changes should run WAY faster and do 90-something percent, up to 100%, of the job.

potiuk force-pushed the howardyoo/otel/otel-trace-integration branch from 20f5fbf to 0b3e79a on May 29, 2024 15:01

howardyoo (Contributor Author) left a comment

Got it. Fixing it.

howardyoo force-pushed the howardyoo/otel/otel-trace-integration branch from 9ac245c to 0ea4e49 on May 30, 2024 03:25

hussein-awala (Member) left a comment

I'm very late to the party, but I have two comments:

  • Why do we need to configure the OTEL connection via Airflow configs when we can create an Airflow connection that contains the credentials, with the configurations in the extra, and just provide its name? For me this step should be done before merging the PR, as it's our recommended way for such things.
  • I see that there is an intention to support multiple tracers, but using configuration dedicated to OTEL would complicate the integration later. What we do usually is using two configs: class or module to use + kwargs to pass. In your case you need two configs: the module to use to create the tracer + a connection name, as I explained in the first comment.

WDYT?

For the OTEL tracer it looks good, great job!
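
The "class or module to use + kwargs to pass" pattern from the second point can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from this PR: `load_tracer` is a hypothetical helper, the option values are invented, and `types.SimpleNamespace` stands in for a real tracer class so the snippet runs with the standard library alone.

```python
import importlib
import json


def load_tracer(class_path: str, kwargs_json: str = "{}"):
    """Instantiate the class named by a dotted path, passing JSON-encoded kwargs.

    Hypothetical helper: one config value names the class, the other
    carries the keyword arguments for its constructor.
    """
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'package.module.ClassName', got {class_path!r}")
    cls = getattr(importlib.import_module(module_name), class_name)
    kwargs = json.loads(kwargs_json)
    if not isinstance(kwargs, dict):
        raise ValueError("kwargs must be a JSON object")
    return cls(**kwargs)


# Stand-in values: a real setup would name a tracer class and, following the
# suggestion above, pass a connection name instead of credentials.
tracer = load_tracer("types.SimpleNamespace", '{"conn_id": "otel_default"}')
print(tracer.conn_id)
```

With this shape, supporting another tracer would mean changing the two config values rather than adding tracer-specific options.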

howardyoo (Contributor Author) commented May 30, 2024

I'm very late to the party, but I have two comments:

Hey @hussein-awala, thank you for your comments. It's pretty late alright, but I do welcome comments always!

  • Why do we need to configure the OTEL connection via Airflow configs when we can create an Airflow connection that contains the credentials, with the configurations in the extra, and just provide its name? For me this step should be done before merging the PR, as it's our recommended way for such things.

For the configuration scheme used for OTEL traces, I decided to have it closely resemble our previous implementation of OTEL metrics, because I did not want to introduce a new way to configure it (and risk confusing people). However, there is another PR that I've created called 'OTEL providers for Apache Airflow', which will use Airflow connections. That PR is still far from getting reviewed :-). For the OTEL integration to use an Airflow connection as you mentioned, we would probably need a new PR that also modifies the OTEL metrics support to use the connection, something that I believe could be considered as an enhancement in the future.

  • I see that there is an intention to support multiple tracers, but using configuration dedicated to OTEL would complicate the integration later. What we do usually is using two configs: class or module to use + kwargs to pass. In your case you need two configs: the module to use to create the tracer + a connection name, as I explained in the first comment.

I'm not sure what "we do usually" means here. Can you clarify? Do you mean "we" as Apache Airflow, or someone else (the OTEL community)? Also, I'm not sure what "using two configs: class or module to use + kwargs to pass" means.

WDYT?

For the OTEL tracer it looks good, great job!

potiuk (Member) commented Jun 2, 2024

I'm not sure what "we do usually" means here. Can you clarify? Do you mean "we" as Apache Airflow, or someone else (the OTEL community)? Also, I'm not sure what "using two configs: class or module to use + kwargs to pass" means.

The thing here is that we are mostly following the "regular" way OTEL is configured (or that's what I understand - @howardyoo to confirm). When you look at the OTEL documentation, many of the configuration options there come down to the choice of classes, and configuring them is mostly about setting the right environment variables or passing a config.yaml file, which they read to configure themselves. And I think we should keep it this way:

  • in the Airflow configuration we can configure enabling OTEL and the "Airflow" side of it
  • we leave the detailed configuration of specific OTEL classes to env variables / yaml files or whatever they need

This way we have greater flexibility and do not have to write our own configuration documentation.

Also see https://opentelemetry.io/docs/collector/configuration/#environment-variables
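
The split described above might look roughly like this. It is a sketch, not the PR's code: `otel_settings` is a hypothetical function, and `AIRFLOW__TRACES__OTEL_ON` is an assumed name built from Airflow's `AIRFLOW__{SECTION}__{KEY}` convention. `OTEL_SERVICE_NAME` and `OTEL_EXPORTER_OTLP_ENDPOINT` are environment variables defined by the OpenTelemetry specification; the OTEL SDK would normally read them itself, and they are read here only to show which side owns which setting.

```python
import os
from typing import Mapping, Optional

# Assumed Airflow-side switch (hypothetical name following the
# AIRFLOW__{SECTION}__{KEY} environment-variable convention).
AIRFLOW_SWITCH = "AIRFLOW__TRACES__OTEL_ON"


def otel_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the Airflow-side switch and the OTEL-side settings separately."""
    env = os.environ if environ is None else environ
    # Airflow side: only decides whether tracing is enabled at all.
    enabled = env.get(AIRFLOW_SWITCH, "false").strip().lower() in {"true", "1", "yes"}
    return {
        "enabled": enabled,
        # OTEL side: names come from the OpenTelemetry specification; None when unset.
        "service_name": env.get("OTEL_SERVICE_NAME"),
        "endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT"),
    }


# Passing a plain dict instead of os.environ keeps the example reproducible.
print(otel_settings({AIRFLOW_SWITCH: "True", "OTEL_SERVICE_NAME": "airflow"}))
```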

Hey @hussein-awala -> are your questions answered?

potiuk (Member) commented Jun 8, 2024

Any comments @hussein-awala ?

potiuk force-pushed the howardyoo/otel/otel-trace-integration branch from b1886b0 to 634fcc6 on June 8, 2024 20:40
potiuk force-pushed the howardyoo/otel/otel-trace-integration branch from 634fcc6 to 2e2df87 on June 9, 2024 12:12

potiuk (Member) commented Jun 9, 2024

cc: @hussein-awala -> WDYT? I would love to merge that one now :)

potiuk merged commit ccf1202 into apache:main on Jun 11, 2024
78 checks passed

potiuk (Member) commented Jun 11, 2024

I had a lot of conversations about OpenTelemetry and traces at Berlin Buzzwords and watched some talks, and I really think it is going to make problem diagnosis and resolution much easier @howardyoo @ferruzzi - let's continue with the provider and get it out in 2.10 for people to start using it.

howardyoo (Contributor Author) commented

I had a lot of conversations about OpenTelemetry and traces at Berlin Buzzwords and watched some talks, and I really think it is going to make problem diagnosis and resolution much easier @howardyoo @ferruzzi - let's continue with the provider and get it out in 2.10 for people to start using it.

I'd like to express my gratitude to @potiuk and @ferruzzi for getting this done! Thank you so much. Yes, our next mountain should be the OTEL instrumentation piece, and also the OTEL providers. We haven't made the PR for part 2, but I'll start working on it. Thank you!

Labels: AIP-49, area:dev-tools, area:Executors-core, area:Scheduler, area:Triggerer, area:UI, area:webserver, kind:documentation, type:improvement
Linked issue: [AIP-49] Airflow support for OTEL traces
7 participants