fix: Adjust OpenLineage DefaultExtractor for RuntimeTaskInstance in Airflow 3 #47673

kacpermuda · 2025-03-12T13:11:36Z

In OpenLineage, we use the DefaultExtractor to extract lineage metadata from Operators that implement OL methods. To determine whether a task succeeded or failed, it relies on ti.state.

I think we should get rid of that dependency on the Airflow model code. (EDIT: The missing state attr on RuntimeTaskInstance in Airflow 3 caused our extractor to fail, but even if state is reinstated on RuntimeTaskInstance, we should still proceed with this PR.)

Now, depending on which listener method is called (on_task_instance_running, on_task_instance_success, on_task_instance_failed), we manually propagate the current TaskInstanceState to OL's ExtractorManager. This change allows us to invoke the appropriate Extractor method (extract_on_complete or the newly added extract_on_failure). This is similar to the approach we implemented with custom_run_facets.
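To illustrate the idea, here is a minimal, self-contained sketch of state-based dispatch. It does not import Airflow or the OpenLineage provider: the TaskInstanceState enum, SketchExtractor, and extract_metadata below are hypothetical stand-ins written for this example, and the real ExtractorManager signatures may differ.

```python
from enum import Enum


class TaskInstanceState(str, Enum):
    """Stand-in for Airflow's TaskInstanceState, reduced to three values."""

    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


class SketchExtractor:
    """Hypothetical extractor exposing the three methods discussed above."""

    def extract(self):
        return "start-metadata"

    def extract_on_complete(self, task_instance):
        return "complete-metadata"

    def extract_on_failure(self, task_instance):
        return "failure-metadata"


def extract_metadata(extractor, task_instance, ti_state):
    """Pick the extractor method from the state passed in by the listener.

    Assumption: the caller already knows the state because it knows which
    listener hook fired, so task_instance.state is never read here.
    """
    if ti_state == TaskInstanceState.FAILED:
        return extractor.extract_on_failure(task_instance)
    if ti_state == TaskInstanceState.SUCCESS:
        return extractor.extract_on_complete(task_instance)
    return extractor.extract()
```

Because the state arrives as an argument, the sketch works even when the task instance object has no state attribute at all, which is the situation that broke the extractor with RuntimeTaskInstance.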

Until now, users with custom extractors had to determine manually whether the task had failed or completed successfully, using the task_instance object. They can still do that as long as the state attr is available. The alternative, introduced in this PR, is to implement the new extractor method (extract_on_failure) so that they don't have to check the state themselves.
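The two options for custom extractors can be contrasted in a small sketch. Both classes are hypothetical and stand alone (they do not subclass the provider's BaseExtractor), and the returned dictionaries are placeholders rather than real lineage objects.

```python
from types import SimpleNamespace


class LegacyStyleExtractor:
    """Checks the task state itself; works only while `state` exists."""

    def extract_on_complete(self, task_instance):
        # getattr guards against task instance objects without `state`.
        state = getattr(task_instance, "state", None)
        if state == "failed":
            return {"outcome": "failed"}
        return {"outcome": "success"}


class NewStyleExtractor:
    """Relies on the caller to invoke the method matching the outcome."""

    def extract_on_complete(self, task_instance):
        return {"outcome": "success"}

    def extract_on_failure(self, task_instance):
        return {"outcome": "failed"}


# Illustrative task instance stand-in that still carries a `state` attribute.
failed_ti = SimpleNamespace(state="failed")
legacy_result = LegacyStyleExtractor().extract_on_complete(failed_ti)
new_result = NewStyleExtractor().extract_on_failure(failed_ti)
```

In the second style the extractor never inspects the task instance for its state, so it keeps working whether or not the attribute is available.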

I think adding a new method to the extractor base class is the simplest approach, ensuring we don’t break anything for users with existing custom extractors. IMHO this keeps the transition smooth while maintaining backward compatibility.


kacpermuda · 2025-03-12T16:08:26Z

There is a discussion on Slack, and state might get re-added to RuntimeTaskInstance. ~~Will keep this as draft for now, until it is clarified.~~ We should proceed with this PR regardless.

boring-cyborg bot added area:providers provider:openlineage AIP-53 labels Mar 12, 2025

kacpermuda force-pushed the fix-ol-default-extractor-af3 branch 2 times, most recently from ca55fca to 57c5337 Compare March 12, 2025 15:58

fix: Adjust OpenLineage DefaultExtractor for RuntimeTaskInstance in A…

65e79b7

…irflow 3

kacpermuda force-pushed the fix-ol-default-extractor-af3 branch from 57c5337 to 65e79b7 Compare March 13, 2025 16:12

kacpermuda marked this pull request as ready for review March 13, 2025 17:08

kacpermuda requested a review from mobuchowski as a code owner March 13, 2025 17:08

mobuchowski approved these changes Mar 17, 2025

mobuchowski merged commit 807bdca into apache:main Mar 17, 2025
113 checks passed

kacpermuda deleted the fix-ol-default-extractor-af3 branch March 18, 2025 08:46

agupta01 pushed a commit to agupta01/airflow that referenced this pull request Mar 21, 2025

fix: Adjust OpenLineage DefaultExtractor for RuntimeTaskInstance in A…

3961179

…irflow 3 (apache#47673)

eladkal mentioned this pull request Mar 26, 2025

Status of testing Providers that were prepared on March 26, 2025 #48395

Closed

This was referenced Mar 27, 2025

fix: OpenLineage BaseExtractor's on_failure should call on_complete by default #48456

Merged

docs: Update OL docs after BaseExtractor changes #48585

Merged

nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025

fix: Adjust OpenLineage DefaultExtractor for RuntimeTaskInstance in A…

494df86

…irflow 3 (apache#47673)

eladkal mentioned this pull request Apr 6, 2025

Status of testing Providers that were prepared on April 06, 2025 #48842

Closed
