Skip to content

Conversation

@kacpermuda
Copy link
Contributor

@kacpermuda kacpermuda commented Mar 12, 2025

In OpenLineage, we use the DefaultExtractor to extract lineage metadata from Operators that implement OL methods. To determine whether a task succeeded or failed, we rely on ti.state here.

I think we should get rid of that dependency on airflow code model. (EDIT: Missing state attr in RunTimeTaskInstance in Airflow 3 caused our extractor to fail, but even if state is reinstated to RunTimeTaskInstance, we should still proceed with this PR).

Now, depending on which listener method is called (on_task_instance_running, on_task_instance_success, on_task_instance_failed), we manually propagate the current TaskInstanceState to the OL's ExtractorManager. This change allows us to invoke appropriate Extractor method (extract_on_complete or newly added extract_on_failure). This is a similar approach to what we implemented with custom_run_facets

Until now, users using custom extractors had to manually determine whether the task has failed or completed successfully, using task_instance object. They will still be able to do that as long as the state attr is available. The other way, implemented in this PR, would be to implement this new extractor method (extract_on_failure) so that they don't have to worry about checking the state themselves.

I think adding a new method to the extractor base class is the simplest approach, ensuring we don’t break anything for users with existing custom extractors. IMHO this keeps the transition smooth while maintaining backward compatibility.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@kacpermuda kacpermuda force-pushed the fix-ol-default-extractor-af3 branch 2 times, most recently from ca55fca to 57c5337 Compare March 12, 2025 15:58
@kacpermuda
Copy link
Contributor Author

kacpermuda commented Mar 12, 2025

There is a discussion on Slack and state might get re-added to RunTimeTaskInstance. Will keep this as draft for now, until it clarifies. We should proceed with this PR regardless.

@kacpermuda kacpermuda force-pushed the fix-ol-default-extractor-af3 branch from 57c5337 to 65e79b7 Compare March 13, 2025 16:12
@kacpermuda kacpermuda marked this pull request as ready for review March 13, 2025 17:08
@kacpermuda kacpermuda requested a review from mobuchowski as a code owner March 13, 2025 17:08
@mobuchowski mobuchowski merged commit 807bdca into apache:main Mar 17, 2025
113 checks passed
@kacpermuda kacpermuda deleted the fix-ol-default-extractor-af3 branch March 18, 2025 08:46
agupta01 pushed a commit to agupta01/airflow that referenced this pull request Mar 21, 2025
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants