Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openlineage: Airflow custom facets needs slots #40971

Closed
2 tasks done
JDarDagran opened this issue Jul 23, 2024 · 0 comments
Closed
2 tasks done

openlineage: Airflow custom facets needs slots #40971

JDarDagran opened this issue Jul 23, 2024 · 0 comments
Assignees
Labels

Comments

@JDarDagran
Copy link
Contributor

Apache Airflow Provider(s)

openlineage

Versions of Apache Airflow Providers

No response

Apache Airflow version

main branch

Operating System

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)" NAME="Debian GNU/Linux" VERSION_ID="12" VERSION="12 (bookworm)" VERSION_CODENAME=bookworm ID=debian HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support" BUG_REPORT_URL="https://bugs.debian.org/"

Deployment

Other

Deployment details

No response

What happened

In #39530 there was migration to v2 facets of OpenLineage client done. v2 introduces quite a lot of improvements but seems like it also brought some unseen result when changing from @attr.s to @attrs.define -> default value of slots were changed from False to True.
Given PR changes parent classes of e.g. AirflowJobFacet to v2 version but it still has slots set to False. This leads into unwanted behaviour when pickling instances of the class. Given the example of AirflowJobFacet __slots__ contain only _deleted (coming from parent class), therefore pickling fails on the attributes of the child class.

Below example illustrates it well:

In [1]: import pickle, attrs

In [2]: @attrs.define(slots=False)
   ...: class A():
   ...:     a: str
   ...: 

In [3]: @attrs.define(slots=True)
   ...: class B():
   ...:     b: str
   ...: 

In [4]: @attrs.define(slots=False)
   ...: class C(A):
   ...:     c: str
   ...: 

In [5]: @attrs.define(slots=False)
   ...: class D(A):
   ...:     d: str
   ...: 

In [6]: @attrs.define(slots=True)
   ...: class E(B):
   ...:     e: str
   ...: 

In [7]: @attrs.define(slots=False)
   ...: class F(B):
   ...:     f: str
   ...:

In [8]: def test(klazz):
    ...:     try:
    ...:         instance = pickle.loads(pickle.dumps(klazz(**{a.name: a.name for a in attrs.fields(klazz)})))
    ...:         for field in attrs.fields(klazz):
    ...:             getattr(instance, field.name)
    ...:     except AttributeError:
    ...:         print(f"{klazz} failed to unpickle")
    ...:

In [9]: test(A), test(B), test(C), test(D), test(E), test(F)
<class '__main__.F'> failed to unpickle

This wasn't caught with unit tests as it is revealed only when using ProcessPoolExecutor from within OpenLineageListener. When dealing with objects between processes Python pickles them.

What you think should happen instead

For two reasons:

  1. not to migrate to another set of facets in OL client that change slots from True to False
  2. keeping slots in case of facets does not seem to have huge impact on performance

I suggest we simply change slots argument to True for all facets used in dag run state listener hooks.

How to reproduce

Run breeze with --integration openlineage and OL provider installed from wheel. Run example DAG and check scheduler logs for error indicating pickling failure.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@JDarDagran JDarDagran added area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Jul 23, 2024
@dosubot dosubot bot added the provider:openlineage AIP-53 label Jul 23, 2024
@shahar1 shahar1 removed the needs-triage label for new issues that we didn't triage yet label Jul 26, 2024
@shahar1 shahar1 closed this as completed Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants