openlineage: improve spawned OpenLineage process in scheduler #39735

Merged

JDarDagran
Contributor

Although the OpenLineage provider currently makes no calls to the Airflow database from the process(es) spawned with ProcessPoolExecutor in Airflow, #39520 would add one. For that reason, among others, the ORM should be re-configured when each spawned process is initialized.

Additionally, ProcessPoolExecutor catches all exceptions raised in its workers and stores information about them in the result object. However, we neither wait for submitted jobs nor check their status, so exceptions are swallowed and never logged anywhere.
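One possible way to surface such exceptions without blocking on the result is a done-callback on the returned `Future`. The sketch below is an assumption about the general technique, not necessarily what this PR implements; `failing_job`, `log_job_exception`, and `submit_logged` are hypothetical names.

```python
import logging
from concurrent.futures import Future, ProcessPoolExecutor

log = logging.getLogger(__name__)


def failing_job():
    """Hypothetical job that always fails inside the worker process."""
    raise ValueError("emit failed")


def log_job_exception(future: Future) -> None:
    """Done-callback: log the exception stored on the future, if any."""
    if future.cancelled():
        # Future.exception() would raise CancelledError for a cancelled future.
        return
    exc = future.exception()
    if exc is not None:
        log.error("Job in worker process failed", exc_info=exc)


def submit_logged(pool, fn, *args, **kwargs) -> Future:
    """Submit fn and attach the logging callback; the caller need not wait."""
    future = pool.submit(fn, *args, **kwargs)
    future.add_done_callback(log_job_exception)
    return future


def main():
    logging.basicConfig(level=logging.INFO)
    with ProcessPoolExecutor(max_workers=1) as pool:
        # Fire-and-forget: nobody calls .result(), yet the failure is logged.
        submit_logged(pool, failing_job)


if __name__ == "__main__":
    main()
```

The callback runs in the submitting process, so the log record goes through that process's logging configuration rather than the worker's.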

I tried adding tests to check that logs from within OpenLineageListener's ProcessPoolExecutor jobs now land properly, but pytest does not seem to like multiprocessing, so I ended up giving up on the tests.
Both changes, however, were tested in breeze and Astro Cloud by running 1000 concurrent DagRuns, and both gave the expected results.

Contributor

@kacpermuda kacpermuda left a comment

Change looks good, the failed tests need some fixes. Thanks @JDarDagran - this change will help us a lot

Log exceptions that occur within ProcessPoolExecutor.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
@JDarDagran JDarDagran force-pushed the openlineage/configure-orm-in-process-pool branch from c4eacb7 to 55c1ad1 Compare May 21, 2024 10:14
@JDarDagran
Contributor Author

> Change looks good, the failed tests need some fixes. Thanks @JDarDagran - this change will help us a lot

Fixed, thanks.

@mobuchowski mobuchowski merged commit b7671ef into apache:main May 21, 2024
41 checks passed
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
…#39735)
