Dag File Processing Slowness when using Dag Params #32434
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.
@dlstadther-pl I'll take a look and try to reproduce the problem.
See also a previous report regarding params and DAG processing slowdown. Can you please share the jsonschema version? Ref: #28445
@tirkarthi I have reproduced the issue using
@hussein-awala According to the jsonschema project, they did a major rewrite that fixes the performance issue. The fix PR was merged and 4.18.0 was released 2 days ago. Can you please check whether 4.18.0 fixes the issue? python-jsonschema/jsonschema#941
I just tested it, the parsing time has been reduced to approximately 2-3 seconds. While this is an improvement over jsonschema=4.17.3, fastjsonschema remains the faster option. wdyt?
Since jsonschema is already used by Airflow, and upstream promises that this version bump is backwards compatible, I'm inclined to just bump jsonschema and see if CI passes. It seems the better fix in terms of compatibility: it improves the situation without any code changes and is less risky. I don't see fastjsonschema promising compatibility with jsonschema in their docs. Maybe fastjsonschema could be a separate issue, since it will involve code changes and testing the API with existing code. https://pypistats.org/packages/jsonschema
Thank you @hussein-awala and @tirkarthi for the quick engagement on this issue! I have forced my local Airflow 2.6.2 image to include Would the rest of the delta (between 2.2.3's 50s and the now 350s) be explained by additional validation checks against dag params? Or could there be another performance issue at play?
@dlstadther-pl I'm trying to switch from using
After forcing the upgrade of Glad it was as simple as incrementing a dependency version. Thanks again @hussein-awala! I'm good to close this issue as resolved now that your PR is merged.
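For anyone landing here later, a quick way to tell whether an environment already has the faster jsonschema is to compare the installed version against 4.18.0, the release mentioned above. This is only a minimal sketch: `parse_release` and `has_fixed_jsonschema` are hypothetical helper names, not part of Airflow or jsonschema, and the simple numeric comparison treats pre-releases such as `4.18.0a1` as older than 4.18.0.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '4.18.0' into (4, 18, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fixed_jsonschema(minimum: str = "4.18.0") -> Optional[bool]:
    """Return True/False for the installed jsonschema, or None if it is absent."""
    try:
        installed = metadata.version("jsonschema")
    except metadata.PackageNotFoundError:
        return None
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    print("jsonschema >= 4.18.0:", has_fixed_jsonschema())
```

Running this inside the scheduler or DAG processor image (rather than on a laptop) is what matters, since that is where the DAG files get parsed.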
Apache Airflow version
2.6.2
What happened
After migrating from Airflow 2.2.3 to 2.6.2, we saw a large (~5-10x) increase in DAG File Processing time for our dags. While we have some anti-patterns with dag generation (dynamic dag generation and usage of 5 Airflow Variables), we have isolated the increase in processing duration to the existence of Dag Params (see "How to Reproduce", below).
We're experiencing this issue in our most complex dag file. This dag file creates 1 "main" dag which runs a TriggerDagRunOperator on each of the "client-specific" dags that it generates dynamically. Each client-specific dag is assigned 5 Dag Params (which describe certain characteristics of the client) and about 400 tasks.
Dag files which used to take 0.58s now take 2.88s; 3s now take 30s; 95s now take 985s.
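As a sanity check on the "~5-10x" figure, the slowdown factors can be computed directly from the three before/after timings quoted above. This is just arithmetic on those numbers; the variable and function names are made up for illustration.

```python
# (seconds on 2.2.3, seconds on 2.6.2), taken from the timings quoted above
timings = [(0.58, 2.88), (3.0, 30.0), (95.0, 985.0)]


def slowdown(before: float, after: float) -> float:
    """Ratio of the new processing time to the old one."""
    return after / before


factors = [round(slowdown(before, after), 1) for before, after in timings]

for (before, after), factor in zip(timings, factors):
    print(f"{before}s -> {after}s: {factor}x slower")
```

The smallest file slows down by roughly 5x and the two larger ones by roughly 10x, which matches the range reported.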
What you think should happen instead
I believe DAG Processing is inefficient at serializing dag params during the serialization of tasks.
(However, I have been unable to pinpoint a commit which caused a significant change to the serialization of DagBag.sync_to_db() code).
How to reproduce
I have reproduced the situation we experience locally with a representative (but dumb) dag example which can show that dag file processing runtimes increase as the quantity of Dag Params increase.
I realize this dag may be a bit complex, so I've also included a visual representation of how the dags relate to each other, along with many of the Dag File Processing times for 2.2.3 and 2.6.2 when using various quantities of Dag Params in the client-specific dag definitions.
Code
Creates:
Runtimes
client_qty = 1
client_qty = 10
client_qty = 150
Operating System
Debian 11 (bullseye)
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
Anything else
I've also recreated the same issue (with the sample code provided above) using the Airflow Docker Compose setup.
Our issue differs from #30593 and #30884, as we are already on 2.6.x and use the default value (5s) for job_heartbeat_sec.
Are you willing to submit PR?
Code of Conduct