Speed up grid_data endpoint by 10x #24284
Conversation
❤️ it
(I'm still not thrilled at it taking 1.5s mind you, but it's a darn sight better)
Tests were done with a DAG with 2000 dummy tasks grouped into 100 task groups.
I guess just transferring and parsing the resulting JSON will take most of the time now. BTW, are we using deflate there :D? And there are quite a few tricks with JavaScript VM optimization for JSON which we MIGHT take a look at :)
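The deflate question can be sketched in isolation: compressing a JSON response before it goes over the wire is cheap and very effective on payloads like grid data, which repeat the same keys thousands of times. A minimal standard-library sketch (the payload shape and function names are made up for illustration, not Airflow's actual code; in practice a web server or reverse proxy usually handles this via `Content-Encoding`):

```python
import gzip
import json


def compress_json_response(payload: dict) -> bytes:
    """Serialize a payload to JSON and gzip it, as a web framework or
    reverse proxy would before sending it with Content-Encoding: gzip."""
    raw = json.dumps(payload).encode("utf-8")
    return gzip.compress(raw)


def decompress_json_response(body: bytes) -> dict:
    """What the browser does transparently when it sees Content-Encoding: gzip."""
    return json.loads(gzip.decompress(body).decode("utf-8"))


# A grid_data-like payload: many repeated keys, so it compresses very well.
payload = {"groups": [{"task_id": f"task_{i}", "state": "success"} for i in range(2000)]}
compressed = compress_json_response(payload)
assert decompress_json_response(compressed) == payload
assert len(compressed) < len(json.dumps(payload).encode("utf-8"))
```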
There are many much, much slower points to optimize on the front end before we have to worry about time spent parsing datetimes on the client side.
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
Love it.
Force-pushed from ed13f48 to d471f1f
I'm getting this error during auto-refresh:
Stack trace for that? I didn't test with running DAGs. Oops! (Good thing we can add tests now)
These changes make the endpoint go from almost 20s down to 1.5s, and the changes are twofold:

1. Keep datetimes as objects for as long as possible. Previously we were converting start/end dates for a task group to a string, and then in the parent parsing it back to a datetime to find the min and max of all the child nodes. The fix for that was to leave it as a datetime (or a `pendulum.DateTime`, technically) and use the existing `AirflowJsonEncoder` class to "correctly" encode these objects on output.
2. Reduce the number of DB queries from one per task to one in total. The removed `get_task_summaries` function was called for each task, and was making a query to the database to find info for the given DagRuns. The helper function now makes just a single DB query for all tasks/runs and constructs a dict to efficiently look up the TI by `run_id`.

Note that this possibly has incorrect behaviour, in that the end_date of a TaskGroup is set to the max of all the children's end dates, even if some are still running. (This is the existing behaviour and is not changed or altered by this change, which is limited to performance fixes.)
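The second point is the classic N+1-query fix: fetch everything once, then index it in memory. A small sketch of the pattern using an in-memory SQLite table (the table and column names here are simplified stand-ins, not Airflow's real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, run_id TEXT, state TEXT)")
rows = [(f"task_{t}", f"run_{r}", "success") for t in range(5) for r in range(3)]
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?)", rows)

# Before: one query per task, i.e. N database round trips.
# After: a single query for all tasks/runs, then an in-memory dict
# keyed by (task_id, run_id) for O(1) lookups.
all_tis = conn.execute("SELECT task_id, run_id, state FROM task_instance").fetchall()
ti_by_key = {(task_id, run_id): state for task_id, run_id, state in all_tis}

# Each per-task lookup is now a dict access instead of a DB round trip:
assert ti_by_key[("task_2", "run_1")] == "success"
assert len(ti_by_key) == 15
```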
Force-pushed from d471f1f to b1bf967
PTAL @bbovenzi I've fixed the issue
Looking good. But there are some linting issues.
Helm/Kube tests are not caused by this PR and are failing on main too. Merging.
* Speed up grid_data endpoint by 10x

  These changes make the endpoint go from almost 20s down to 1.5s, and the changes are twofold:

  1. Keep datetimes as objects for as long as possible. Previously we were converting start/end dates for a task group to a string, and then in the parent parsing it back to a datetime to find the min and max of all the child nodes. The fix for that was to leave it as a datetime (or a `pendulum.DateTime`, technically) and use the existing `AirflowJsonEncoder` class to "correctly" encode these objects on output.
  2. Reduce the number of DB queries from one per task to one in total. The removed `get_task_summaries` function was called for each task, and was making a query to the database to find info for the given DagRuns. The helper function now makes just a single DB query for all tasks/runs and constructs a dict to efficiently look up the TI by `run_id`.

* Add support for mapped tasks in the grid data

* Don't fail when not all tasks have a finish date.

  Note that this possibly has incorrect behaviour, in that the end_date of a TaskGroup is set to the max of all the children's end dates, even if some are still running. (This is the existing behaviour and is not changed or altered by this change, which is limited to performance fixes.)

(cherry picked from commit 451a6f4)
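The two datetime-related fixes above can be sketched together: keep datetimes as real objects while aggregating (filtering out `None` for still-running tasks, so the aggregation doesn't fail), and only stringify them at the JSON boundary via a custom encoder. This is a hedged illustration in the spirit of `AirflowJsonEncoder`, not its actual implementation:

```python
import json
from datetime import datetime, timezone


class DateTimeEncoder(json.JSONEncoder):
    """Serialize datetimes only at the JSON boundary, so they remain
    comparable objects (usable with min/max) until output."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)


# End dates of a hypothetical task group's children; one is still running.
end_dates = [
    datetime(2022, 6, 7, 12, 0, tzinfo=timezone.utc),
    None,  # still running, no finish date yet
    datetime(2022, 6, 7, 12, 5, tzinfo=timezone.utc),
]

# "Don't fail when not all tasks have a finish date": drop Nones before max().
finished = [d for d in end_dates if d is not None]
group_end = max(finished) if finished else None

# The datetime is only converted to a string here, at encode time.
encoded = json.dumps({"end_date": group_end}, cls=DateTimeEncoder)
```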