-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unused index idx_last_scheduling_decision
on dag_run
table
#39275
Remove unused index idx_last_scheduling_decision
on dag_run
table
#39275
Conversation
6ae0e52
to
d86e3f5
Compare
airflow/migrations/versions/0069_2_0_0_add_scheduling_decision_to_dagrun_and_.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single index alone occupies 5.7 GB
That a bit strange that this index could use such amount of storage. Maybe this happen due to disabled auto vacuum daemon or never.
In general I agree that this index unlikely will ever use in Postgres. There is couple places where last_scheduling_decision
use in sorting or filtration, but every time it use alongside with other fields so this much less effective rather than even make seq scan or use other indexes for optimise particular queries.
So removal of this one could make more benefits rather than keep it
airflow/migrations/versions/0142_2_9_1_remove_idx_last_scheduling_decision_.py
Outdated
Show resolved
Hide resolved
airflow/migrations/versions/0142_2_9_1_remove_idx_last_scheduling_decision_.py
Outdated
Show resolved
Hide resolved
airflow/migrations/versions/0069_2_0_0_add_scheduling_decision_to_dagrun_and_.py
Outdated
Show resolved
Hide resolved
bcb8a9d
to
8cd2f08
Compare
airflow/migrations/versions/0141_2_9_1_remove_idx_last_scheduling_decision_.py
Outdated
Show resolved
Hide resolved
airflow/migrations/versions/0141_2_9_1_remove_idx_last_scheduling_decision_.py
Outdated
Show resolved
Hide resolved
airflow/migrations/versions/0141_2_9_1_remove_idx_last_scheduling_decision_.py
Outdated
Show resolved
Hide resolved
@vincbeck @Taragolis @jedcunningham @utkarsharma2 addressed the review comments so far. Requesting re-review please. |
4928d9d
to
e3bca9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better! Thanks!
Bringing up a hidden in a resolved conversation comment from Jed in a conversation so that it appears here upfront From Jed The comment talks about removing this line https://github.com/apache/airflow/pull/39275/files#diff-730eff695b18ad05249293cc3361da7424aba164857e9c5d26e50bd3a78ba6e7R95 when cherry-picking back to v2-9 |
Pre-commit will still fix it when we cherry pick. Nothing to worry about |
We added
idx_last_scheduling_decision
on thelast_scheduling_decision
column in thedag_run
tablewith Airflow 2.0.0. However, this index seems to have been
unused with 0 index_scan counts throughout. I verified
scheduler performance metrics are not affected after removing
this index.
Additionally, this index seems to occupy huge storage space,
almost half the size of the table (e.g. on one deployment that
I checked, the table occupies 9.8GB of records, but this single
index alone occupies 5.7 GB)
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.