Add indexes on dag_id column in referencing tables to speed up deletion of dag records #39638
Merged: ephraimbuddy merged 8 commits into apache:main from astronomer:idx-optimise-slow-deletion-of-dags-1 on May 17, 2024
boring-cyborg (bot) added the area:db-migrations (PRs with DB migration) and kind:documentation labels on May 15, 2024
pankajkoti changed the title from "Add indexes on dag_id column in refencing tables to speed up deletion…" to "Add indexes on dag_id column in refencing tables to speed up deletion of dag records" on May 15, 2024
pankajkoti force-pushed the idx-optimise-slow-deletion-of-dags-1 branch from 26106be to fb83faf on May 15, 2024 11:39
airflow/migrations/versions/0143_2_9_2_add_indexes_on_dag_id_column_in_referencing_tables.py
… the behavior in CI
pankajkoti
commented
May 15, 2024
airflow/migrations/versions/0143_2_9_2_add_indexes_on_dag_id_column_in_referencing_tables.py
LGTM
pankajkoti requested review from ephraimbuddy, Taragolis, jedcunningham and dstandish on May 16, 2024 14:43
pankajkoti changed the title from "Add indexes on dag_id column in refencing tables to speed up deletion of dag records" to "Add indexes on dag_id column in referencing tables to speed up deletion of dag records" on May 16, 2024
ephraimbuddy approved these changes on May 17, 2024
pankajkoti added a commit to astronomer/airflow that referenced this pull request on Jun 5, 2024
pankajkoti added a commit that referenced this pull request on Jun 5, 2024:
Set PR #39638 to be marked for 2.10 as we would like to consider it as an improvement as suggested by @ephraimbuddy
fdemiane pushed a commit to fdemiane/airflow that referenced this pull request on Jun 6, 2024:
Set PR apache#39638 to be marked for 2.10 as we would like to consider it as an improvement as suggested by @ephraimbuddy
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request on Jul 26, 2024:
…on of dag records (apache#39638)
* Add indexes on dag_id column in refencing tables to speed up deletion of dag records
* Gracefully handle deletion of indexes on foreign key columns during downgrade
* Correct constraint key name for dag_owner_attributes table fk
* Handle ForeignKey for dag_owner_attributes table behavior based on db
* Temporarily disable downgrade for dag_owner_attributes table to check the behavior in CI
* Skip index for dag_owner_attributes table
* Address @ephraimbuddy's comment
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request on Jul 26, 2024:
Set PR apache#39638 to be marked for 2.10 as we would like to consider it as an improvement as suggested by @ephraimbuddy
Labels
area:db-migrations
PRs with DB migration
kind:documentation
type:improvement
Changelog: Improvements
When the number of DAG records grows very large and users try to delete
DAGs and DAG runs that are stale or no longer needed, deletion is
observed to be significantly slow. The cause is slow cascading deletes:
although the referencing tables have foreign key constraints, those
constraints do not implicitly create an index on the referencing columns
(dag_id in this case). This PR therefore creates indexes on dag_id in
5 of the 6 referencing tables so that the CASCADE DELETEs run faster.
The index on the 6th table, dag_owner_attributes, is skipped for now
because CI fails to find the dag.dag_id constraint for that table.
I plan to follow up on that remaining table in a separate PR.
Without these indexes, deleting those records was observed to take
many hours. With them, the same deletion takes a few seconds.
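
The effect described above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module, not Airflow's real schema or the migration in this PR. The child table, the index name idx_child_dag_id, and both helper functions are hypothetical names chosen for the example. It only shows that a lookup on the referencing column, which is the lookup a cascading delete has to perform, can use an index once one exists. Query planners in other databases behave differently, so treat it as an illustration rather than a benchmark.

```python
import sqlite3


def build_db(with_index: bool) -> sqlite3.Connection:
    """Create a parent table and a referencing table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " dag_id TEXT NOT NULL REFERENCES dag(dag_id) ON DELETE CASCADE)"
    )
    if with_index:
        # The foreign key alone does not index child.dag_id; add it explicitly.
        conn.execute("CREATE INDEX idx_child_dag_id ON child (dag_id)")
    conn.execute("INSERT INTO dag VALUES ('example')")
    conn.executemany("INSERT INTO child (dag_id) VALUES (?)", [("example",)] * 3)
    return conn


def lookup_plan(conn: sqlite3.Connection) -> str:
    """Return the query plan for finding child rows of one dag_id."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM child WHERE dag_id = ?", ("example",)
    ).fetchall()
    # The human-readable plan detail is the last column of each row.
    return " ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print("without index:", lookup_plan(build_db(with_index=False)))
    print("with index:   ", lookup_plan(build_db(with_index=True)))
```

Running the script prints the two plans side by side. The plan for the indexed database names idx_child_dag_id, while the unindexed one has to scan the whole child table.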