fix: upgrade_catalog_perms and downgrade_catalog_perms implementation #29860
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #29860      +/-   ##
===========================================
+ Coverage   60.48%   83.69%   +23.20%
===========================================
  Files        1931      527     -1404
  Lines       76236    38088    -38148
  Branches     8568        0     -8568
===========================================
- Hits        46114    31878    -14236
+ Misses      28017     6210    -21807
+ Partials     2105        0     -2105
        .execution_options(synchronize_session=False)
    )
    # Commit the transaction after each batch
    session.commit()
I had also suggested making transactions smaller on migrations, but the limitation that I've been told is that if the second batch fails, we won't be able to roll back the first batch. Do you have an idea how we can do this so that it can roll back gracefully?
In this case, rolling back means setting the column to None, which can be done independently of previous failures.
@eschutho Answering your question more generically, there are some possible strategies to revert batch migrations such as logging changes to be reverted, making migrations reversible by storing the original value in a dedicated column (we do that for chart migrations) or using database checkpoints.
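One of the strategies mentioned above, keeping the original value in a dedicated column, can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module rather than Superset's migration helpers, and the `records` table with its `value`, `value_backup`, and `migrated` columns is hypothetical.

```python
import sqlite3


def upgrade(conn: sqlite3.Connection, new_value: str) -> None:
    """Overwrite `value`, keeping the previous value in `value_backup`."""
    with conn:  # commits on success, rolls back if an exception is raised
        # The right-hand sides of SET see the row as it was before the update,
        # so `value_backup` receives the old `value`.
        conn.execute(
            "UPDATE records "
            "SET value_backup = value, value = ?, migrated = 1 "
            "WHERE migrated = 0",
            (new_value,),
        )


def downgrade(conn: sqlite3.Connection) -> None:
    """Restore the original value from the backup column."""
    with conn:
        conn.execute(
            "UPDATE records "
            "SET value = value_backup, value_backup = NULL, migrated = 0 "
            "WHERE migrated = 1"
        )
```

Because the `migrated` flag marks which rows were touched, the downgrade restores only those rows, including rows whose original value was NULL.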
Gotcha, would the other "possible strategies" involve another custom script to get back to the pre-migration state? I'm just concerned that for most people just running an upgrade that fails on a batch other than the first batch would get them in a state where they could have some columns with a catalog value, and some without. @betodealmeida may be able to answer if this would be problematic, knowing how the codebase uses these values.
@eschutho I was able to remove the commit statement after verifying that we could hold/rollback a transaction with an appropriate block size.
ok great. 🙏
I'll also need to fix this part which generates a query for each dataset, which in Airbnb's case is more than 25,000 queries.
This is great, thanks @michael-s-molina! I was worried about performance when I introduced the feature, but it's hard to test things at the AirBnB scale.
Oops, approved too early.
You might be able to rewrite this as a single query for some dialects:
UPDATE tables
SET
    catalog=${default_catalog},
    catalog_perm=${catalog_perm},
    schema_perm=REGEXP_REPLACE(schema_perm, '\[([^\]]+)\]\.\[([^\]]+)\]', '[\1].[${default_catalog}].[\2]', 'g')
WHERE
    database_id=${database_id} AND
    catalog IS NULL
RETURNING id;
For charts you can then use the returned IDs from the query above to build a similar
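To make the effect of the `REGEXP_REPLACE` call in the suggested query concrete, here is a small Python sketch that applies the same pattern with the standard `re` module. The helper name `add_catalog` and the sample values are hypothetical; only the regular expression is taken from the query.

```python
import re

# Same pattern as in the suggested SQL: "[database].[schema]"
SCHEMA_PERM = re.compile(r"\[([^\]]+)\]\.\[([^\]]+)\]")


def add_catalog(schema_perm: str, default_catalog: str) -> str:
    """Turn "[db].[schema]" into "[db].[catalog].[schema]"."""
    # A function replacement keeps characters in the catalog name
    # (for example backslashes) from being read as group references.
    return SCHEMA_PERM.sub(
        lambda m: f"[{m.group(1)}].[{default_catalog}].[{m.group(2)}]",
        schema_perm,
    )


print(add_catalog("[examples].[public]", "main"))  # [examples].[main].[public]
```

The rewrite is not idempotent: running it again on an already migrated value would insert the catalog a second time, which is why the `catalog IS NULL` filter in the query matters.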
@betodealmeida Given that the number of rows for tables and charts will hardly pass 1 million, I opted for preserving a lot of the previous logic and eliminating the part where we queried the
@supersetbot label 4.1
8bb192e to 348db6f
@betodealmeida I found another problem with how we were dealing with catalog and schema permissions that come from the security manager. Looking at the interface definition, they could be
@betodealmeida @eschutho any other concerns about this PR? I think @michael-s-molina has addressed both your comments.
Ah, good point! This looks correct.
@betodealmeida Can you approve or remove the request for changes?
lgtm
Thank you!
SUMMARY
This PR fixes the implementation of the upgrade_catalog_perms and downgrade_catalog_perms methods so that they can handle tables with millions of rows. To achieve this, I changed the algorithm to run the migrations in batches, issuing the update/delete commands directly with a subquery instead of changing the records one by one. It also commits the transaction after every batch to avoid a long-running transaction, which leads to memory and timeout issues. Finally, I added detailed logging to help system administrators track the migration progress and prepare for downtime.
Fixes #29801
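The batched approach described in the summary might look roughly like the sketch below. It is not the code from this PR: it uses the standard-library `sqlite3` module instead of SQLAlchemy, and the function name `batched_update`, the simplified `tables` schema, and the default batch size are assumptions made for illustration.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def batched_update(
    conn: sqlite3.Connection, default_catalog: str, batch_size: int = 1000
) -> int:
    """Set `catalog` on rows where it is NULL, one batch at a time."""
    total = 0
    while True:
        # The subquery picks the next batch of ids, so a single UPDATE
        # changes many rows instead of one statement per record.
        cursor = conn.execute(
            "UPDATE tables SET catalog = ? WHERE id IN ("
            "SELECT id FROM tables WHERE catalog IS NULL ORDER BY id LIMIT ?)",
            (default_catalog, batch_size),
        )
        # Commit after each batch to keep transactions short.
        conn.commit()
        if cursor.rowcount == 0:
            break
        total += cursor.rowcount
        logger.info("Updated %d rows so far", total)
    return total
```

Committing per batch follows the description above. Moving the `conn.commit()` call out of the loop would instead keep all batches in one transaction, so that a failure in a later batch can roll back the earlier ones, which is the trade-off raised in the review discussion.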
TESTING INSTRUCTIONS
1 - Run any migration that uses the changed commands
2 - Check that both upgrades and downgrades work
ADDITIONAL INFORMATION