Added config option to enable parallel deletes for vacuum command #522
Resolves #395
#416 hasn't been updated in over four months, and it would be a very useful feature for us to have, so I took my own stab at it.
This adds a config option, `vacuum.parallelDelete.enabled`, that defaults to false. When `vacuum.parallelDelete.enabled` is set to true, it maintains the existing partitions from the `diff` calculation. Because this is the result of a `join`, your partitions are then based off your `spark.sql.shuffle.partitions`. So your parallelism will be min(number of executors, shuffle partitions), and you can tweak your shuffle partitions if you want more/less parallelism.

I removed the delete static method because the number of parameters that had to be passed to it made it seem like too much. Happy to move that back if that's not preferred.
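For reference, a minimal usage sketch of how this could look from the caller's side. This is an assumption-laden illustration, not code from this PR's diff: the exact config key (including any prefix Spark or Delta adds to it) and the path are placeholders.

```scala
// Hypothetical usage sketch; the config key and table path are assumptions.
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parallel-vacuum-example")
  // Opt in to parallel deletes during vacuum (defaults to false).
  .config("vacuum.parallelDelete.enabled", "true")
  // Parallelism ends up as min(number of executors, shuffle partitions),
  // so tuning this raises or lowers the delete parallelism.
  .config("spark.sql.shuffle.partitions", "200")
  .getOrCreate()

// Vacuum files older than the default retention period; with the flag on,
// the deletes reuse the partitions coming out of the diff join.
DeltaTable.forPath(spark, "/tmp/delta/events").vacuum()
```

If you want fewer concurrent delete calls against the storage layer, lowering `spark.sql.shuffle.partitions` before running vacuum is the lever, per the behavior described above.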
Also happy to make any updates to the name or description of the new config.