[Spark] Support predicate pushdown in scans with DVs #2982
Merged: scottsand-db merged 18 commits into delta-io:branch-3.2 from andreaschat-db:supportPredicatePushdownInScansWithDVs-3.2 on Apr 26, 2024.
Commit messages:

- flush
- First sane version without isRowDeleted
- Hack RowIndexMarkingFilters
- Add support for non-vectorized readers
- Metadata column fix
- Avoid non-deterministic UDF to filter deleted rows
- metadata with Expression ID
- Fix complex views issue
- Tests
- cleaning
- More tests and fixes
- Partial cleaning
- cleaning and improvements
- cleaning and improvements
- Clean RowIndexFilter
- Clean DeltaParquetFileFormat
- Improve DeletionVectorsSuite
- Disable DeltaParquetFileFormatSuite for predicate pushdown.
scottsand-db approved these changes on Apr 26, 2024.
Which Delta project/connector is this regarding?

Spark
Description
Currently, when Deletion Vectors are enabled, we disable predicate pushdown and splitting in scans. This is because we rely on a custom row index column that is constructed in the executors and cannot handle splits and predicates. These restrictions can now be lifted by relying instead on `_metadata.row_index`, which was exposed recently after the relevant work was concluded.

Overall, this PR adds predicate pushdown and splits support as follows:

1. Replaces `__delta_internal_is_row_deleted` with `_metadata.row_index`.
2. Computes `__delta_internal_is_row_deleted` based on `_metadata.row_index`.
3. The `IsRowDeleted` filter is now non-deterministic to allow predicate pushdown.

Furthermore, it includes previous relevant work to remove the UDF from the `IsRowDeleted` filter.

How was this patch tested?
Added new suites.
Does this PR introduce any user-facing changes?
No.
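To make the row-index reasoning in the description concrete, here is a minimal Scala sketch. It is not Delta's implementation: it models a file as rows that carry their physical position (the role `_metadata.row_index` plays), uses a plain `Set[Long]` in place of a deletion vector's bitmap, and contrasts that with a simplified counter that numbers rows as they reach the reader. All names (`RowIndexSketch`, `isRowDeleted`, `dropDeleted`, `dropDeletedByCounter`) are hypothetical.

```scala
// Minimal sketch, NOT Delta's implementation. All names are hypothetical.
// A Set[Long] stands in for a deletion vector's bitmap of deleted row indexes.
object RowIndexSketch {
  // A row that carries its physical position in the file, playing the role
  // of Spark's `_metadata.row_index`.
  final case class Row(rowIndex: Long, value: Int)

  // Builds one file: row index = position of the row in the file.
  def file(values: Seq[Int]): Vector[Row] =
    values.zipWithIndex.map { case (v, i) => Row(i.toLong, v) }.toVector

  // Analogue of an "is row deleted" flag derived from the file-level index.
  def isRowDeleted(row: Row, deleted: Set[Long]): Boolean =
    deleted.contains(row.rowIndex)

  // Correct regardless of which subset of the file reaches the reader
  // (a split, or only the rows that survived a pushed-down predicate).
  def dropDeleted(rows: Seq[Row], deleted: Set[Long]): Seq[Row] =
    rows.filterNot(r => isRowDeleted(r, deleted))

  // Simplified stand-in for an index built in the executor: rows are numbered
  // 0, 1, 2, ... in the order they reach the reader, so the numbering is only
  // right when the reader sees the whole file starting from its first row.
  def dropDeletedByCounter(rows: Seq[Row], deleted: Set[Long]): Seq[Row] =
    rows.zipWithIndex.collect { case (r, i) if !deleted.contains(i.toLong) => r }
}
```

With a file of six rows (values 10 to 15) and rows 1 and 3 deleted, both approaches agree on a full scan. If the reader sees only the second split (rows 3 to 5), or only the rows passing a pushed-down `value >= 12` predicate, the counter misidentifies which rows are deleted, while the file-level index still removes exactly rows 1 and 3.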