Push down filter as table partition list prefix #10693

Merged
merged 1 commit into from
May 30, 2024

Conversation

houqp
Member

@houqp houqp commented May 28, 2024

Rationale for this change

When applicable, table partition listing can be optimized by deriving a listing prefix based on literals provided in the filter predicates.

For a selected table with many partitions, this change reduces query time from minutes to 200ms.

What changes are included in this PR?

  • Stacked on top of Display date32/64 in YYYY-MM-DD format #10691. Please review that PR first.
  • Added an evaluate_partition_prefix function to derive a listing prefix based on filter predicates and table partitions.
  • Extended list_partitions to take an optional prefix argument.
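
As a rough illustration of the prefix derivation described above, here is a minimal sketch. It is not the code from this PR: the function name partition_prefix, its signature, the use of plain strings in place of DataFusion expressions, and the Hive-style col=value directory layout are all assumptions made for the example. The idea is to walk the partition columns in order and stop at the first column that is not pinned to a single literal.

```rust
use std::collections::HashMap;

/// Build a listing prefix from equality constraints on partition columns.
///
/// `partition_cols` lists the partition column names in directory order.
/// `eq_values` maps a column name to the single literal it is pinned to.
/// Hypothetical helper: the name, signature, and the Hive-style
/// `col=value` layout are assumptions made for illustration.
fn partition_prefix(
    partition_cols: &[&str],
    eq_values: &HashMap<&str, &str>,
) -> Option<String> {
    let mut parts = Vec::new();
    for col in partition_cols {
        match eq_values.get(*col) {
            // The column is pinned to one value: extend the prefix.
            Some(value) => parts.push(format!("{col}={value}")),
            // The first unrestricted column ends the prefix, because
            // deeper directories can no longer be addressed directly.
            None => break,
        }
    }
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("/"))
    }
}
```

With partition columns a, b, c and constraints on a and b only, this sketch yields a=foo/b=bar; a constraint on b alone yields no prefix, since a is unrestricted.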

Are these changes tested?

Tested with unit tests.

Are there any user-facing changes?

No

@github-actions bot added the optimizer (Optimizer rules), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) labels May 28, 2024
@houqp houqp force-pushed the qp_upstream_partition_filter branch 2 times, most recently from 7278824 to b47e1e9 Compare May 29, 2024 04:12
@github-actions bot removed the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels May 29, 2024
@houqp houqp force-pushed the qp_upstream_partition_filter branch from b47e1e9 to 679ab29 Compare May 29, 2024 06:17
Contributor

@alamb alamb left a comment

Looks very nice to me -- thanks @houqp. I have two other test suggestions, but I don't think they are required.

Comment on lines 784 to 787
&[Expr::and(
Expr::eq(col("a"), lit("foo")),
Expr::eq(col("b"), lit("bar")),
)],
Contributor

You can write this kind of expression more simply, like col("a").eq(lit("foo")).and(col("b").eq(lit("bar"))), though maybe what you have is more readable.

Member Author

The eq helper improves readability, so I applied it to all tests. I do agree with you that the .and helper makes the expression less readable.

// no prefix when filter is empty
assert_eq!(evaluate_partition_prefix(partitions, &[]), None);

// b=foo results in no prefix because a is not restricted
Contributor

I recommend adding some other tests:

  1. Another negative test like a < 5 to cover the fact that only = predicates are allowed
  2. A test with the column and literal swapped (foo = a)
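
The two cases suggested above can be sketched against a toy expression model. Everything in this example is hypothetical: Operand, Op, and equality_constraint are illustrative stand-ins, far simpler than DataFusion's Expr, and are not taken from the PR. The sketch only shows the matching rule under discussion: accept an = comparison between a column and a literal in either order, and reject every other operator.

```rust
/// One side of a comparison in a deliberately tiny, hypothetical model.
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Column(String),
    Literal(String),
}

/// Comparison operators, reduced to the two needed for the example.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Eq,
    Lt,
}

/// Return `(column, literal)` when the comparison pins a column to a
/// single value, and `None` otherwise.
fn equality_constraint(
    left: &Operand,
    op: Op,
    right: &Operand,
) -> Option<(String, String)> {
    // Only `=` can contribute to a listing prefix; `<` and friends
    // describe a range rather than a single directory.
    if op != Op::Eq {
        return None;
    }
    match (left, right) {
        // Accept both `a = 'foo'` and `'foo' = a`.
        (Operand::Column(c), Operand::Literal(v))
        | (Operand::Literal(v), Operand::Column(c)) => {
            Some((c.clone(), v.clone()))
        }
        // Column-to-column or literal-to-literal comparisons pin nothing.
        _ => None,
    }
}
```

In this model, a = foo and foo = a produce the same constraint, while a < 5 produces none.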

Member Author

Done, good suggestions.

@houqp houqp force-pushed the qp_upstream_partition_filter branch from 679ab29 to 1a29bb5 Compare May 30, 2024 08:32
@houqp houqp requested a review from alamb May 30, 2024 08:34
Contributor

@alamb alamb left a comment

🚀

@alamb alamb merged commit c775e4d into apache:main May 30, 2024
22 of 23 checks passed
@houqp houqp deleted the qp_upstream_partition_filter branch May 31, 2024 05:02
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
Labels
core Core DataFusion crate