Add SQL planner support for Like, ILike and SimilarTo, with optional escape character #3101

Merged
merged 12 commits into from
Sep 9, 2022

Conversation

andygrove
Member

@andygrove andygrove commented Aug 10, 2022

Which issue does this PR close?

Closes #3099

Rationale for this change

The Dask SQL project would like to support queries that use the Postgres LIKE, ILIKE, and SIMILAR TO syntax, with an optional escape character, so we want to add this support to the SQL query planner and the logical plan.
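
To make the escape-character semantics concrete, here is a minimal sketch of LIKE matching in plain Rust. It is not DataFusion's implementation; the names `Token`, `tokenize`, `matches_tokens`, and `sql_like` are hypothetical, and the sketch assumes Postgres-style rules: `%` matches any run of characters, `_` matches exactly one, and the escape character makes the next pattern character literal. Treating a trailing escape character as an error is also an assumption of this sketch.

```rust
// Illustrative only: hypothetical helper names, not DataFusion APIs.
#[derive(Debug, PartialEq)]
enum Token {
    AnyRun,        // `%`: zero or more characters
    AnyOne,        // `_`: exactly one character
    Literal(char), // any other character, or an escaped `%` / `_`
}

// Split a LIKE pattern into tokens, honoring an optional escape character.
fn tokenize(pattern: &str, escape_char: Option<char>) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if Some(c) == escape_char {
            // The character after the escape is taken literally.
            match chars.next() {
                Some(next) => tokens.push(Token::Literal(next)),
                None => return Err("pattern must not end with the escape character".to_string()),
            }
        } else if c == '%' {
            tokens.push(Token::AnyRun);
        } else if c == '_' {
            tokens.push(Token::AnyOne);
        } else {
            tokens.push(Token::Literal(c));
        }
    }
    Ok(tokens)
}

// Simple recursive matcher; fine for a sketch, not tuned for performance.
fn matches_tokens(tokens: &[Token], text: &[char]) -> bool {
    match tokens.split_first() {
        None => text.is_empty(),
        Some((Token::AnyRun, rest)) => {
            (0..=text.len()).any(|skip| matches_tokens(rest, &text[skip..]))
        }
        Some((Token::AnyOne, rest)) => !text.is_empty() && matches_tokens(rest, &text[1..]),
        Some((Token::Literal(c), rest)) => {
            text.first() == Some(c) && matches_tokens(rest, &text[1..])
        }
    }
}

// Evaluate `text LIKE pattern [ESCAPE escape_char]`.
fn sql_like(text: &str, pattern: &str, escape_char: Option<char>) -> Result<bool, String> {
    let tokens = tokenize(pattern, escape_char)?;
    let chars: Vec<char> = text.chars().collect();
    Ok(matches_tokens(&tokens, &chars))
}
```

With `!` as the escape character, the pattern `50!% %` only matches text that contains a literal percent sign after `50`, whereas without an escape character the same `%` would act as a wildcard.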

What changes are included in this PR?

We originally modeled LIKE and NOT LIKE as binary operations, but a binary operation has no place to carry the optional escape character. For that reason, this PR adds new top-level expressions; the old ones remain for backwards compatibility.

The physical planner does not yet support the optional escape character.
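
The difference between the two shapes can be sketched with simplified stand-in types. These are not DataFusion's actual definitions: the `negated`, `expr`, and `pattern` fields appear in the diff under review, while the `escape_char` field name, the `Column` and `Literal` variants, and the helper `to_legacy_binary` are assumptions made for illustration.

```rust
// Simplified stand-ins for illustration; not DataFusion's real types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Operator {
    Like,
    NotLike,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    // Older shape: an operator and two operands, with nowhere to put an escape character.
    BinaryExpr {
        left: Box<Expr>,
        op: Operator,
        right: Box<Expr>,
    },
    // Top-level shape: a dedicated variant with room for the escape character.
    Like {
        negated: bool,
        expr: Box<Expr>,
        pattern: Box<Expr>,
        escape_char: Option<char>, // field name assumed for this sketch
    },
}

// Hypothetical helper: lower a top-level Like to the older binary form.
// This only works when there is no escape character to lose.
fn to_legacy_binary(expr: &Expr) -> Option<Expr> {
    match expr {
        Expr::Like { negated, expr, pattern, escape_char: None } => Some(Expr::BinaryExpr {
            left: expr.clone(),
            op: if *negated { Operator::NotLike } else { Operator::Like },
            right: pattern.clone(),
        }),
        _ => None,
    }
}
```

In this sketch, a `Like` without an escape character converts cleanly to the binary form, while one with `escape_char: Some('!')` cannot be represented there, which is the limitation described above.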

Are there any user-facing changes?

Yes. This is an API change.

@andygrove andygrove added the "api change" label (Changes the API exposed to users of the crate) Aug 10, 2022
@github-actions github-actions bot added the labels "core" (Core DataFusion crate), "logical-expr" (Logical plan and expressions), "optimizer" (Optimizer rules), "physical-expr" (Physical Expressions), and "sql" (SQL Planner) Aug 10, 2022
@andygrove andygrove marked this pull request as ready for review August 24, 2022 19:38
@andygrove andygrove changed the title from "[WIP] Make Like a top-level Expr and add SQL support for ILike and SimilarTo" to "Make Like a top-level Expr and add SQL support for ILike and SimilarTo" Aug 24, 2022
@codecov-commenter

codecov-commenter commented Aug 24, 2022

Codecov Report

Merging #3101 (a18dec8) into master (e6378f4) will increase coverage by 0.06%.
The diff coverage is 60.56%.

@@            Coverage Diff             @@
##           master    #3101      +/-   ##
==========================================
+ Coverage   85.58%   85.64%   +0.06%     
==========================================
  Files         296      296              
  Lines       54252    54362     +110     
==========================================
+ Hits        46432    46561     +129     
+ Misses       7820     7801      -19     
| Impacted Files | Coverage Δ |
| --- | --- |
| datafusion/sql/src/planner.rs | 81.06% <18.75%> (-0.62%) ⬇️ |
| datafusion/physical-expr/src/planner.rs | 93.29% <77.77%> (-0.91%) ⬇️ |
| datafusion/proto/src/lib.rs | 93.85% <100.00%> (+0.33%) ⬆️ |
| benchmarks/src/bin/tpch.rs | 37.59% <0.00%> (-3.56%) ⬇️ |
| datafusion/optimizer/src/optimizer.rs | 90.90% <0.00%> (-1.40%) ⬇️ |
| datafusion/optimizer/src/reduce_outer_join.rs | 98.19% <0.00%> (-0.61%) ⬇️ |
| datafusion/proto/src/logical_plan.rs | 17.46% <0.00%> (-0.24%) ⬇️ |
| datafusion/common/src/scalar.rs | 85.06% <0.00%> (-0.07%) ⬇️ |
| datafusion/proto/src/bytes/mod.rs | 82.75% <0.00%> (ø) |

... and 12 more

@andygrove andygrove marked this pull request as draft August 24, 2022 20:22
@andygrove andygrove marked this pull request as ready for review August 24, 2022 20:52
@andygrove andygrove marked this pull request as draft August 25, 2022 17:39
@andygrove andygrove marked this pull request as ready for review September 6, 2022 17:58
@andygrove andygrove requested a review from alamb September 6, 2022 18:05
@github-actions github-actions bot removed the "logical-expr" label (Logical plan and expressions) Sep 6, 2022
@andygrove andygrove changed the title from "Make Like a top-level Expr and add SQL support for ILike and SimilarTo" to "Add SQL planner support for Like, ILike and SimilarTo, with optional escape character" Sep 8, 2022
@andygrove
Member Author

@ayushdg PTAL when you can

Expr::Like { pattern, .. }
| Expr::ILike { pattern, .. }
| Expr::SimilarTo { pattern, .. } => match pattern.get_type(&self.schema)? {
    DataType::Utf8 => Ok(expr.clone()),
Contributor

What about LargeUtf8?

Member Author

This is for the regular expression pattern, which is typically a literal string but could, in theory, be a column reference. I'm not sure if it would make sense to support LargeUtf8 here?

}
Ok(Expr::Like {
    negated,
    expr: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?),
Contributor

❤️

})
}
Expr::Like { pattern, .. }
| Expr::ILike { pattern, .. }
| Expr::SimilarTo { pattern, .. } => match pattern.get_type(&self.schema)? {
Contributor

This seems like the wrong place to be type checking the argument to Expr::Like, etc. The argument types to other exprs are checked, so why would we treat Expr::Like differently?

I would expect that to be done in the SQL planner, perhaps, or in the physical_expr conversion?

Member Author

We can't check the type of the pattern expression until we have a schema to resolve against (it could be a reference to a column or any other type of expression).

My goal was to do the validation in the logical plan for the benefit of other engines building on DataFusion that don't use the physical plan.

Member Author

Never mind, we do have the schema in SQL planning ... I will make that change.
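
As a rough illustration of validating the pattern type at SQL-planning time, where a schema is available, here is a self-contained sketch. The `DataType`, `Expr`, and `Schema` types below are simplified stand-ins, and `get_type` and `validate_like_pattern` are hypothetical helpers rather than DataFusion APIs. The rule itself, accepting only `Utf8` or `Null` for the pattern, mirrors the check that appears later in this review.

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration; not DataFusion's real types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Int64,
    Null,
}

#[allow(dead_code)]
enum Expr {
    Column(String),
    StringLiteral(String),
    IntLiteral(i64),
    NullLiteral,
}

// A schema is modeled here as a plain column-name-to-type map.
type Schema = HashMap<String, DataType>;

// Resolve an expression's type; column references need the schema.
fn get_type(expr: &Expr, schema: &Schema) -> Result<DataType, String> {
    match expr {
        Expr::Column(name) => schema
            .get(name)
            .copied()
            .ok_or_else(|| format!("unknown column '{name}'")),
        Expr::StringLiteral(_) => Ok(DataType::Utf8),
        Expr::IntLiteral(_) => Ok(DataType::Int64),
        Expr::NullLiteral => Ok(DataType::Null),
    }
}

// Reject patterns that are neither Utf8 nor Null.
fn validate_like_pattern(pattern: &Expr, schema: &Schema) -> Result<(), String> {
    let pattern_type = get_type(pattern, schema)?;
    if pattern_type != DataType::Utf8 && pattern_type != DataType::Null {
        return Err(format!("invalid pattern type {pattern_type:?} in LIKE expression"));
    }
    Ok(())
}
```

Because the pattern may be a column reference rather than a literal, the check has to go through the schema. Under this rule a `LargeUtf8` pattern column is rejected, which is the case raised in the earlier review comment.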

Member Author

@alamb Ok, this PR just got a whole lot smaller. Maybe this really shouldn't have taken me 30 days to do 🤣

@alamb
Contributor

alamb commented Sep 8, 2022

This is pretty neat, by the way 👍

@github-actions github-actions bot removed the "core" label (Core DataFusion crate) Sep 9, 2022
@github-actions github-actions bot removed the "optimizer" label (Optimizer rules) Sep 9, 2022
Contributor

@alamb alamb left a comment

Looks great to me 😍

> @alamb Ok, this PR just got a whole lot smaller. Maybe this really shouldn't have taken me 30 days to do 🤣

Well, I think it had a bunch of dependencies that are all now sorted out, so the final PR "makes it look easy" even though a bunch of effort was needed

}
let pattern = self.sql_expr_to_logical_expr(*pattern, schema, ctes)?;
let pattern_type = pattern.get_type(schema)?;
if pattern_type != DataType::Utf8 && pattern_type != DataType::Null {
Contributor

👍

@alamb alamb merged commit c96f03e into apache:master Sep 9, 2022
@ursabot

ursabot commented Sep 9, 2022

Benchmark runs are scheduled for baseline = 73447b5 and contender = c96f03e. c96f03e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Dec 1, 2022
…ional escape character (apache#3101)

* Make Like a top-level Expr

* revert some changes

* add type validation

* Revert physical plan changes and reduce scope of the PR

* Revert more changes

* Revert more changes

* clippy

* address feedback

* revert change to test

* revert more changes
@andygrove andygrove deleted the like-expr branch January 27, 2023 18:56
Labels
api change Changes the API exposed to users of the crate physical-expr Physical Expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Postgres SIMILAR TO and ILIKE syntax
6 participants