-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE] Support In list option contains non-foldable expression #4843
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. What is the behavior for query like below before this PR, fallback or remaining filter being used?
SELECT * FROM t WHERE c in (1, 2, c2)
@rui-mo it would fallback for both scan and filter |
if (i.list.exists(!_.foldable)) { | ||
throw new UnsupportedOperationException( | ||
s"In list option does not support non-foldable expression, ${i.list.map(_.sql)}") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzcclp Is non-foldable expression not supported by CH backend either?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your good work!
Let's wait for CH developers to verify. cc @baibaichen
override def apply(plan: SparkPlan): SparkPlan = { | ||
plan match { | ||
// TODO: Support datasource v2 | ||
case scan: FileSourceScanExec if scan.dataFilters.exists(_.find(shouldRewrite).isDefined) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, we can make a rule to only focus on handling spark expression instead of Spark plan if the rewriting should always take effect regardless of what Spark plan (scan, filter, etc.) the expressions come from.
If feasible, we can do it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, if we are sure it's safe, we can call transformExpressions instead
@@ -1086,4 +1086,47 @@ class TestOperator extends VeloxWholeStageTransformerSuite with AdaptiveSparkPla | |||
} | |||
} | |||
|
|||
test("Support In list option contains non-foldable expression") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add this similar ut for the ch backend ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to add test for ch backend, let's see if the tests can pass..
Run Gluten Clickhouse CI |
@@ -137,4 +140,59 @@ class GlutenClickhouseFunctionSuite extends GlutenClickHouseTPCHAbstractSuite { | |||
assert(diffCount == 0) | |||
} | |||
} | |||
|
|||
private def checkFallbackOperators(df: DataFrame, num: Int): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's better to put this tool function into WholeStageTransformerSuite ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved
LGTM |
Run Gluten Clickhouse CI |
@@ -51,7 +50,7 @@ class TestOperator extends VeloxWholeStageTransformerSuite with AdaptiveSparkPla | |||
.set("spark.memory.offHeap.size", "2g") | |||
.set("spark.unsafe.exceptionOnMemoryLeak", "true") | |||
.set("spark.sql.autoBroadcastJoinThreshold", "-1") | |||
.set("spark.sql.sources.useV1SourceList", "avro") | |||
.set("spark.sql.sources.useV1SourceList", "avro,parquet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @PHILO-HE I'm not sure why are we using v2 to read parquet. I changed to use v1 if it does not break your original idea.
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
…e#4843) * Support In list option contains non-foldable expression * address comment --------- Co-authored-by: Kent Yao <yao@apache.org>
…e#4843) * Support In list option contains non-foldable expression * address comment --------- Co-authored-by: Kent Yao <yao@apache.org>
…e#4843) * Support In list option contains non-foldable expression * address comment --------- Co-authored-by: Kent Yao <yao@apache.org>
What changes were proposed in this pull request?
This pr adds a rule to rewrite
In
if the list option contain non-foldable value. A rewrite example:How was this patch tested?
add test