[CORE] Support In list option contains non-foldable expression #4843

Merged
3 commits merged into apache:main
Mar 6, 2024

Conversation

ulysses-you
Contributor

@ulysses-you ulysses-you commented Mar 4, 2024

What changes were proposed in this pull request?

This PR adds a rule that rewrites In when its list option contains a non-foldable value. A rewrite example:

SELECT * FROM t WHERE c in (1, 2, c2)
=>
SELECT * FROM t WHERE c in (1, 2) or c = c2
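The rewrite above can be sketched on a toy expression tree. This is only an illustration under stated assumptions: Lit, Col, In, EqualTo, Or and rewriteIn are hypothetical stand-ins defined here, not Spark's Catalyst classes and not the rule added by this PR.

```scala
object InRewriteSketch {
  // Toy expression tree; hypothetical stand-ins for Catalyst expressions.
  sealed trait Expr { def foldable: Boolean }
  case class Lit(value: Int) extends Expr { val foldable = true }
  case class Col(name: String) extends Expr { val foldable = false }
  case class In(value: Expr, list: Seq[Expr]) extends Expr { val foldable = false }
  case class EqualTo(left: Expr, right: Expr) extends Expr { val foldable = false }
  case class Or(left: Expr, right: Expr) extends Expr { val foldable = false }

  // Keep the foldable options inside In and turn every non-foldable option
  // into an equality, joining all parts with OR.
  def rewriteIn(in: In): Expr = {
    val (foldables, nonFoldables) = in.list.partition(_.foldable)
    if (nonFoldables.isEmpty) {
      in // nothing to rewrite
    } else {
      val head: Seq[Expr] = if (foldables.isEmpty) Seq.empty else Seq(In(in.value, foldables))
      val equalities: Seq[Expr] = nonFoldables.map(e => EqualTo(in.value, e))
      (head ++ equalities).reduceLeft[Expr]((l, r) => Or(l, r))
    }
  }
}
```

For In(c, [1, 2, c2]) this yields Or(In(c, [1, 2]), EqualTo(c, c2)), mirroring the SQL example.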

How was this patch tested?

Added a test.


github-actions bot commented Mar 4, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


github-actions bot commented Mar 4, 2024

Run Gluten Clickhouse CI

@ulysses-you
Contributor Author

cc @rui-mo @PHILO-HE thank you

Contributor

@rui-mo rui-mo left a comment


Thanks. What was the behavior for a query like the one below before this PR: did it fall back, or was a remaining filter used?

SELECT * FROM t WHERE c in (1, 2, c2)

@ulysses-you
Contributor Author

@rui-mo It would fall back for both the scan and the filter.

@rui-mo rui-mo requested a review from zzcclp March 5, 2024 01:45
rui-mo
rui-mo previously approved these changes Mar 5, 2024
if (i.list.exists(!_.foldable)) {
  throw new UnsupportedOperationException(
    s"In list option does not support non-foldable expression, ${i.list.map(_.sql)}")
}
Contributor

@zzcclp Are non-foldable expressions not supported by the CH backend either?

Contributor

Yes, they are not supported.

PHILO-HE
PHILO-HE previously approved these changes Mar 5, 2024
Contributor

@PHILO-HE PHILO-HE left a comment


Thanks for your good work!
Let's wait for CH developers to verify. cc @baibaichen

override def apply(plan: SparkPlan): SparkPlan = {
  plan match {
    // TODO: Support datasource v2
    case scan: FileSourceScanExec if scan.dataFilters.exists(_.find(shouldRewrite).isDefined) =>
Contributor

@PHILO-HE PHILO-HE Mar 5, 2024


Maybe we could make the rule operate only on Spark expressions rather than on Spark plans, if the rewrite should always take effect regardless of which Spark plan node (scan, filter, etc.) the expressions come from.
If feasible, we can do that in the future.

Contributor Author

Good point. If we are sure it's safe, we can call transformExpressions instead.
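To illustrate the expression-level idea discussed here, the following is a minimal sketch on a toy tree. It assumes hypothetical types (Lit, Col, In, EqualTo, Or, And) and a hand-written transformUp helper; it does not use Spark's transformExpressions and is not the code in this PR. It only shows that a rule written against expressions applies wherever the In appears, independent of the enclosing plan node.

```scala
object ExprRewriteSketch {
  // Illustrative stand-ins for Catalyst expressions; all names are hypothetical.
  sealed trait Expr
  case class Lit(value: Int) extends Expr
  case class Col(name: String) extends Expr
  case class In(value: Expr, list: Seq[Expr]) extends Expr
  case class EqualTo(left: Expr, right: Expr) extends Expr
  case class Or(left: Expr, right: Expr) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  // In this toy model only literals count as foldable.
  def foldable(e: Expr): Boolean = e.isInstanceOf[Lit]

  // Rewrite a single In node; any other node is returned unchanged.
  def rewriteIn(e: Expr): Expr = e match {
    case In(v, list) if list.exists(x => !foldable(x)) =>
      val (consts, others) = list.partition(foldable)
      val head: Seq[Expr] = if (consts.isEmpty) Seq.empty else Seq(In(v, consts))
      val parts: Seq[Expr] = head ++ others.map(o => EqualTo(v, o))
      parts.reduceLeft[Expr]((l, r) => Or(l, r))
    case other => other
  }

  // Bottom-up traversal: rewrite the children first, then the node itself.
  def transformUp(e: Expr)(rule: Expr => Expr): Expr = {
    val withNewChildren: Expr = e match {
      case And(l, r) => And(transformUp(l)(rule), transformUp(r)(rule))
      case Or(l, r) => Or(transformUp(l)(rule), transformUp(r)(rule))
      case EqualTo(l, r) => EqualTo(transformUp(l)(rule), transformUp(r)(rule))
      case In(v, list) => In(transformUp(v)(rule), list.map(x => transformUp(x)(rule)))
      case leaf => leaf
    }
    rule(withNewChildren)
  }
}
```

Calling transformUp(filter)(rewriteIn) on a condition such as a = 0 AND c IN (1, 2, c2) rewrites only the nested In and leaves the rest of the tree untouched.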

@@ -1086,4 +1086,47 @@ class TestOperator extends VeloxWholeStageTransformerSuite with AdaptiveSparkPla
}
}

test("Support In list option contains non-foldable expression") {
Contributor

Can you add a similar UT for the CH backend?

Contributor Author

I've tried adding a test for the CH backend; let's see if the tests pass.


github-actions bot commented Mar 5, 2024

Run Gluten Clickhouse CI

@@ -137,4 +140,59 @@ class GlutenClickhouseFunctionSuite extends GlutenClickHouseTPCHAbstractSuite {
assert(diffCount == 0)
}
}

private def checkFallbackOperators(df: DataFrame, num: Int): Unit = {
Contributor

Would it be better to move this utility function into WholeStageTransformerSuite?

Contributor Author

Moved.
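For readers unfamiliar with this kind of helper, here is a rough sketch of what a fallback-counting check might look like. It is an assumption for illustration only: PlanNode, its offloaded flag, and the operator names are hypothetical, and the real checkFallbackOperators takes a DataFrame and inspects the executed Spark plan rather than this toy tree.

```scala
object FallbackCountSketch {
  // Hypothetical plan node: `offloaded` marks operators that run on the native backend.
  case class PlanNode(name: String, offloaded: Boolean, children: Seq[PlanNode] = Nil)

  // Collect every operator that was not offloaded, i.e. fell back to vanilla Spark.
  def fallbackOperators(plan: PlanNode): Seq[PlanNode] = {
    val self: Seq[PlanNode] = if (plan.offloaded) Seq.empty else Seq(plan)
    self ++ plan.children.flatMap(fallbackOperators)
  }

  // Assert the expected number of fallback operators, listing them on failure.
  def checkFallbackOperators(plan: PlanNode, num: Int): Unit = {
    val fallbacks = fallbackOperators(plan)
    assert(
      fallbacks.size == num,
      s"expected $num fallback operators, got: ${fallbacks.map(_.name).mkString(", ")}")
  }
}
```

In this toy model, a plan where both the filter and the scan fall back reports two fallback operators, and a fully offloaded plan reports zero.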

@zzcclp
Contributor

zzcclp commented Mar 5, 2024

LGTM


github-actions bot commented Mar 6, 2024

Run Gluten Clickhouse CI

@@ -51,7 +50,7 @@ class TestOperator extends VeloxWholeStageTransformerSuite with AdaptiveSparkPla
       .set("spark.memory.offHeap.size", "2g")
       .set("spark.unsafe.exceptionOnMemoryLeak", "true")
       .set("spark.sql.autoBroadcastJoinThreshold", "-1")
-      .set("spark.sql.sources.useV1SourceList", "avro")
+      .set("spark.sql.sources.useV1SourceList", "avro,parquet")
Contributor Author

cc @PHILO-HE I'm not sure why we are using v2 to read Parquet. I changed it to use v1, provided that does not break your original intent.

@ulysses-you
Contributor Author

Run Gluten Clickhouse CI


github-actions bot commented Mar 6, 2024

Run Gluten Clickhouse CI

@yaooqinn yaooqinn merged commit e9fdd6e into apache:main Mar 6, 2024
18 of 19 checks passed

github-actions bot commented Mar 6, 2024

Run Gluten Clickhouse CI

@ulysses-you ulysses-you deleted the fix branch March 6, 2024 06:27
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

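| query | log/native_4843_time.csv | log/native_master_03_06_2024_10910e900_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1    | 36.55   | 35.69   | -0.857 | 97.65%  |
| q2    | 24.52   | 24.09   | -0.428 | 98.26%  |
| q3    | 39.05   | 37.17   | -1.878 | 95.19%  |
| q4    | 37.87   | 37.31   | -0.561 | 98.52%  |
| q5    | 69.63   | 70.29   | 0.668  | 100.96% |
| q6    | 7.41    | 7.07    | -0.338 | 95.43%  |
| q7    | 84.26   | 84.52   | 0.263  | 100.31% |
| q8    | 85.53   | 86.24   | 0.708  | 100.83% |
| q9    | 124.72  | 125.77  | 1.050  | 100.84% |
| q10   | 44.11   | 42.91   | -1.203 | 97.27%  |
| q11   | 20.34   | 20.70   | 0.363  | 101.78% |
| q12   | 25.22   | 29.30   | 4.082  | 116.19% |
| q13   | 45.72   | 45.63   | -0.094 | 99.79%  |
| q14   | 16.31   | 20.55   | 4.241  | 126.00% |
| q15   | 28.16   | 29.07   | 0.912  | 103.24% |
| q16   | 14.42   | 13.92   | -0.497 | 96.55%  |
| q17   | 102.71  | 102.68  | -0.032 | 99.97%  |
| q18   | 143.41  | 145.05  | 1.632  | 101.14% |
| q19   | 16.23   | 12.61   | -3.617 | 77.71%  |
| q20   | 27.04   | 25.67   | -1.367 | 94.95%  |
| q21   | 223.79  | 224.29  | 0.501  | 100.22% |
| q22   | 13.73   | 13.74   | 0.019  | 100.14% |
| total | 1230.71 | 1234.28 | 3.566  | 100.29% |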
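The report does not say how the last two columns are derived. Judging from the numbers alone, they appear to be the master time minus the PR time, and the master time as a percentage of the PR time. The sketch below checks that reading against a few rows; the formulas and the names difference and percentage are an inference, not something the bot documents, and small deviations are expected because the printed times are rounded.

```scala
object PerfReportSketch {
  // (query, time from native_4843, time from native_master), copied from the report.
  val rows: Seq[(String, Double, Double)] = Seq(
    ("q1", 36.55, 35.69),
    ("q12", 25.22, 29.30),
    ("q19", 16.23, 12.61),
    ("total", 1230.71, 1234.28)
  )

  // Assumed derivation of the "difference" column: master minus PR.
  def difference(pr: Double, master: Double): Double = master - pr

  // Assumed derivation of the "percentage" column: master as a percentage of PR.
  def percentage(pr: Double, master: Double): Double = master / pr * 100.0

  def main(args: Array[String]): Unit =
    rows.foreach { case (q, pr, master) =>
      println(f"$q%-6s ${difference(pr, master)}%8.3f ${percentage(pr, master)}%7.2f%%")
    }
}
```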

taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Mar 25, 2024
…e#4843)

* Support In list option contains non-foldable expression

* address comment

---------

Co-authored-by: Kent Yao <yao@apache.org>
taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Oct 8, 2024
…e#4843)

* Support In list option contains non-foldable expression

* address comment

---------

Co-authored-by: Kent Yao <yao@apache.org>
taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Oct 9, 2024
…e#4843)

* Support In list option contains non-foldable expression

* address comment

---------

Co-authored-by: Kent Yao <yao@apache.org>