[fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys #32878
Merged
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
Contributor
Author
run buildall
TPC-H: Total hot run time: 38138 ms
TPC-DS: Total hot run time: 181931 ms
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
morrySnow
approved these changes
Mar 27, 2024
Contributor
PR approved by at least one committer and no changes requested.
Contributor
PR approved by anyone and no changes requested.
yiguolei
approved these changes
Mar 27, 2024
Jibing-Li
added a commit
that referenced
this pull request
Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)
* [chore] Add gavinchou to collaborators (#32881)
* [chore](show) support statement to show views from table (#32358)

      MySQL [test]> show views;
      +----------------+
      | Tables_in_test |
      +----------------+
      | t1_view        |
      | t2_view        |
      +----------------+
      2 rows in set (0.00 sec)

      MySQL [test]> show views like '%t1%';
      +----------------+
      | Tables_in_test |
      +----------------+
      | t1_view        |
      +----------------+
      1 row in set (0.01 sec)

      MySQL [test]> show views where create_time > '2024-03-18';
      +----------------+
      | Tables_in_test |
      +----------------+
      | t2_view        |
      +----------------+
      1 row in set (0.02 sec)

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)
  Disable some permission operations when Ranger or LDAP are enabled.
* [chore](ci) exclude unstable trino_connector case (#32892)
  Co-authored-by: stephen <hello-stephen@qq.com>
* [fix](Nereids) NPE when create table with implicit index type (#32893)
* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)
  This rewriting pattern is supported for multi-table joins. The supported join types are:
  - INNER JOIN
  - LEFT OUTER JOIN
  - RIGHT OUTER JOIN
  - FULL OUTER JOIN
  - LEFT SEMI JOIN
  - RIGHT SEMI JOIN
  - LEFT ANTI JOIN
  - RIGHT ANTI JOIN
* [Serde](Variant) support arrow serialization for varint type (#32780)
* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)
  Currently, when reading a hive on cosn table, Doris returns an empty result even though the table has data. Iceberg on cosn is fine. The cause is a misuse of cosn's file system: according to cosn's doc, its fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem.
* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)
* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899
* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)
  1. Fix iceberg catalog bug. PR #30198 changed the logic of `IcebergHMSExternalCatalog.java` to get locationUrl by calling hive metastore's `getCatalog()` method. But this method only exists in hive 3+, so it fails when using hive 2.x. This logic is temporarily removed because it is only used for iceberg table writing, which is still under development. We will rethink this logic later.
  2. Fix test cases. Some P2 test cases missed `order_qt`. And because the output format of the floating point type has changed, some results in `out` files need to be regenerated.
* [revert](jni) revert part of #32455 (#32904)
* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)
* [chore](log) print query id before logging profile in be.INFO (#32922)
* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929
* [improvement](decommission be) decommission check replica num (#32748)
* [fix](arrow-flight) Fix reach limit of connections error (#32911)
  - Fix the "Reach limit of connections" error: in fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol and connections are usually not actively disconnected; when a bearer token is evicted from the cache, its ConnectContext is unregistered.
  - Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in connections being killed frequently after query timeout.
  - Fix the bearer token eviction log and exception.
  - TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH
* [bugfix](cloud) few variable not initialized (#32868)
  ../../cloud/src/recycler/meta_checker.cpp can cause an uninitialised memory read.
* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)
  - --add-opens=java.base/java.nio=ALL-UNNAMED, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
  - When groovy uses a flight sql connection to execute the query SUM(MAX(c1) OVER (PARTITION BY)), it reports the error "AGGREGATE clause must not contain analytic expressions", but executing it in Java with jdbc::arrow-flight-sql works without problems.
  - groovy does not support printing the arrow array type and throws IndexOutOfBoundsException.
  - "arrow_flight_sql" does not support two phase read.
  - ./run-regression-test.sh --run --clean -g arrow_flight_sql
* [fix](spill) SpillStream's writer maybe may not have been finalized (#32931)
* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)
* [Improve](inverted_index) update clucene and improve array inverted index writer (#32436)
* [Performance](exec) replace SipHash in function by XXHash (#32919)
* [feature](agg) add aggregate function sum0 (#32541)
* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)
  Support getting the tables in a materialized view when collecting tables in a plan. The table schema is as follows:

      create materialized view mv1
      BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
      DISTRIBUTED BY RANDOM BUCKETS 1
      PROPERTIES ('replication_num' = '1')
      as
      select t1.c1, t3.c2
      from table1 t1
      inner join table3 t3 on t1.c1 = t3.c2

  If we get tables from the following plan, we get [table1, table3, table2]; mv1 is expanded to its base tables:

      SELECT mv1.*, uuid()
      FROM mv1
      LEFT SEMI JOIN table2 ON mv1.c1 = table2.c1
      WHERE mv1.c1 IN ( SELECT c1 FROM table2 ) OR mv1.c1 < 10

* [enhance](mtmv)support olap table partition column is null (#32698)
* [enhancement](cloud) add table version to cloud (#32738)
  Add table version to cloud.
  - In Fe:
    - Get: if Fe is in cloud mode, get the table version from meta service.
    - Update: op drop/replace temp partition, commit transaction.
  - In meta service:
    - Add: create index. The initial value is 1.
    - Remove: by recycler.
    - Update: commit/drop partition rpc, commit txn rpc. Atomic++.
* [fix](cloud) schema change from not null to null (#32913)
  1. Use equals instead of == for type comparison.
  2. The null bitmap size is resized by the size of the ref column.
* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)
* [case](rowpolicy)fix row policy has been exist (#32880)
* [fix](pipeline) fix use error row desc when origin block clear (#32803)
* [fix](Nereids) support variant column with index when create table (#32948)
* [opt](Nereids) support create table with variant type (#32953)
* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)
* [fix](compile) fe cannot compile in idea (#32955)
* [enhancement](plsql) Support select * from routines (#32866)
  Support showing plsql procedures using select * from routines.
* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)
  Due to the change in PR #32455, the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so a `NoClassDefFoundError` exception appears. We need to write a separate Utils class.
* [exec](column) change some complex column move to noexcept (#32954)
* [Enhancement](data skew) extends show data skew (#32732)
* [chore](test) let suite compatible with Nereids (#32964)
* Support identical column name in different index. (#32792)
* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)
* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)
* [improvement](executor)Add tag property for workload group #32874
* [fix](auth)unified workload and resource permission logic (#32907)
  - `Grant resource` can no longer grant global `usage_priv`
  - `grant resource %` instead of `grant resource *`

  before change:
  ```
  grant usage_priv on resource * to f;
  show grants for f\G
  *************************** 1. row ***************************
  UserIdentity: 'f'@'%'
  Comment:
  Password: No
  Roles:
  GlobalPrivs: Usage_priv
  CatalogPrivs: NULL
  DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv
  TablePrivs: NULL
  ColPrivs: NULL
  ResourcePrivs: NULL
  CloudClusterPrivs: NULL
  WorkloadGroupPrivs: normal: Usage_priv
  ```
  after change:
  ```
  grant usage_priv on resource '%' to f;
  show grants for f\G
  *************************** 1. row ***************************
  UserIdentity: 'f'@'%'
  Comment:
  Password: No
  Roles:
  GlobalPrivs: NULL
  CatalogPrivs: NULL
  DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv
  TablePrivs: NULL
  ColPrivs: NULL
  ResourcePrivs: %: Usage_priv
  CloudClusterPrivs: NULL
  WorkloadGroupPrivs: normal: Usage_priv
  ```

---------

Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: yongjinhou <109586248+yongjinhou@users.noreply.github.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: stephen <hello-stephen@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>
Co-authored-by: lihangyu <15605149486@163.com>
Co-authored-by: Yulei-Yang <yulei.yang0699@gmail.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Vallish Pai <vallishpai@gmail.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: zhangdong <493738387@qq.com>
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: zclllyybb <zhaochangle@selectdb.com>
Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>
16 tasks
morrySnow
pushed a commit
that referenced
this pull request
Apr 15, 2025
…9589)

### What problem does this PR solve?

Related PR: #32878 #49473

Problem Summary:

    SELECT IF(t.`gender` IN ('女'), (TIMESTAMPDIFF(YEAR, NOW(), NOW())), 1) AS x0,
           TIMESTAMPDIFF(YEAR, NOW(), NOW()) AS x1
    FROM t1 AS t
    GROUP BY x0, x1;

After EliminateGroupByConstant, this SQL is rewritten to

    SELECT IF(t.`gender` IN ('女'), 0, 1) AS x0, 0 AS x1
    FROM t1 AS t
    GROUP BY IF(t.`gender` IN ('女'), (TIMESTAMPDIFF(YEAR, NOW(), NOW())), 1);

The select expression and the group by expression are now different, so normalizeAgg reports an error.

The fix in PR #49473 may introduce another issue. Consider the following query:

    SELECT func2(100) FROM t GROUP BY func1(), func2(func1());

If func1() can be constant-folded to 100, then func2(func1()) is replaced with func2(100) and the query executes successfully. However, when func1() cannot be folded to 100, the query fails. Whether the query runs therefore depends on whether func1() can be constant-folded, which is inconsistent and not an ideal implementation.

To address this issue, this PR modifies the normalizeAgg logic to eliminate constant group by keys. With this change, the query consistently fails regardless of whether func1() can be folded, which makes the behavior more predictable.
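The mismatch described in the Problem Summary can be reproduced with a small toy model. The sketch below is written in Python purely for illustration and is not Doris code: expressions are modeled as nested tuples, and `substitute`, `outputs_match_keys`, and the "buggy" and "consistent" rewrites are hypothetical names and behaviors assumed for this example. It only shows why folding the constant inside the select list, while leaving the remaining group by key untouched, makes the two expressions unequal.

```python
# Toy expression trees: a tuple is (function_name, *args); anything else is a leaf.
# This is an illustrative model only, not Doris's Nereids expression classes.

def substitute(expr, target, replacement):
    """Return expr with every occurrence of target replaced by replacement."""
    if expr == target:
        return replacement
    if isinstance(expr, tuple):
        head, *args = expr
        return (head, *(substitute(a, target, replacement) for a in args))
    return expr

# TIMESTAMPDIFF(YEAR, NOW(), NOW()), which constant-folds to 0.
diff = ("timestampdiff", "YEAR", ("now",), ("now",))
# IF(gender IN ('女'), TIMESTAMPDIFF(...), 1)
x0 = ("if", ("in", "gender", "女"), diff, 1)

group_by = [x0, diff]
outputs = [x0, diff]

# Assumed buggy rewrite: drop the constant key and fold it inside the outputs
# only, leaving the remaining group by key as it was.
buggy_group_by = [k for k in group_by if k != diff]
buggy_outputs = [substitute(o, diff, 0) for o in outputs]

# Hypothetical contrast case: apply the same substitution to the remaining keys.
consistent_group_by = [substitute(k, diff, 0) for k in buggy_group_by]

def outputs_match_keys(outs, keys):
    """True if every non-literal output is exactly one of the group by keys."""
    return all(o in keys for o in outs if isinstance(o, tuple))

mismatch = not outputs_match_keys(buggy_outputs, buggy_group_by)
consistent = outputs_match_keys(buggy_outputs, consistent_group_by)
print(mismatch, consistent)
```

In this model `mismatch` is true because the select expression no longer equals any group by key, which corresponds to the error the commit message attributes to normalizeAgg. The "consistent" variant is shown only for contrast; according to the commit message, the actual change instead moves the elimination of constant group by keys into the normalizeAgg logic.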
github-actions bot
pushed a commit
that referenced
this pull request
Apr 15, 2025
…9589)
seawinde
pushed a commit
to seawinde/doris
that referenced
this pull request
Apr 17, 2025
…ache#49589)
feiniaofeiafei
added a commit
to feiniaofeiafei/doris
that referenced
this pull request
Apr 21, 2025
…ache#49589)
feiniaofeiafei
added a commit
to feiniaofeiafei/doris
that referenced
this pull request
May 8, 2025
…ache#49589)
koarz
pushed a commit
to koarz/doris
that referenced
this pull request
Jun 4, 2025
…ache#49589)
Labels
approved
Indicates a PR has been approved by one committer.
dev/2.0.8-merged
dev/2.1.1-merged
reviewed
Proposed changes
Issue Number: close #xxx