@github-actions
Contributor

Cherry-picked from #49589

…9589)

### What problem does this PR solve?

Related PR: #32878 #49473

Problem Summary:

SELECT
    IF(
        t.`gender` IN ('女'),
        (
            TIMESTAMPDIFF(
                YEAR,
                NOW(),
                NOW()
            )
        ),
        1
    ) AS x0,
    TIMESTAMPDIFF(
        YEAR,
        NOW(),
        NOW()
    ) AS x1
FROM
    t1 AS t
GROUP BY
    x0,
    x1;

After EliminateGroupByConstant, this SQL is rewritten to:

SELECT
    IF(
        t.`gender` IN ('女'),
        0,
        1
    ) AS x0,
    0 AS x1
FROM
    t1 AS t
GROUP BY
    IF(
        t.`gender` IN ('女'),
        (
            TIMESTAMPDIFF(
                YEAR,
                NOW(),
                NOW()
            )
        ),
        1
    );

The SELECT expression has been constant-folded but the GROUP BY expression
has not, so the two no longer match and normalizeAgg reports an error.
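
The mismatch can be reproduced with a small toy model. This is only a sketch, not Doris code: expressions are plain Python tuples, and `fold`, `is_constant`, and `unmatched_outputs` are hypothetical helpers invented for illustration. The folding rule (TIMESTAMPDIFF of two identical operands becomes 0) is an assumption that mirrors the example above.

```python
# Toy expression trees: a tuple is (function_name, *args); anything else is a leaf.
DIFF = ("TIMESTAMPDIFF", "YEAR", ("NOW",), ("NOW",))
X0 = ("IF", ("IN", "t.gender", "'女'"), DIFF, 1)
X1 = DIFF


def fold(expr):
    """Fold TIMESTAMPDIFF of two identical operands to the literal 0 (toy rule)."""
    if not isinstance(expr, tuple):
        return expr
    name, *args = expr
    args = [fold(a) for a in args]
    if name == "TIMESTAMPDIFF" and args[1] == args[2]:
        return 0
    return (name, *args)


def is_constant(expr):
    """An expression is constant here if it folds all the way down to a leaf."""
    return not isinstance(fold(expr), tuple)


def unmatched_outputs(select_list, group_keys):
    """Return non-literal select expressions that are not exactly a group-by key."""
    return [e for e in select_list if isinstance(e, tuple) and e not in group_keys]


# Mimic the problematic rewrite: outputs are folded, the constant key x1 is
# dropped, but the surviving key x0 keeps its unfolded form.
select_after = [fold(X0), fold(X1)]
group_after = [k for k in (X0, X1) if not is_constant(k)]

mismatch = unmatched_outputs(select_after, group_after)
print(mismatch)
```

In this model the folded `IF(..., 0, 1)` output has no structurally equal group-by key, which is the kind of inconsistency the aggregate normalization step rejects.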

The fix in PR #49473 may introduce another issue. Consider the following
query:

SELECT func2(100) FROM t GROUP BY func1(), func2(func1());

If func1() can be constant-folded to 100, then func2(func1()) will be
replaced with func2(100), allowing the query to execute successfully.
However, when func1() cannot be folded to 100, the query will fail. This
makes behavior inconsistent: whether the query runs depends on whether
func1() can be constant-folded, which is not ideal.
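
A minimal sketch of that inconsistency, again as a toy model rather than Doris code. The helpers `fold` and `query_is_valid` are hypothetical, and the `known` mapping stands in for "func1() can be constant-folded to this value".

```python
def fold(expr, known):
    """Replace zero-argument calls listed in `known` by their value, bottom-up."""
    if not isinstance(expr, tuple):
        return expr
    name, *args = expr
    if not args and name in known:
        return known[name]
    return (name, *(fold(a, known) for a in args))


def query_is_valid(select_list, group_keys, known):
    """Fold the group-by keys first, then require every output to equal a key."""
    folded_keys = [fold(k, known) for k in group_keys]
    return all(e in folded_keys for e in select_list)


# SELECT func2(100) FROM t GROUP BY func1(), func2(func1());
select_list = [("func2", 100)]
group_keys = [("func1",), ("func2", ("func1",))]

foldable = query_is_valid(select_list, group_keys, {"func1": 100})
not_foldable = query_is_valid(select_list, group_keys, {})
print(foldable)      # True: func2(func1()) became func2(100)
print(not_foldable)  # False: no key equals func2(100)
```

The same query text is accepted or rejected depending only on the contents of `known`, which is the behavior described above.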

To address this issue, this PR modifies the normalizeAgg logic to
eliminate constant group by keys. With this change, the query will
consistently fail regardless of whether func1() can be folded or not,
ensuring more predictable behavior.
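
One way to picture the changed behavior is the sketch below. It is an assumption about the general idea, not the actual normalizeAgg implementation: constant keys are dropped, but the remaining keys are compared in their original, unfolded form. `fold`, `normalize`, and `fails` are hypothetical names, and the case where every key is constant is not handled.

```python
def fold(expr, known):
    """Replace zero-argument calls listed in `known` by their value, bottom-up."""
    if not isinstance(expr, tuple):
        return expr
    name, *args = expr
    if not args and name in known:
        return known[name]
    return (name, *(fold(a, known) for a in args))


def normalize(select_list, group_keys, known):
    """Drop constant keys, then check outputs against the unfolded remaining keys."""
    kept = [k for k in group_keys if isinstance(fold(k, known), tuple)]
    bad = [e for e in select_list
           if isinstance(fold(e, known), tuple) and e not in kept]
    if bad:
        raise ValueError(f"not in GROUP BY: {bad}")
    return kept


def fails(select_list, group_keys, known):
    """Return True if normalize() rejects the query."""
    try:
        normalize(select_list, group_keys, known)
    except ValueError:
        return True
    return False


group_keys = [("func1",), ("func2", ("func1",))]

# SELECT func2(100) ... is rejected whether or not func1() is foldable.
print(fails([("func2", 100)], group_keys, {"func1": 100}))
print(fails([("func2", 100)], group_keys, {}))

# SELECT func2(func1()) ... is accepted in both cases.
print(fails([("func2", ("func1",))], group_keys, {"func1": 100}))
print(fails([("func2", ("func1",))], group_keys, {}))
```

In this model the outcome no longer depends on whether func1() is foldable; only the set of retained keys changes.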
@github-actions github-actions bot requested a review from dataroaring as a code owner April 15, 2025 03:30
@Thearas
Contributor

Thearas commented Apr 15, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Apr 15, 2025
@Thearas
Contributor

Thearas commented Apr 15, 2025

run buildall

@morrySnow morrySnow closed this May 7, 2025