Skip to content

[Bug] inconsistent results between select distinct and count(distinct) on multi-column combinations #57707

@dh-cloud

Description

@dh-cloud

Search before asking

  • I had searched in the issues and found no similar issues.

Version

4.0

What's Wrong?

We observed inconsistent behavior between SELECT DISTINCT and COUNT(DISTINCT) when operating on multi-column combinations

What You Expected?

SELECT DISTINCT returns x rows (correct).
COUNT(DISTINCT) returns x

How to Reproduce?

Steps to Reproduce
Create a table with nullable columns and duplicate combinations:

CREATE TABLE test_distinct (
col1 VARCHAR(10),
col2 VARCHAR(10)
);
INSERT INTO test_distinct VALUES
('A', 'X'), ('A', 'X'), -- Duplicate
('B', NULL), ('B', NULL), -- NULL duplicates
('C', 'Y'), (NULL, 'Z'), (NULL, NULL);
Execute the following queries:

-- Returns 5 rows (all distinct combinations, including NULLs)
SELECT DISTINCT col1, col2 FROM test_distinct;

-- Returns 2 (incorrect, should match the 5 unique combinations)
SELECT COUNT(DISTINCT col1, col2) FROM test_distinct;
Expected vs Actual Behavior
Expected:
Both queries should agree on the number of distinct combinations (5 in this case)

Actual:

SELECT DISTINCT returns 5 rows (correct).
COUNT(DISTINCT) returns 2 (incorrect, likely due to NULL handling or optimization bugs).

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions