-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Description
Search before asking
- I had searched in the issues and found no similar issues.
Version
4.0
What's Wrong?
We observed inconsistent behavior between SELECT DISTINCT and COUNT(DISTINCT) when operating on multi-column combinations
What You Expected?
SELECT DISTINCT returns x rows (correct).
COUNT(DISTINCT) returns x
How to Reproduce?
Steps to Reproduce
Create a table with nullable columns and duplicate combinations:
CREATE TABLE test_distinct (
col1 VARCHAR(10),
col2 VARCHAR(10)
);
INSERT INTO test_distinct VALUES
('A', 'X'), ('A', 'X'), -- Duplicate
('B', NULL), ('B', NULL), -- NULL duplicates
('C', 'Y'), (NULL, 'Z'), (NULL, NULL);
Execute the following queries:
-- Returns 5 rows (all distinct combinations, including NULLs)
SELECT DISTINCT col1, col2 FROM test_distinct;
-- Returns 2 (incorrect, should match the 5 unique combinations)
SELECT COUNT(DISTINCT col1, col2) FROM test_distinct;
Expected vs Actual Behavior
Expected:
Both queries should agree on the number of distinct combinations (5 in this case)
Actual:
SELECT DISTINCT returns 5 rows (correct).
COUNT(DISTINCT) returns 2 (incorrect, likely due to NULL handling or optimization bugs).
Anything Else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct