Status: Open
Labels: bug (Something isn't working), regression (Something that used to work no longer does)
### Describe the bug
While debugging the DataFusion 52 upgrade, I found a wrong-results bug with pre-sorted data that was introduced in version 52.
### To Reproduce

```sql
CREATE TABLE agg_src(x INT, y INT, v INT) AS VALUES
(1, 1, 10),
(1, 2, 20),
(1, 3, 30),
(2, 1, 40),
(2, 2, 50),
(2, 3, 60);
-- Create an ordered table:
COPY (SELECT * FROM agg_src ORDER BY x, y) TO 'foo.parquet';
```

Then run:

```sql
CREATE EXTERNAL TABLE agg_src_sorted(x INT, y INT, v INT) STORED AS PARQUET LOCATION 'foo.parquet' WITH ORDER (x ASC, y ASC);
set datafusion.execution.target_partitions = 1;
-- This query orders by an expression of y that breaks the ordering
SELECT
  x,
  CAST(y AS BIGINT) % 2,
  SUM(v)
FROM agg_src_sorted
GROUP BY x, CAST(y AS BIGINT) % 2
ORDER BY x, CAST(y AS BIGINT) % 2;
```
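To make the "breaks the ordering" point concrete, here is a small plain-Python sketch (not DataFusion code) that takes the six rows in the declared `(x ASC, y ASC)` order and checks whether they are also sorted by `(x, y % 2)`. The helper `is_sorted_by` is hypothetical and exists only for illustration.

```python
# Rows of agg_src as (x, y, v), in the declared (x ASC, y ASC) order
# of agg_src_sorted.
ROWS = [
    (1, 1, 10),
    (1, 2, 20),
    (1, 3, 30),
    (2, 1, 40),
    (2, 2, 50),
    (2, 3, 60),
]


def is_sorted_by(rows, key):
    """Hypothetical helper: True if `rows` is non-decreasing under `key`."""
    keys = [key(r) for r in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# The input ordering on (x, y) holds...
sorted_by_x_y = is_sorted_by(ROWS, lambda r: (r[0], r[1]))

# ...but it does not imply an ordering on (x, y % 2): within x = 1,
# y % 2 goes 1, 0, 1, so the ORDER BY still needs a real sort.
sorted_by_x_ymod2 = is_sorted_by(ROWS, lambda r: (r[0], r[1] % 2))

print(sorted_by_x_y, sorted_by_x_ymod2)
```

Since the file is sorted on `(x, y)` but not on `(x, y % 2)`, a plan that treats the GROUP BY output as already satisfying `ORDER BY x, CAST(y AS BIGINT) % 2` can return rows in the wrong order.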
With DataFusion 52, you get the wrong answer:

```
andrewlamb@Andrews-MacBook-Pro-3 ~ % ~/Software/datafusion-cli/datafusion-cli-52.1.0
> SELECT
  x,
  CAST(y AS BIGINT) % 2,
  SUM(v)
FROM agg_src_sorted
GROUP BY x, CAST(y AS BIGINT) % 2
ORDER BY x, CAST(y AS BIGINT) % 2;
+---+-----------------------------+-----------------------+
| x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
+---+-----------------------------+-----------------------+
| 1 | 1                           | 40                    |
| 1 | 0                           | 20                    | <---- the second column is 1 then 0, rather than 0 then 1
| 2 | 1                           | 100                   |
| 2 | 0                           | 50                    |
+---+-----------------------------+-----------------------+
4 row(s) fetched.
Elapsed 0.006 seconds.
```

On DataFusion 51, you get the expected answer:

```
andrewlamb@Andrews-MacBook-Pro-3 ~ % ~/Software/datafusion-cli/datafusion-cli-51.0.0
> SELECT
  x,
  CAST(y AS BIGINT) % 2,
  SUM(v)
FROM agg_src_sorted
GROUP BY x, CAST(y AS BIGINT) % 2
ORDER BY x, CAST(y AS BIGINT) % 2;
+---+-----------------------------+-----------------------+
| x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
+---+-----------------------------+-----------------------+
| 1 | 0                           | 20                    | <---- this row is in the correct spot
| 1 | 1                           | 40                    |
| 2 | 0                           | 50                    |
| 2 | 1                           | 100                   |
+---+-----------------------------+-----------------------+
4 row(s) fetched.
Elapsed 0.002 seconds.
```
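As an independent cross-check that the DataFusion 51 output is the correct one, here is a plain-Python sketch (not DataFusion's implementation) that applies the same grouping, sum, and ascending ordering to the six rows of `agg_src`. `expected_result` is a hypothetical helper; Python's `%` matches SQL's remainder here only because every `y` value is positive.

```python
from collections import defaultdict

# Rows of agg_src as (x, y, v).
ROWS = [
    (1, 1, 10),
    (1, 2, 20),
    (1, 3, 30),
    (2, 1, 40),
    (2, 2, 50),
    (2, 3, 60),
]


def expected_result(rows):
    """Hypothetical plain-Python equivalent of:
    SELECT x, CAST(y AS BIGINT) % 2, SUM(v) FROM agg_src_sorted
    GROUP BY x, CAST(y AS BIGINT) % 2 ORDER BY x, CAST(y AS BIGINT) % 2
    (assumes y >= 0, so Python's % agrees with SQL's remainder)."""
    sums = defaultdict(int)
    for x, y, v in rows:
        sums[(x, y % 2)] += v
    # Group keys are unique, so sorting the items sorts by (x, y % 2) ascending.
    return [(x, m, s) for (x, m), s in sorted(sums.items())]


for row in expected_result(ROWS):
    print(row)
# (1, 0, 20)
# (1, 1, 40)
# (2, 0, 50)
# (2, 1, 100)
```

The rows and sums match the DataFusion 51 output; the DataFusion 52 output has the same groups and sums but not in `ORDER BY` order.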
### Expected behavior
_No response_
### Additional context
_No response_