let batch = RecordBatch::try_new(schema, output)?;
Describe the bug
There is an issue when using multiple aggregations and triggering a spill in a GroupedHashAggregateStream. A query like SELECT MIN(b), MAX(b) FROM table GROUP BY a results in an error.
The problematic schema is captured here: datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs, lines 967 to 969 in 9b5995f.
And the error gets thrown from emit here: datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs, line 950 in 9b5995f.
It takes the schema of the input batch and then uses that schema for the intermediate aggregate data to spill to disk. These schemas are clearly not going to match, but I can't quite grok where exactly things have gone wrong.
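To make the suspected mismatch concrete, here is a minimal sketch in plain Rust. It does not use arrow or DataFusion: Schema, Batch, Batch::try_new, aggregate_state, input_schema, and state_schema are hypothetical stand-ins, and the state column names are assumptions chosen for illustration. It only models the shape problem described above (two input columns versus three intermediate columns for MIN and MAX), not the real spill path, and it is not a confirmed root cause.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a schema: just an ordered list of field names.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

// Hypothetical stand-in for a record batch of i64 columns.
#[derive(Debug)]
struct Batch {
    schema: Schema,
    columns: Vec<Vec<i64>>,
}

impl Batch {
    // Rejects column sets whose count differs from the schema's field count.
    // This models the kind of check that fails in the report; it is not arrow's code.
    fn try_new(schema: Schema, columns: Vec<Vec<i64>>) -> Result<Batch, String> {
        if schema.fields.len() != columns.len() {
            return Err(format!(
                "got {} columns but schema has {} fields",
                columns.len(),
                schema.fields.len()
            ));
        }
        Ok(Batch { schema, columns })
    }
}

// Schema of the input rows: (a, b).
fn input_schema() -> Schema {
    Schema { fields: vec!["a".to_string(), "b".to_string()] }
}

// Assumed schema of the intermediate state: group key plus one column per aggregate.
fn state_schema() -> Schema {
    Schema {
        fields: vec!["a".to_string(), "min_b".to_string(), "max_b".to_string()],
    }
}

// Computes per-group MIN(b) and MAX(b) and returns them as three columns:
// group keys, minimums, maximums (ordered by group key).
fn aggregate_state(rows: &[(i64, i64)]) -> Vec<Vec<i64>> {
    let mut groups: BTreeMap<i64, (i64, i64)> = BTreeMap::new();
    for &(a, b) in rows {
        let entry = groups.entry(a).or_insert((b, b));
        entry.0 = entry.0.min(b);
        entry.1 = entry.1.max(b);
    }
    let keys: Vec<i64> = groups.keys().copied().collect();
    let mins: Vec<i64> = groups.values().map(|s| s.0).collect();
    let maxs: Vec<i64> = groups.values().map(|s| s.1).collect();
    vec![keys, mins, maxs]
}
```

In this model, Batch::try_new(input_schema(), aggregate_state(&rows)) returns an error because three state columns are paired with a two-field schema, while the same call with state_schema() succeeds. A single aggregate would yield two state columns, the same count as the input, which might be why the error only surfaces with multiple aggregations; that part is a guess.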
To Reproduce
Below is a minimal example which should be analogous to a query like SELECT MIN(b), MAX(b) FROM table GROUP BY a. The batch_size and memory pool size are set small to trigger a spill.
Expected behavior
I expect the test to complete successfully.
Additional context
No response