Proposal: Change Accumulator trait to accept RecordBatch / num_rows to allow faster Count #8067
Labels
api change: Changes the API exposed to users of the crate
datafusion: Changes in the datafusion crate
enhancement: New feature or request
performance: Make DataFusion faster
Is your feature request related to a problem or challenge?
Currently the CountAccumulator implementation requires values: &[ArrayRef] to be passed. In order to eliminate scanning a (first) column, we need to be able to accept a RecordBatch or num_rows instead of values: &[ArrayRef].
Describe the solution you'd like
Rather than changing every existing method to accept a RecordBatch (and having to update all code that implements or calls them), I propose adding two new methods:
update_record_batch(&mut self, recordbatch: &RecordBatch)
retract_record_batch(&mut self, recordbatch: &RecordBatch)
The default implementations of these methods can delegate to update_batch and retract_batch respectively (i.e. they assume the batch has at least one column). In the aggregation code, we then call update_record_batch / retract_record_batch instead.
Describe alternatives you've considered
No response
Additional context
No response
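For illustration, the proposed solution could look roughly like the sketch below. It is a simplified model, not DataFusion's actual code: ArrayRef and RecordBatch here are stand-in types so the example compiles without external crates, the methods return nothing rather than a Result, and CountStarAccumulator and count_rows are hypothetical names used only for this example.

```rust
use std::sync::Arc;

// Stand-ins for Arrow's `ArrayRef` and `RecordBatch`, simplified so the sketch
// has no external dependencies. These are NOT the real Arrow types.
type ArrayRef = Arc<Vec<Option<i64>>>;

struct RecordBatch {
    columns: Vec<ArrayRef>,
    num_rows: usize,
}

impl RecordBatch {
    fn columns(&self) -> &[ArrayRef] {
        &self.columns
    }
    fn num_rows(&self) -> usize {
        self.num_rows
    }
}

// Simplified accumulator trait (the real one returns `Result` and has more methods).
trait Accumulator {
    fn update_batch(&mut self, values: &[ArrayRef]);
    fn retract_batch(&mut self, values: &[ArrayRef]);

    // Proposed additions: the defaults forward the batch's columns to the
    // existing methods, so existing accumulators keep working unchanged.
    fn update_record_batch(&mut self, recordbatch: &RecordBatch) {
        self.update_batch(recordbatch.columns());
    }
    fn retract_record_batch(&mut self, recordbatch: &RecordBatch) {
        self.retract_batch(recordbatch.columns());
    }
}

// Hypothetical row-counting accumulator used to show the override.
#[derive(Default)]
struct CountStarAccumulator {
    count: i64,
}

impl Accumulator for CountStarAccumulator {
    // Existing path: needs at least one column to learn the row count.
    fn update_batch(&mut self, values: &[ArrayRef]) {
        self.count += values[0].len() as i64;
    }
    fn retract_batch(&mut self, values: &[ArrayRef]) {
        self.count -= values[0].len() as i64;
    }

    // Overridden path: uses num_rows directly and never touches a column.
    fn update_record_batch(&mut self, recordbatch: &RecordBatch) {
        self.count += recordbatch.num_rows() as i64;
    }
    fn retract_record_batch(&mut self, recordbatch: &RecordBatch) {
        self.count -= recordbatch.num_rows() as i64;
    }
}

// Hypothetical helper mimicking what the aggregation code would call.
fn count_rows(batches: &[RecordBatch]) -> i64 {
    let mut acc = CountStarAccumulator::default();
    for batch in batches {
        acc.update_record_batch(batch);
    }
    acc.count
}
```

With this shape, a batch that carries no columns at all can still be counted through update_record_batch, whereas the update_batch path in this sketch would have to index into a first column.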