Closed
Labels
bug: Something isn't working
Description
Describe the bug
When computing partition_statistics during evaluation, the flamegraph shows a lot of time spent in bounds_check(), which is called as part of Column::data_type().
 
Almost all of the time in bounds_check() is in turn spent in fmt(), which suggests that execution takes the error branch:
impl Column {
    fn bounds_check(&self, input_schema: &Schema) -> Result<()> {
        if self.index < input_schema.fields.len() {
            Ok(())
        } else {
            internal_err!(
                "PhysicalExpr Column references column '{}' at index {} (zero-based) but input schema only has {} columns: {:?}",
                self.name,
                self.index,
                input_schema.fields.len(),
                input_schema.fields().iter().map(|f| f.name()).collect::<Vec<_>>()
            )
        }
    }
}
All occurrences that I hand-checked in my example originated from ProjectionExec::partition_statistics().
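To illustrate why the error branch shows up as fmt() time, here is a minimal sketch using simplified stand-in types. `Schema`, `Column`, the `String` error type, and the `wide_schema` helper are assumptions made for illustration, not DataFusion's actual definitions. It only aims to show that the success path is a single comparison, while the failure path formats every field name in the schema.

```rust
/// Simplified stand-in for a schema: just a list of column names.
/// (Assumption: the real schema type carries much more than names.)
struct Schema {
    fields: Vec<String>,
}

/// Simplified stand-in for a physical column reference.
struct Column {
    name: String,
    index: usize,
}

impl Column {
    /// Mirrors the shape of the check quoted in the issue: the success path
    /// is one comparison, the failure path formats all field names.
    fn bounds_check(&self, input_schema: &Schema) -> Result<(), String> {
        if self.index < input_schema.fields.len() {
            Ok(())
        } else {
            // Building this message walks the whole schema, so its cost
            // grows with the number of columns.
            Err(format!(
                "Column '{}' at index {} (zero-based) but input schema only has {} columns: {:?}",
                self.name,
                self.index,
                input_schema.fields.len(),
                input_schema.fields
            ))
        }
    }
}

/// Hypothetical helper: builds a schema with `n` columns named c0, c1, ...
fn wide_schema(n: usize) -> Schema {
    Schema {
        fields: (0..n).map(|i| format!("c{i}")).collect(),
    }
}
```

In this sketch the error message lists every column, so its length (and the work to build it) scales with schema width. If a caller hits this branch repeatedly and discards the error, the formatting cost is paid each time, which would be consistent with the flamegraph described above.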
To Reproduce
Run with RUST_BACKTRACE enabled.
Expected behavior
The data_type() method should not cause bounds_check() to take the error path for the column.
Additional context
No response