
Commit 2f5ae5c

pepijnve and alamb authored
Skip redundant validation checks in RecordBatch#project (#8583)
# Which issue does this PR close?

- Closes #8591.

# Rationale for this change

RecordBatch project currently uses the validating factory function. Since project starts from a valid RecordBatch these checks are redundant. A small amount of work can be saved by using `new_unchecked` instead.

A change I'm working on for DataFusion uses `RecordBatch#project` in the inner expression evaluation loop to reduce the amount of redundant array filtering `case` expressions need to do. While a micro optimisation, avoiding redundant work in inner loops seems worthwhile.

# What changes are included in this PR?

- Use `new_unchecked` instead of `try_new_with_options` in `RecordBatch#project`

# Are these changes tested?

No additional tests added. Performance difference proven via microbenchmark.

# Are there any user-facing changes?

No

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent 89e9612 commit 2f5ae5c

1 file changed: +10 -8 lines changed

arrow-array/src/record_batch.rs

Lines changed: 10 additions & 8 deletions
@@ -445,14 +445,16 @@ impl RecordBatch {
             })
             .collect::<Result<Vec<_>, _>>()?;
 
-        RecordBatch::try_new_with_options(
-            SchemaRef::new(projected_schema),
-            batch_fields,
-            &RecordBatchOptions {
-                match_field_names: true,
-                row_count: Some(self.row_count),
-            },
-        )
+        unsafe {
+            // Since we're starting from a valid RecordBatch and project
+            // creates a strict subset of the original, there's no need to
+            // redo the validation checks in `try_new_with_options`.
+            Ok(RecordBatch::new_unchecked(
+                SchemaRef::new(projected_schema),
+                batch_fields,
+                self.row_count,
+            ))
+        }
     }
 
     /// Normalize a semi-structured [`RecordBatch`] into a flat table.
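
The reasoning behind this change can be illustrated in isolation. The following is a minimal, std-only sketch and not arrow-rs code: `ToyBatch`, `ToyError`, and all of their methods are hypothetical names invented for illustration. It pairs a validating constructor with an `unsafe` unchecked one, and has `project` take the unchecked path because every column it selects comes from a batch that already passed validation.

```rust
/// Hypothetical stand-in for a record batch: named columns of equal length.
/// This is an illustrative toy, not the arrow-rs `RecordBatch`.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyBatch {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
    row_count: usize,
}

#[derive(Debug, PartialEq)]
pub enum ToyError {
    FieldCountMismatch,
    LengthMismatch { column: usize, expected: usize, actual: usize },
    IndexOutOfBounds { index: usize, len: usize },
}

impl ToyBatch {
    /// Validating constructor: checks every invariant on every call.
    pub fn try_new(
        names: Vec<String>,
        columns: Vec<Vec<i64>>,
        row_count: usize,
    ) -> Result<Self, ToyError> {
        if names.len() != columns.len() {
            return Err(ToyError::FieldCountMismatch);
        }
        for (column, values) in columns.iter().enumerate() {
            if values.len() != row_count {
                return Err(ToyError::LengthMismatch {
                    column,
                    expected: row_count,
                    actual: values.len(),
                });
            }
        }
        Ok(Self { names, columns, row_count })
    }

    /// Unchecked constructor: performs no validation at all.
    ///
    /// # Safety
    /// The caller must guarantee that `names` and `columns` have the same
    /// length and that every column holds exactly `row_count` values.
    pub unsafe fn new_unchecked(
        names: Vec<String>,
        columns: Vec<Vec<i64>>,
        row_count: usize,
    ) -> Self {
        Self { names, columns, row_count }
    }

    /// Select a subset of columns by index.
    pub fn project(&self, indices: &[usize]) -> Result<Self, ToyError> {
        let mut names = Vec::with_capacity(indices.len());
        let mut columns = Vec::with_capacity(indices.len());
        for &index in indices {
            // The bounds check is still required: the indices are caller input.
            let values = self.columns.get(index).ok_or(ToyError::IndexOutOfBounds {
                index,
                len: self.columns.len(),
            })?;
            names.push(self.names[index].clone());
            columns.push(values.clone());
        }
        // SAFETY: `self` already satisfies the invariants, names and columns
        // are pushed in pairs, and each selected column keeps its length,
        // so re-running the checks in `try_new` would be redundant.
        Ok(unsafe { Self::new_unchecked(names, columns, self.row_count) })
    }

    pub fn num_columns(&self) -> usize {
        self.columns.len()
    }

    pub fn num_rows(&self) -> usize {
        self.row_count
    }

    pub fn names(&self) -> &[String] {
        &self.names
    }

    pub fn column(&self, index: usize) -> &[i64] {
        &self.columns[index]
    }
}
```

In this sketch the only check that survives in `project` is the index bounds check, since that depends on caller input rather than on the state of the source batch. The trade-off is the usual one for `unsafe` constructors: the safety argument lives in a comment, so it has to be revisited whenever `project` or the invariants change.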
