Skip to content

Conversation

@pepijnve
Copy link
Contributor

@pepijnve pepijnve commented Oct 23, 2025

Which issue does this PR close?

Rationale for this change

Explained in issue.

What changes are included in this PR?

Are these changes tested?

Covered by existing tests for filter_record_batch

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 23, 2025
@pepijnve pepijnve changed the title Rb filter Add FilterPredicate::filter_record_batch Oct 23, 2025
@pepijnve pepijnve marked this pull request as draft October 23, 2025 14:10
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pepijnve -- this looks great to me

I verified there is existing coverage for filter_record_batch from which this is now called 👍

https://docs.rs/arrow-select/56.2.0/src/arrow_select/filter.rs.html#1419-1427

https://docs.rs/arrow-select/56.2.0/src/arrow_select/filter.rs.html#2095

/// Returns a filtered [`RecordBatch`] containing only the rows that are selected by this
/// [`FilterPredicate`].
///
/// This is the equivalent of calling [filter] on each column of the [`RecordBatch`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also the equivalent of calling https://docs.rs/arrow/latest/arrow/compute/fn.filter_record_batch.html

Suggested change
/// This is the equivalent of calling [filter] on each column of the [`RecordBatch`].
/// This is the equivalent of calling [filter] on each column of the [`RecordBatch`].
///
/// See also [`filter_record_batch`]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some extra documentation on the convenience free standing functions related to FilterPredicate reuse. I'm not sure how useful it is to have circular documentation references from FilterPredicate back to those.

) -> Result<RecordBatch, ArrowError> {
let mut filter_builder = FilterBuilder::new(predicate);
if record_batch.num_columns() > 1 {
let num_cols = record_batch.num_columns();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also a nice improvement (to also account for arrays with multiple children)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stolen from filter(&dyn Array, &BooleanArray) which was already doing this.

pepijnve and others added 2 commits October 24, 2025 17:16
@alamb alamb merged commit e9a7fe5 into apache:main Oct 24, 2025
29 of 30 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 24, 2025

Thank you @pepijnve and @martin-g

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow FilterPredicate instances to be reused for RecordBatches

3 participants