Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While working on filter pushdown for iceberg-rs (apache/iceberg-rust#295), I plan to use APIs such as ArrowPredicateFn and RowFilter.
When constructing an ArrowPredicateFn for an Iceberg predicate, we provide a filter function that takes a RecordBatch shaped by the given projection.
The RecordBatch contains only the columns specified in the projection, and we need to access the correct column in the batch to evaluate the predicate.
For a top-level column this is straightforward. For a nested column, however, there seems to be no way to access the particular array from the RecordBatch.
We only have the projection (i.e., the ProjectionMask), which contains the indices of the leaf columns in the batch.
For example, suppose the schema has top-level columns [a, b, c], where b is a struct column with fields [aa, bb, cc]. Given a predicate like cc > 1, we know that the leaf index of the nested column cc is 3.
Is there an API we can use to access the array for cc in the RecordBatch?
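To make the question concrete, here is a rough sketch of the mapping I have in mind. It is a toy model in plain Rust, not arrow-rs code: `Field`, `example_schema`, `leaf_count` and `projected_path` are hypothetical names made up for illustration. It also assumes that the projected batch keeps the struct nesting and drops every child that has no selected leaf; I have not verified that this matches what the reader actually produces.

```rust
/// Toy schema node (hypothetical; not an arrow-rs type).
enum Field {
    Leaf,
    Struct(Vec<Field>),
}

/// The schema from the example: [a, b { aa, bb, cc }, c].
/// Depth-first leaf numbering: a=0, aa=1, bb=2, cc=3, c=4.
fn example_schema() -> Vec<Field> {
    vec![
        Field::Leaf,                                                // a
        Field::Struct(vec![Field::Leaf, Field::Leaf, Field::Leaf]), // b { aa, bb, cc }
        Field::Leaf,                                                // c
    ]
}

/// Number of leaf columns underneath a field.
fn leaf_count(field: &Field) -> usize {
    match field {
        Field::Leaf => 1,
        Field::Struct(children) => children.iter().map(leaf_count).sum(),
    }
}

/// Returns the chain of child positions that leads to leaf `target`
/// inside a batch that only keeps the leaves listed in `projected`.
/// `first_leaf` is the leaf index of the first leaf under `fields`
/// (0 at the top level). Returns None if `target` is not projected.
fn projected_path(
    fields: &[Field],
    first_leaf: usize,
    projected: &[usize],
    target: usize,
) -> Option<Vec<usize>> {
    if !projected.contains(&target) {
        return None;
    }
    let mut leaf = first_leaf;
    let mut pos = 0; // position among the children that survive projection
    for field in fields {
        let n = leaf_count(field);
        let range = leaf..leaf + n;
        if range.contains(&target) {
            return match field {
                Field::Leaf => Some(vec![pos]),
                Field::Struct(children) => {
                    let mut path = vec![pos];
                    path.extend(projected_path(children, leaf, projected, target)?);
                    Some(path)
                }
            };
        }
        // Only children with at least one projected leaf occupy a slot.
        if projected.iter().any(|i| range.contains(i)) {
            pos += 1;
        }
        leaf += n;
    }
    None
}
```

Under that assumption, with only leaf 3 projected, `projected_path(&example_schema(), 0, &[3], 3)` gives the path `[0, 0]`: column 0 of the batch is the struct b, and its child 0 is cc. With leaves 0 and 3 projected the path becomes `[1, 0]`. An API that exposes this kind of mapping (or the array directly) is what I am looking for.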
Describe the solution you'd like
Describe alternatives you've considered
Additional context