
Retrieve array from RecordBatch for a leaf column #5699

@viirya


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

While working on filter pushdown for iceberg-rs (apache/iceberg-rust#295), I am going to use APIs like ArrowPredicateFn and RowFilter.

When constructing an ArrowPredicateFn for an Iceberg predicate, we provide a filtering function that takes a RecordBatch built from the given projection.

The RecordBatch contains the columns specified in the projection, and we need to access the correct column in the batch to evaluate the predicate.

For a top-level column this is straightforward. For a nested column, however, there seems to be no way to access the particular array from the RecordBatch.

We only have the projection (i.e., the ProjectionMask), which contains the indices of the leaf columns in the batch.

For example, suppose the schema has the top-level columns [a, b, c], where b is a struct column with the fields [aa, bb, cc]. Given a predicate like cc > 1, we know that the leaf index of the nested column cc is 3.
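Note that leaf index 3 refers to the full file schema, not necessarily to the batch handed to the predicate. A minimal sketch of the translation, assuming (1) a hypothetical `&[bool]` stand-in for the ProjectionMask, where `mask[i]` is true when leaf `i` is selected, and (2) that the reader keeps selected leaves in schema order and prunes unselected ones. Under those assumptions, a leaf's position in the projected batch is simply the number of selected leaves that precede it:

```rust
/// Position of parquet leaf `leaf_idx` among the leaves of the projected batch.
///
/// Assumption: `mask` is a hypothetical stand-in for a ProjectionMask
/// (`mask[i] == true` means leaf `i` is projected), and the reader emits the
/// selected leaves in schema order while dropping unselected ones.
fn leaf_position_in_batch(mask: &[bool], leaf_idx: usize) -> Option<usize> {
    // Out of range or not projected: the leaf is absent from the batch.
    if !*mask.get(leaf_idx)? {
        return None;
    }
    // Count the selected leaves that come before `leaf_idx`.
    Some(mask[..leaf_idx].iter().filter(|&&selected| selected).count())
}
```

With the example schema (leaves a=0, b.aa=1, b.bb=2, b.cc=3, c=4), a predicate that projects only cc would see cc as leaf 0 of its batch, and one that projects a and cc would see it as leaf 1.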

Is there an API we can use to access the array of cc in the RecordBatch?
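Absent such an API, one possible workaround is to turn a leaf index into a path of child indices by walking the schema depth-first. The sketch below uses a hypothetical, simplified `Node` tree instead of arrow-rs types so it runs on its own; it only models primitive leaves and structs (lists and maps are left out), and the tree should describe the batch's own, possibly pruned, schema:

```rust
/// Hypothetical simplified schema model: a primitive leaf or a struct.
/// Lists and maps are intentionally not modeled in this sketch.
#[allow(dead_code)]
enum Node {
    Leaf(&'static str),
    Struct(&'static str, Vec<Node>),
}

/// Returns the child-index path to the `target`-th leaf, counting leaves
/// depth-first in schema order. `[1, 2]` means "top-level column 1, child 2".
fn leaf_path(columns: &[Node], target: usize) -> Option<Vec<usize>> {
    fn walk(nodes: &[Node], target: usize, seen: &mut usize, path: &mut Vec<usize>) -> bool {
        for (i, node) in nodes.iter().enumerate() {
            path.push(i);
            match node {
                Node::Leaf(_) => {
                    if *seen == target {
                        return true; // found: `path` now points at this leaf
                    }
                    *seen += 1;
                }
                Node::Struct(_, children) => {
                    if walk(children, target, seen, path) {
                        return true;
                    }
                }
            }
            path.pop(); // not under this node; backtrack
        }
        false
    }
    let mut seen = 0;
    let mut path = Vec::new();
    if walk(columns, target, &mut seen, &mut path) {
        Some(path)
    } else {
        None
    }
}
```

For the example schema, leaf 3 resolves to `[1, 2]` (column b, then its child cc). With arrow-rs, such a path could then be followed with `RecordBatch::column(path[0])` and, for each remaining index, a downcast to `StructArray` (for example via `AsArray::as_struct`) followed by `StructArray::column(i)`. One caveat: the child array obtained this way does not carry the parent struct's null buffer, so a predicate may need to combine the parent's nulls itself.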

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Labels

enhancement (Any new improvement worthy of an entry in the changelog)
