
bug: FileScanTask project_field_ids order could be inconsistent with the RecordBatch schema #627

@chenzl25

FileScanTask has two related fields, project_field_ids and schema. I think the RecordBatch produced by the reader for a FileScanTask should always follow the schema specified in that task. However, in some cases the two are inconsistent.

Consider an Iceberg table with schema (c1 int, c2 int, c3 int). If we select from the table with the column order c3, c2, c1, the RecordBatch schema is still c1, c2, c3, which confuses me a lot.

pub struct FileScanTask {
    data_file_path: String,
    project_field_ids: Vec<i32>,
    schema: SchemaRef,
    ...
}
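
To make the expected behavior concrete, here is a minimal sketch of reordering columns to match the projection order. It is not the iceberg-rust reader and does not use Arrow: Column, col, reorder_by_projection, and column_names are hypothetical names introduced only for illustration, and the field ids 1, 2, 3 for c1, c2, c3 are an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one column of a record batch.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    field_id: i32,
    name: String,
    values: Vec<i32>,
}

/// Hypothetical helper that builds a column.
fn col(field_id: i32, name: &str, values: Vec<i32>) -> Column {
    Column { field_id, name: name.to_string(), values }
}

/// Reorders `columns` (given in file order) so they follow `project_field_ids`.
/// Returns `None` if a projected field id is not present in `columns`.
fn reorder_by_projection(columns: &[Column], project_field_ids: &[i32]) -> Option<Vec<Column>> {
    // Index the columns by field id so each lookup is O(1).
    let by_id: HashMap<i32, &Column> = columns.iter().map(|c| (c.field_id, c)).collect();
    // Emit the columns in exactly the order the projection asks for.
    project_field_ids
        .iter()
        .map(|id| by_id.get(id).map(|&c| c.clone()))
        .collect()
}

/// Returns the column names in their current order.
fn column_names(columns: &[Column]) -> Vec<&str> {
    columns.iter().map(|c| c.name.as_str()).collect()
}

fn main() {
    // Columns as they come back today: table order c1, c2, c3.
    let batch = vec![
        col(1, "c1", vec![10, 11]),
        col(2, "c2", vec![20, 21]),
        col(3, "c3", vec![30, 31]),
    ];

    // The query selected c3, c2, c1 (assumed field ids 3, 2, 1).
    let project_field_ids = vec![3, 2, 1];

    let reordered = reorder_by_projection(&batch, &project_field_ids)
        .expect("every projected field id should exist in the batch");

    // After reordering, the column order matches the projection.
    assert_eq!(column_names(&reordered), vec!["c3", "c2", "c1"]);
    println!("{:?}", column_names(&reordered));
}
```

In this sketch the projection order drives the output order, so a consumer that reads columns by position sees c3, c2, c1, which is what I would expect the RecordBatch to look like.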
