
Default parquet reader to reading 64K footer #4459

@alamb


As of #4427 it is easier to see that the DataFusion parquet reader still defaults to reading the last 8 bytes of a parquet file (a 4-byte metadata length followed by the 4-byte `PAR1` magic number) and then issues a second read to fetch the footer metadata itself.
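The tail layout that forces the two-read pattern can be sketched in plain Rust. This is a minimal illustration, not DataFusion's or the `parquet` crate's API: `decode_footer_tail` and `metadata_range` are hypothetical names. It assumes the standard Parquet layout, in which a file ends with the Thrift-encoded metadata, then a 4-byte little-endian metadata length, then the 4-byte `PAR1` magic.

```rust
use std::ops::Range;

/// Fixed Parquet tail: 4-byte little-endian metadata length + 4-byte "PAR1" magic.
const FOOTER_TAIL_LEN: usize = 8;
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper: decode the last 8 bytes of a Parquet file and return
/// the length in bytes of the metadata that immediately precedes them.
fn decode_footer_tail(tail: &[u8]) -> Result<usize, String> {
    if tail.len() != FOOTER_TAIL_LEN {
        return Err(format!("expected {} bytes, got {}", FOOTER_TAIL_LEN, tail.len()));
    }
    if &tail[4..8] != PARQUET_MAGIC {
        return Err("missing PAR1 magic; not a Parquet file".to_string());
    }
    // The first 4 bytes of the tail hold the metadata length (little-endian u32).
    let len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    Ok(len as usize)
}

/// Hypothetical helper: byte range of the metadata in a file of `file_size` bytes.
/// This is the range a reader must request in its *second* I/O operation.
fn metadata_range(file_size: u64, metadata_len: usize) -> Result<Range<u64>, String> {
    let end = file_size
        .checked_sub(FOOTER_TAIL_LEN as u64)
        .ok_or("file too small")?;
    let start = end
        .checked_sub(metadata_len as u64)
        .ok_or("metadata length exceeds file size")?;
    Ok(start..end)
}
```

The reader only learns which range to request next after decoding those 8 bytes, so the second round trip cannot be avoided unless the first read is made larger.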

Doing two I/O operations is likely not ideal, especially on object storage, where an additional request costs far more than reading a bit more data in the first request.

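The suggestion is to default to reading the last 64 KB of a parquet file, so that the entire footer is usually captured in a single read (if the footer is larger than that, a second read for the remainder is still needed).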
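A sketch of the suggested strategy, assuming a hypothetical `MockObjectStore` that records every ranged request. It stands in for an object store and is not the `object_store` crate's API; `fetch_metadata_bytes` and `DEFAULT_FOOTER_HINT` are likewise illustrative names. The function reads a suffix of `hint` bytes, returns the metadata directly when the whole footer fits, and otherwise fetches only the missing prefix.

```rust
use std::ops::Range;

/// Suffix size suggested in this issue: 64 KiB.
const DEFAULT_FOOTER_HINT: usize = 64 * 1024;
/// 4-byte metadata length + 4-byte "PAR1" magic at the very end of the file.
const FOOTER_TAIL_LEN: usize = 8;

/// Hypothetical in-memory object store that logs each ranged GET,
/// so the number of round trips can be observed.
struct MockObjectStore {
    data: Vec<u8>,
    requests: Vec<Range<usize>>,
}

impl MockObjectStore {
    fn get_range(&mut self, range: Range<usize>) -> Vec<u8> {
        self.requests.push(range.clone());
        self.data[range].to_vec()
    }
}

/// Fetch the raw (still Thrift-encoded) metadata bytes using a suffix read of
/// `hint` bytes, issuing a second request only if the footer did not fit.
fn fetch_metadata_bytes(store: &mut MockObjectStore, hint: usize) -> Result<Vec<u8>, String> {
    let file_size = store.data.len();
    if file_size < FOOTER_TAIL_LEN {
        return Err("file too small".into());
    }
    // Request at least the 8-byte tail and at most the whole file.
    let suffix_len = hint.max(FOOTER_TAIL_LEN).min(file_size);
    let suffix_start = file_size - suffix_len;
    let suffix = store.get_range(suffix_start..file_size);

    let tail = &suffix[suffix_len - FOOTER_TAIL_LEN..];
    if tail[4..] != b"PAR1"[..] {
        return Err("missing PAR1 magic".into());
    }
    let metadata_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as usize;
    let footer_len = metadata_len + FOOTER_TAIL_LEN;
    if footer_len > file_size {
        return Err("metadata length exceeds file size".into());
    }

    if footer_len <= suffix_len {
        // Common case: the whole footer arrived with the first request.
        let start = suffix_len - footer_len;
        return Ok(suffix[start..suffix_len - FOOTER_TAIL_LEN].to_vec());
    }

    // Footer larger than the hint: fetch only the bytes not yet read,
    // then append the metadata portion already held in `suffix`.
    let metadata_start = file_size - footer_len;
    let mut metadata = store.get_range(metadata_start..suffix_start);
    metadata.extend_from_slice(&suffix[..suffix_len - FOOTER_TAIL_LEN]);
    Ok(metadata)
}
```

With a 64 KiB hint, a file whose footer fits costs one request. A footer larger than the hint still costs two, but the second request is smaller because the tail is already in hand. For files with small footers the extra transfer is bounded by the hint size, which, following the reasoning in this issue, is usually cheaper than an extra object-store round trip.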

Originally posted by @thinkharderdev in #3885 (comment)

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
