Closed
Description
The Parquet reader available in core Spark includes a number of optimizations, the most significant being vectorized columnar reading. Anyone considering a migration from the existing Spark readers to Iceberg would be concerned about the performance gap that comes from lacking these optimizations.
It is not clear what the best way is to incorporate these optimizations into Iceberg. One option would be to propose moving this code from Spark to parquet-mr. Another would be to invoke Spark's Parquet reader directly here, but that is an internal API. We could also implement vectorized reading directly in Iceberg, but that would amount to reinventing the wheel.