Table optimize is an operation that rearranges a table's data and/or metadata to speed up queries and/or reduce the metadata size. Ways to accomplish this include compacting small files into larger files, ordering data by column, and clustering data along Z-order curves.
This work adds the “OPTIMIZE (file compaction)” as outlined on the Delta OSS 2022 H1 roadmap here.
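The file-compaction idea from the overview can be sketched as a simple bin-packing pass. This is only an illustration under stated assumptions, not the Delta implementation: `DataFile`, `plan_compaction`, `target_bytes`, and `min_file_bytes` are hypothetical names, and the greedy grouping strategy is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for a data file entry tracked by the table.
    path: str
    size_bytes: int


def plan_compaction(
    files: List[DataFile], target_bytes: int, min_file_bytes: int
) -> List[List[DataFile]]:
    """Group files smaller than min_file_bytes into bins of at most target_bytes.

    Each returned bin would be rewritten as one larger file.
    """
    # Only files below the threshold are candidates; largest first.
    small = sorted(
        (f for f in files if f.size_bytes < min_file_bytes),
        key=lambda f: f.size_bytes,
        reverse=True,
    )
    bins: List[List[DataFile]] = []
    current: List[DataFile] = []
    current_size = 0
    for f in small:
        # Close the current bin when adding this file would exceed the target.
        if current and current_size + f.size_bytes > target_bytes:
            bins.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        bins.append(current)
    # Rewriting a bin that holds a single file gains nothing, so drop it.
    return [b for b in bins if len(b) > 1]
```

Files at or above `min_file_bytes` are left untouched, so a second run over an already compacted partition would have little or nothing to rewrite.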
Requirements
Optimize should respect the transactional properties of the Delta table. That means it can run in parallel with reads and writes without violating any ACID properties.
If a conflict occurs during an optimize run, optimize should retry once before failing.
Option to select a subset of partitions in a table to optimize.
Support partial progress capture: Instead of committing all file compaction changes at the end of the job, commit them periodically to the DeltaLog so that at least some progress is captured even if the job fails.
Support for Z-Order: Data clustering via multi-column locality-preserving space-filling curves with offline sorting.
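The retry-once and partial-progress requirements above can be sketched together. This is a minimal illustration, not Delta's commit protocol: `CommitConflict`, `commit_with_one_retry`, `optimize_in_batches`, and the `commit` callback are all hypothetical names, and a real implementation would presumably re-validate the batch against the latest table state before retrying.

```python
from typing import Callable, Sequence


class CommitConflict(Exception):
    """Hypothetical error raised when a concurrent transaction wins the commit."""


def commit_with_one_retry(
    commit: Callable[[Sequence[str]], None], batch: Sequence[str]
) -> None:
    """Try to commit a batch; on conflict, retry exactly once."""
    try:
        commit(batch)
    except CommitConflict:
        # Second and final attempt: a second conflict propagates to the caller.
        commit(batch)


def optimize_in_batches(
    groups: Sequence[str],
    commit: Callable[[Sequence[str]], None],
    batch_size: int,
) -> int:
    """Commit compaction groups batch by batch instead of once at the end.

    Returns the number of groups committed. If a batch fails after its retry,
    the batches committed before it remain in the log.
    """
    committed = 0
    for start in range(0, len(groups), batch_size):
        batch = groups[start:start + batch_size]
        commit_with_one_retry(commit, batch)
        committed += len(batch)
    return committed
```

Committing per batch trades a few extra log entries for durability of completed work: a failure in a late batch no longer discards the earlier rewrites.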
Design Sketch
Design details are here.
Future Work