refactor: load table state lazily #1361
Right now, when we instantiate a Delta table, we load the entire table state into memory. Many workloads don't need all of it, especially when they query only certain partitions at a time. Instead, we should skip the add and remove actions when instantiating, and load them only as needed during scans.
When we read the log files, we should cache them on disk so we can quickly scan them again later for add and remove actions.
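As a rough illustration of the idea above, here is a minimal Python sketch, not the delta-rs API. The `LazyTable` class, its `cache_dir` parameter, and the demo log contents are all hypothetical. It assumes commit files are newline-delimited JSON, one action per line, and that a commit file never changes once written, so a cached copy can be reused by file name.

```python
import json
import shutil
import tempfile
from pathlib import Path


class LazyTable:
    """Hypothetical lazily loaded table; a sketch, not the delta-rs API."""

    def __init__(self, log_dir, cache_dir):
        self.log_dir = Path(log_dir)
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.metadata = None
        self.protocol = None
        # Zero-padded commit names sort in version order.
        for commit in sorted(self.log_dir.glob("*.json")):
            for action in self._read_commit(commit):
                # Keep only table-level actions; add/remove are skipped here
                # and would be read from the disk cache during a scan.
                if "metaData" in action:
                    self.metadata = action["metaData"]
                elif "protocol" in action:
                    self.protocol = action["protocol"]

    def _read_commit(self, commit):
        # Copy the commit file into the local cache on first read
        # (assumption: commit files are immutable once written).
        cached = self.cache_dir / commit.name
        if not cached.exists():
            shutil.copyfile(commit, cached)
        with cached.open() as f:
            return [json.loads(line) for line in f if line.strip()]


# Demo with a throwaway log directory and made-up actions.
root = Path(tempfile.mkdtemp())
log_dir = root / "_delta_log"
log_dir.mkdir()
(log_dir / "00000000000000000000.json").write_text(
    "\n".join(
        json.dumps(a)
        for a in [
            {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
            {"metaData": {"id": "demo", "partitionColumns": ["date"]}},
            {"add": {"path": "date=2023-01-01/a.parquet"}},
        ]
    )
)
table = LazyTable(log_dir, root / "cache")
print(table.metadata["partitionColumns"])
```

After instantiation the table holds only the protocol and metadata actions, while the full commit file sits in the cache directory ready for a later scan.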
Instantiate table
[Flowchart not preserved; the only surviving node label is "_delta_log exists".]
This gives you an instance of DeltaTable that you can inspect for table-level metadata. When asked for files, it runs the scan process below.
Scan table
[Flowchart not preserved.]
This kind of operation needs to run any time the table is asked for files.
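Since the original flowchart for the scan is missing, the following is only a guess at its shape: a minimal Python sketch that replays add and remove actions in commit order and keeps the files matching a partition filter. The function name `scan_files`, the `partition_filter` argument, and the sample commits are hypothetical. The action keys (`add`, `remove`, `path`, `partitionValues`) follow the Delta log format.

```python
def scan_files(commits, partition_filter):
    """Replay add/remove actions in commit order; return live matching paths."""
    live = {}
    for actions in commits:
        for action in actions:
            if "add" in action:
                add = action["add"]
                values = add.get("partitionValues", {})
                # Keep the file only if every filtered partition column matches.
                if all(values.get(k) == v for k, v in partition_filter.items()):
                    live[add["path"]] = add
            elif "remove" in action:
                # A remove cancels an earlier add of the same path.
                live.pop(action["remove"]["path"], None)
    return sorted(live)


# Made-up commits: the second one replaces a.parquet with c.parquet.
commits = [
    [
        {"add": {"path": "date=2023-01-01/a.parquet",
                 "partitionValues": {"date": "2023-01-01"}}},
        {"add": {"path": "date=2023-01-02/b.parquet",
                 "partitionValues": {"date": "2023-01-02"}}},
    ],
    [
        {"remove": {"path": "date=2023-01-01/a.parquet"}},
        {"add": {"path": "date=2023-01-01/c.parquet",
                 "partitionValues": {"date": "2023-01-01"}}},
    ],
]
print(scan_files(commits, {"date": "2023-01-01"}))
```

Filtering while replaying means add actions for other partitions are never kept in memory, which is the saving this issue is after.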
In-memory caching
Below a certain threshold (which we could make configurable), it's probably fine to keep all the table state in-memory. So we might find some data structure that we can use as a memory-limited in-memory cache for table state. 🤔
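One candidate data structure is a least-recently-used cache bounded by bytes rather than by entry count. The Python sketch below is an assumption about what could work, not a decision. `BoundedStateCache`, `max_bytes`, and the caller-supplied `size` are hypothetical names, and how to measure the size of a piece of table state is left open.

```python
from collections import OrderedDict


class BoundedStateCache:
    """Hypothetical LRU cache limited by total bytes instead of entry count."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes  # the configurable threshold
        self.used = 0
        self._entries = OrderedDict()  # key -> (value, size)

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key, value, size):
        if key in self._entries:
            self.used -= self._entries.pop(key)[1]
        if size > self.max_bytes:
            return  # larger than the whole budget; do not cache
        self._entries[key] = (value, size)
        self.used += size
        # Evict least recently used entries until back under the limit.
        while self.used > self.max_bytes:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used -= evicted_size


cache = BoundedStateCache(max_bytes=100)
cache.put("v0", ["a.parquet"], size=60)
cache.put("v1", ["b.parquet"], size=30)
cache.get("v0")                          # v0 becomes most recently used
cache.put("v2", ["c.parquet"], size=40)  # over budget, so v1 is evicted
print(cache.get("v1"), cache.used)
```

A table whose state fits under `max_bytes` stays fully in memory, while a larger one falls back to rescanning the log for whatever was evicted.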