What happened: I don't know whether this should be classified as a bug or is intentional behavior. Unfortunately, I have not been able to find any more information about it.
When running a z_order optimization, it processes all the files in the Delta table, even though some of them could be skipped, which would save execution time and memory.
What you expected to happen:
In this simple case we can see how, in the first optimization, all files are deleted and re-added even though there is only one file per partition.
In the second optimization, only the files of partition 'a' would need to be processed, since those of partition 'b' have already been optimized and no new writes have occurred.
I know about the partition_filters parameter of the z_order method, but sometimes it is not possible to determine which partitions to optimize, since writes and optimizations may run in separate, independent processes.
Disclaimer: I am not a maintainer or even contributor to delta-rs.
My understanding of Z-Ordering is that this behavior is exactly correct. Unlike Liquid Clustering, it is not an inherently incremental operation. It will always run on whatever where clause you give it. In this case it will correctly run on the entire table because you did not specify anything. I can't imagine "fixing" this is in scope for delta-rs because this is fundamental Delta behavior.
It would be on you to supply the partition(s) to run on, even if you do that by dynamically detecting the data you just wrote and only supplying those partition(s) to the where clause.
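As a rough illustration of that suggestion, here is a minimal sketch of picking partitions to optimize from file metadata. It is not taken from delta-rs: the helper names, the `part` column, the `modification_time` key (assumed to be epoch milliseconds), and the "modified after the last optimize" heuristic are all assumptions for illustration. The input is a plain list of dicts standing in for whatever file listing the table exposes.

```python
from typing import Iterable, List, Mapping, Tuple


def changed_partitions(
    files: Iterable[Mapping[str, object]],
    partition_col: str,
    last_optimize_ms: int,
) -> List[str]:
    """Return the partition values that have at least one file newer than the cutoff.

    `files` is a hypothetical listing: one mapping per active file, holding the
    partition value and a `modification_time` in epoch milliseconds.
    """
    changed = {
        str(f[partition_col])
        for f in files
        if int(f["modification_time"]) > last_optimize_ms
    }
    return sorted(changed)


def build_partition_filters(
    partition_col: str, values: List[str]
) -> List[Tuple[str, str, List[str]]]:
    """Build a single (column, "in", values) filter tuple for the given partitions."""
    return [(partition_col, "in", values)]


# Hypothetical listing: partition 'a' received a write after the last optimize.
files = [
    {"part": "a", "modification_time": 1_700_000_300_000},
    {"part": "a", "modification_time": 1_700_000_100_000},
    {"part": "b", "modification_time": 1_700_000_100_000},
]
last_optimize_ms = 1_700_000_200_000

values = changed_partitions(files, "part", last_optimize_ms)
if values:
    filters = build_partition_filters("part", values)
    # Assumed wiring, not executed here; "x" is a placeholder column name:
    # dt.optimize.z_order(["x"], partition_filters=filters)
```

The optimizing process would need some way to learn the cutoff, for example by recording when it last ran. If no partition has changed, the call is skipped entirely rather than passing an empty filter.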
Environment
Delta-rs version:
0.24.0
Binding:
Environment:
Bug
How to reproduce it:
Output: