Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust): serialize MetricDetails from compaction runs to a string #2317

Merged

Conversation

liamphmurphy
Copy link
Contributor

@liamphmurphy liamphmurphy commented Mar 22, 2024

I am by no means a Rust developer and haven't touched it in years; so please let me know if there's a better way to go about this. The Rust z_order and optimize.compact already serializes the metrics before it is passed back to Python, which then deserializes it back, so the Python behavior in terms of expecting this as a Dict has not changed which I think is what we want.

Description

Adds a custom serialzer and Display implementation for the MetricDetails fields, namely filesAdded and filesRemoved so that those fields are written as strings instead of a struct to the commit log. Query engines expect these fields to be strings on reads.

I had trouble getting the pyspark tests running locally, but here is an example optimize commit log that gets written with these changes:

{"commitInfo":{"timestamp":1711125995487,"operation":"OPTIMIZE","operationParameters":{"targetSize":"104857600","predicate":"[]"},"clientVersion":"delta-rs.0.17.1","readVersion":10,"operationMetrics":{"filesAdded":"{\"avg\":19956.0,\"max\":19956,\"min\":19956,\"totalFiles\":1,\"totalSize\":19956}","filesRemoved":"{\"avg\":4851.833333333333,\"max\":10358,\"min\":3734,\"totalFiles\":6,\"totalSize\":29111}","numBatches":6,"numFilesAdded":1,"numFilesRemoved":6,"partitionsOptimized":1,"preserveInsertionOrder":true,"totalConsideredFiles":6,"totalFilesSkipped":0}}}

Related Issue(s)

Documentation

N/A

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Mar 22, 2024
@liamphmurphy liamphmurphy marked this pull request as ready for review March 22, 2024 16:59
@ion-elgreco
Copy link
Collaborator

@liamphmurphy you could add a python test to test that DESCRIBE HISTORY now works in pyspark on a delta-rs compacted delta table

@liamphmurphy
Copy link
Contributor Author

@liamphmurphy you could add a python test to test that DESCRIBE HISTORY now works in pyspark on a delta-rs compacted delta table

Good idea 👍 , just need to figure out why the spark tests aren't working locally.

@ion-elgreco
Copy link
Collaborator

@liamphmurphy are you adding the markers in pytest?

@github-actions github-actions bot added the binding/python Issues for the Python package label Mar 22, 2024
@liamphmurphy
Copy link
Contributor Author

@liamphmurphy are you adding the markers in pytest?

Was a numpy error, had to reinstall it manually through pip

@ion-elgreco ion-elgreco force-pushed the fix/stringify-compact-metadata-fields branch from ff0e988 to 20c24f5 Compare March 23, 2024 08:01
@liamphmurphy
Copy link
Contributor Author

liamphmurphy commented Mar 23, 2024

@ion-elgreco Pushed up some changes for the lint / cargo fmt errors. The pyspark integration tests pass locally after importing the DeltaTable so hopefully all good there. The last big one seems to be the benchmark showing a regression, will be slow to respond until the week.

Copy link
Collaborator

@ion-elgreco ion-elgreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @liamphmurphy!

In ideal world delta-spark would just be able to parse any commit info value to string, but this also works :D

@ion-elgreco ion-elgreco force-pushed the fix/stringify-compact-metadata-fields branch from d13437b to 557cf87 Compare March 23, 2024 23:51
@ion-elgreco ion-elgreco enabled auto-merge (squash) March 23, 2024 23:51
@ion-elgreco ion-elgreco merged commit 923dfef into delta-io:main Mar 24, 2024
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants