Fix multi-partition error when using pre_transform_spec #124

Merged
merged 8 commits into from
Jun 9, 2022

jonmmease
Collaborator

This PR fixes an error that cropped up when calling pre_transform_spec from Python on a dataset with more than 8096 elements (the partition size that VegaFusion uses).

A test to catch one particular manifestation of the error is added in test_pretransform_multi_partition.py. The issue seems to be related to passing data through the zero-copy interface from Python/PyArrow to Rust/arrow-rs and then serializing the resulting RecordBatches into the Arrow IPC format. It is potentially related to apache/arrow-rs#390.

This PR removes the use of the zero-copy interface and instead serializes the PyArrow table to bytes in Python and deserializes in Rust. It also refactors the VegaFusion internals to remove a prior IPC serialization step, so there shouldn't be an appreciable difference in memory usage or performance. Once we have time to more fully diagnose the issue with the zero-copy interface, it would obviously be advantageous to re-adopt this interface.
