feat: integrate object_store for read/write with pyarrow #799
Description
This PR adopts the `object_store` crate on the Python side as well and, at least for the current test suite, supports reading and writing through the dataset and other pyarrow APIs. Thanks to @wjones127's multipart upload in `object_store`, implementing the write functionality was actually quite straightforward. We now implement `ObjectInputFile` and `ObjectOutputStream`, which, if wrapped in `pyarrow.PythonFile`, work with the Arrow ecosystem (so far ;)).

There was one bigger, breaking design decision, but I hope people agree :).
Essentially, I accepted that working exclusively with the relative Delta paths makes life much more convenient. So rather than adding and removing path prefixes all the time, I thought it would be reasonable to ask users to wrap their own filesystems in a `pyarrow.fs.SubTreeFileSystem` that points at the table root.

Of course, the hope is to eventually reach performance comparable to (or better than :)) the C++ filesystems. Then there would (I guess) be little reason to still provide a "custom" filesystem at all.
Related Issue(s)
closes (at least to some degree) #570
closes #574
closes #696
closes #689
towards #542
Documentation