ENH: support writing to in-memory (byte) objects #249

Closed
jorisvandenbossche opened this issue May 1, 2023 · 3 comments · Fixed by #397

Comments

@jorisvandenbossche
Member

We support reading from an in-memory buffer / bytes object (#22), but not yet writing to it.

To start with, the write path assumes the path is a string (or converts it to a string in several places), so passing a BytesIO object doesn't work. Currently we actually create a file in the current directory with a name like "<_io.BytesIO object at 0x7f229d2a1a80>", because we call str(path).
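
The file-name behaviour described above can be reproduced in plain Python. This is only a minimal sketch: `naive_target_path` is a hypothetical stand-in for the string coercion on the write path, not pyogrio's actual code.

```python
import io


def naive_target_path(path):
    # Hypothetical stand-in: the write path coerces whatever it gets to a string.
    return str(path)


buf = io.BytesIO()
name = naive_target_path(buf)

# The repr of the buffer ends up being used as the "file name".
print(name)  # something like <_io.BytesIO object at 0x...>
```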

If we want to support writing to a buffer, our current code for handling this on the read path (buffer_to_virtual_file, which creates a /vsimem/.. file) will not be sufficient: it creates a VSIMemFile that doesn't own the buffer's data and thus can't expand the size of the buffer, which is needed to write to an empty BytesIO.

From a quick look, two potential strategies:

  • When writing, create a custom VSIMemFile that owns its memory, and after the file is written, copy the buffer out of the GDAL in-memory file and write that data into the Python in-memory file.
    This is probably simpler to implement, but incurs an extra copy of the data when the result should end up in a Python object (like BytesIO) that was provided by the user. Alternatively, we could also have a way to just return the buffer as bytes, which shouldn't require an extra copy (if GDAL allows transferring ownership).
    I think this is also what fiona currently does: https://github.com/Toblerity/Fiona/blob/54428a1d39d1115d3e5e7158e09f37ac3623de23/fiona/__init__.py#L277-L283
  • Wrap the Python file-like object (like BytesIO for in-memory) in a custom GDAL filesystem and file object plugin, through callbacks that link the Python object's read/seek/write methods to the methods in C++. This is covered for the read side by "ENH: reading from file-like objects" (#42), and was implemented in rasterio.
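
To picture the second strategy, here is a rough pure-Python sketch of the adapter such a plugin would need. `FileLikeCallbacks` and its method names are hypothetical; a real implementation would register C-level callbacks with GDAL, which is not shown here.

```python
import io


class FileLikeCallbacks:
    """Hypothetical adapter: forwards filesystem-style callbacks to a Python file object."""

    def __init__(self, fileobj):
        self._f = fileobj

    def read(self, size):
        # Called when the native side wants `size` bytes from the current position.
        return self._f.read(size)

    def write(self, data):
        # Called when the native side has bytes to append or overwrite.
        return self._f.write(data)

    def seek(self, offset, whence=io.SEEK_SET):
        return self._f.seek(offset, whence)

    def tell(self):
        return self._f.tell()


target = io.BytesIO()
callbacks = FileLikeCallbacks(target)
callbacks.write(b"header")
callbacks.write(b"+body")
callbacks.seek(0)
first = callbacks.read(6)
```

Because every write goes straight to the user's object, this approach avoids the extra copy of the first strategy, at the cost of a Python call per I/O operation.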
@brendan-ward
Member

The second option looks reasonable based on the way this was implemented in rasterio, assuming that adding write functionality on top of the read functionality implemented there is relatively straightforward. We could start by re-implementing the read interface in pyogrio first (#42), adjust to that, and then extend it to enable writing.

@jorisvandenbossche
Member Author

@brendan-ward in case this is useful, what I started writing (very draft): https://github.com/geopandas/pyogrio/compare/main...jorisvandenbossche:pyogrio:write-vsimem?expand=1
I haven't wired it up in the writing logic yet, but the idea is that we detect when we are passed a file-like object, pass a /vsimem/... path to GDAL in that case, and afterwards read the buffer and write it into the file-like object (like the callback in fiona).

@brendan-ward
Member

@jorisvandenbossche thanks for the start! I have some ideas on how to proceed now and will work on drafting a PR for this shortly.
