ENH: Writing to in-memory multilayer GPKGs #2875
Comments
Hi @AdnanAvdagic this certainly seems like something that we could support. A PR would definitely be welcome if you're interested! Having a quick look in the fiona case, I'm seeing something different to you.
I get an error on read instead in my local environment (with fiona 1.8.22). But this seems to be in the geopandas wrapping of fiona. I'll try to have a look in more detail at some point. There is also another crash if |
@m-richards I think that when you write to an in-memory file-like object and then want to read it back, you have to put the "current position" back to the start of the object. This works for me:
import io
import geopandas

gdf = geopandas.read_file(geopandas.datasets.get_path("nybb"))
tmp_file = io.BytesIO()
gdf.to_file(tmp_file, driver="GPKG")
# this line is needed
tmp_file.seek(0)
In [20]: geopandas.read_file(tmp_file)
Out[20]:
BoroCode BoroName Shape_Leng Shape_Area geometry
0 5 Staten Island 330470.010332 1.623820e+09 MULTIPOLYGON (((970217.022 145643.332, 970227....
1 4 Queens 896344.047763 3.045213e+09 MULTIPOLYGON (((1029606.077 156073.814, 102957...
2 3 Brooklyn 741080.523166 1.937479e+09 MULTIPOLYGON (((1021176.479 151374.797, 102100...
3 1 Manhattan 359299.096471 6.364715e+08 MULTIPOLYGON (((981219.056 188655.316, 980940....
4 2 Bronx 464392.991824 1.186925e+09 MULTIPOLYGON (((1012821.806 229228.265, 101278...
So this works for a single layer. The question is now whether this can also work with multiple layers. Naively writing to the same buffer doesn't work:
df = geopandas.GeoDataFrame({"col": [1, 2], "geometry": geopandas.points_from_xy([1, 2], [1, 2])})
df1 = df.iloc[:1]
df2 = df.iloc[1:]
tmp_file = io.BytesIO()
df1.to_file(tmp_file, driver="GPKG", layer="layer1")
# it works neither with nor without the following line
# tmp_file.seek(0)
df2.to_file(tmp_file, driver="GPKG", layer="layer2")
tmp_file.seek(0)
geopandas.read_file(tmp_file, layer="layer1")
tmp_file.seek(0)
geopandas.read_file(tmp_file, layer="layer2")
# -> ValueError: Null layer: 'layer2'
Checking with pyogrio confirms that the file only has the first layer:
tmp_file.seek(0)
pyogrio.list_layers(tmp_file)
# -> array([['layer1', 'Point']], dtype=object)
For normal files, you don't have to explicitly ask to append: if you write to a GeoPackage file that already exists, it will automatically add a new layer (if you provide a different name; otherwise it will overwrite the layer, I suppose).
tmp_file = io.BytesIO()
df1.to_file(tmp_file, driver="GPKG", layer="layer1")
tmp_file.seek(0)
df2.to_file(tmp_file, driver="GPKG", layer="layer2", mode="a")
# -> OSError: Append mode is not supported for datasets in a Python file object.
So it would need some more investigation to see whether this could be made possible with either fiona or pyogrio. |
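The `seek(0)` step in the single-layer example comes down to how Python file objects track a current position. The following is a minimal sketch using only `io.BytesIO`; the plain `write` call is a stand-in (an assumption for illustration) for `gdf.to_file(tmp_file, driver="GPKG")`, which is presumed to leave the position at the end of the data in the same way.

```python
import io

buf = io.BytesIO()
# Stand-in for gdf.to_file(buf, driver="GPKG"); any write moves the position forward.
buf.write(b"pretend these are GPKG bytes")

position_after_write = buf.tell()  # now at the end of the data just written
read_without_seek = buf.read()     # nothing left to read from this position

buf.seek(0)                        # rewind to the start of the buffer
read_after_seek = buf.read()       # the full content is returned again
```

Note that `buf.getvalue()` returns the whole content regardless of the current position, which can be handy when the bytes are only needed for an HTTP response.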
Funny, apparently I was also having a look at this at the same time as @jorisvandenbossche :-)... I checked whether it would be possible using the "vsimem" feature of GDAL, but it seems that the 2nd layer isn't added to the in-memory GeoPackage; instead, the GeoPackage is just overwritten. Based on a quick scan of the code in pyogrio it should work, and when the path is a real file it works fine, so I suppose the issue (or the fact that it isn't supported) is in GDAL:
|
I think that for pyogrio, we actually don't yet support writing to an in-memory BytesIO or /vsimem at all (also not for a single file / layer). Opened geopandas/pyogrio#249 for this |
Thanks both for clarifying this, I don't use BytesIO very often and it clearly shows |
I am trying to see whether this has implications for someone who wants to download a GeoPackage file from a GeoDataFrame in Shiny.
...and it is resulting in this error:
|
@fgashakamba what is the result of running
Writing to BytesIO was added in pyogrio 0.8.0 |
@brendan-ward I have |
@fgashakamba best post the entire output of |
Hello @brendan-ward and @theroggy
|
That doesn't seem right; it shows neither fiona nor pyogrio installed, which should raise an error on
This will also work:
import pyogrio
print(pyogrio.__version__) |
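If importing the package is itself in doubt, the installed versions can also be checked without importing anything, using the standard library's `importlib.metadata`. This is only a sketch: the helper names (`installed_version`, `release_tuple`, `pyogrio_can_write_bytesio`) are made up for illustration, and the 0.8.0 threshold is taken from the comment above rather than verified independently.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def release_tuple(version_string):
    """Turn the leading numeric part of a version string, e.g. '0.8.0', into (0, 8, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def pyogrio_can_write_bytesio(version_string):
    """True if the version is at least 0.8, where this thread says BytesIO writing was added."""
    return version_string is not None and release_tuple(version_string) >= (0, 8)


# Report what is installed in the current environment.
for name in ("geopandas", "fiona", "pyogrio"):
    print(name, installed_version(name))
print("BytesIO writing expected:", pyogrio_can_write_bytesio(installed_version("pyogrio")))
```

A result of `None` for both fiona and pyogrio would match the "neither installed" situation described above.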
@brendan-ward |
Can you also post the output of |
Sure. Here is the output of
|
There is quite some mixing of channels in the environment: many packages originate from the "defaults" channel, some are coming from "conda-forge", which introduces some risks in getting weird behaviour. In general it is recommended to (try to) avoid installing packages from different channels by creating your environment like this: creating-a-new-environment. Nonetheless, you could try to solve it in your existing environment by upgrading |
I updated
I guess it's not an issue with |
Is your feature request related to a problem?
In my company we use a PostGIS database where we host all of our customers' data. We then allow our customers to export this in any of a number of file formats, including GPKG. The issue is that we cannot use Python's io.BytesIO in-memory files if we want multiple layers in the exported GPKG file, and we would like to avoid having to write to disk.
Describe the solution you'd like
Is there any way for geopandas to write to in-memory multilayer GPKGs so they can be sent from the webserver?
API breaking implications
No idea if it breaks anything.
Describe alternatives you've considered
The alternative is creating a temp file on disk and writing to that
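For reference, the temp-file alternative could look roughly like the sketch below. The helper names `file_to_buffer` and `export_layers` are hypothetical, and the approach assumes (as reported in the comments for regular files) that calling `to_file` on an existing GeoPackage path with a different `layer` name adds a layer rather than replacing the file.

```python
import io
import tempfile
from pathlib import Path


def file_to_buffer(write, filename="export.gpkg"):
    """Let `write(path)` create a file in a temporary directory, then load it into memory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / filename
        write(str(path))
        # A fresh BytesIO starts at position 0, so no seek(0) is needed afterwards.
        return io.BytesIO(path.read_bytes())


def export_layers(layers):
    """Write each (layer name -> GeoDataFrame) pair into one GPKG and return it as BytesIO.

    Assumes each value offers geopandas' to_file(path, driver=..., layer=...) method.
    """
    def write(path):
        for name, gdf in layers.items():
            gdf.to_file(path, driver="GPKG", layer=name)

    return file_to_buffer(write)
```

With two GeoDataFrames `df1` and `df2`, `export_layers({"layer1": df1, "layer2": df2})` would return a buffer that a web framework can send as the response body. The temporary directory is removed as soon as the bytes have been read, but the data does still touch the disk briefly.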
Additional context