In Memory netcdf subdatasets do not persist order when buffer is closed #1388

Open
lagamura opened this issue Nov 17, 2024 · 2 comments

@lagamura

To report a non-security related issue, please provide:

  • the version of the software with which you are encountering an issue
    netcdf4 1.7.1 nompi_py311hae66bec_102 conda-forge

  • environmental information (i.e. Operating System, compiler info, java version, python version, etc.)
    OS: Almalinux-9.3, python: 3.11

  • a description of the issue with the steps needed to reproduce it:
    When variables (subdatasets) are written to an in-memory netCDF dataset, their order changes once the buffer is written out as a netCDF file. A minimal example follows:

import numpy as np
from netCDF4 import Dataset
from osgeo import gdal

list_of_subds = ["first_subdataset", "c_subdataset", "b_subdataset"]

ds = Dataset(
    "dump_ds.nc", mode="w", memory=1028, format="NETCDF4"
) 

ds.createDimension("lon", 100)
ds.createDimension("lat", 100)
ds.createDimension("time", None)

for subds in list_of_subds:

    data = ds.createVariable(
        subds,
        "f8",
        ("time", "lat", "lon"),
        zlib=True,
        fill_value=-1,
    )
    data[0, :, :] = np.arange(100)

print(ds)
nc_buf = ds.close()
with open("dump_ds.nc", "wb") as f:
    f.write(nc_buf)

print(gdal.Info("dump_ds.nc"))

The output of print(ds) still lists the subdatasets in creation order:

root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): lon(100), lat(100), time(1)
variables(dimensions): float64 first_subdataset(time, lat, lon), float64 c_subdataset(time, lat, lon), float64 b_subdataset(time, lat, lon)
groups:

Printing gdal.Info after dumping the .nc file shows them in a different (alphabetical) order:

Subdatasets:
SUBDATASET_1_NAME=NETCDF:"dump_ds.nc":b_subdataset
SUBDATASET_1_DESC=[1x100x100] b_subdataset (64-bit floating-point)
SUBDATASET_2_NAME=NETCDF:"dump_ds.nc":c_subdataset
SUBDATASET_2_DESC=[1x100x100] c_subdataset (64-bit floating-point)
SUBDATASET_3_NAME=NETCDF:"dump_ds.nc":first_subdataset
SUBDATASET_3_DESC=[1x100x100] first_subdataset (64-bit floating-point)
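
To make the mismatch easy to check in a script, the gdal.Info text can be parsed and compared with the creation order. This is a minimal sketch using only the standard library; the helper name `subdataset_order` is hypothetical, and it assumes the text form of the report with `SUBDATASET_n_NAME=NETCDF:"file":variable` lines as shown above.

```python
import re

# Matches lines such as: SUBDATASET_1_NAME=NETCDF:"dump_ds.nc":b_subdataset
_SUBDS_NAME = re.compile(r'^\s*SUBDATASET_(\d+)_NAME=NETCDF:"[^"]*":(\S+)\s*$')


def subdataset_order(info_text):
    """Return variable names sorted by the index GDAL assigned to them.

    Hypothetical helper for illustration; not part of GDAL or netCDF4.
    """
    found = []
    for line in info_text.splitlines():
        match = _SUBDS_NAME.match(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return [name for _, name in sorted(found)]


# Text copied from the report above (the DESC lines are ignored by the pattern).
info_text = """Subdatasets:
SUBDATASET_1_NAME=NETCDF:"dump_ds.nc":b_subdataset
SUBDATASET_1_DESC=[1x100x100] b_subdataset (64-bit floating-point)
SUBDATASET_2_NAME=NETCDF:"dump_ds.nc":c_subdataset
SUBDATASET_2_DESC=[1x100x100] c_subdataset (64-bit floating-point)
SUBDATASET_3_NAME=NETCDF:"dump_ds.nc":first_subdataset
SUBDATASET_3_DESC=[1x100x100] first_subdataset (64-bit floating-point)
"""

created = ["first_subdataset", "c_subdataset", "b_subdataset"]
reported = subdataset_order(info_text)

print("matches creation order:", reported == created)
print("matches alphabetical order:", reported == sorted(created))
```

With the output pasted above, the reported order differs from the creation order and equals the alphabetically sorted list.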

@jswhit
Collaborator

jswhit commented Nov 17, 2024

I don't know how gdal chooses the order of the variables - maybe alphabetical? I don't believe this is a bug in netcdf4-python.

@jswhit
Collaborator

jswhit commented Nov 17, 2024

Looks like the order of the variables does change when the memory buffer is written out and re-read (ncdump shows the same thing as gdal). I don't know if the order should be preserved - perhaps @DennisHeimbigner would know.
