In Memory netcdf subdatasets do not persist order when buffer is closed #1388

lagamura · 2024-11-17T17:37:41Z

To report a non-security related issue, please provide:

the version of the software with which you are encountering an issue
netcdf4 1.7.1 nompi_py311hae66bec_102 conda-forge
environmental information (i.e. Operating System, compiler info, java version, python version, etc.)
OS: Almalinux-9.3, python: 3.11
a description of the issue with the steps needed to reproduce it:
When writing subdatasets to an in-memory netCDF dataset, the subdatasets change index order once the buffer is written out as a netCDF file at the end. A minimal example follows:

import numpy as np
from netCDF4 import Dataset
from osgeo import gdal

list_of_subds = ["first_subdataset", "c_subdataset", "b_subdataset"]

# memory=<int> requests an in-memory dataset; the value is the anticipated size in bytes
ds = Dataset(
    "dump_ds.nc", mode="w", memory=1028, format="NETCDF4"
)

ds.createDimension("lon", 100)
ds.createDimension("lat", 100)
ds.createDimension("time", None)

for subds in list_of_subds:

    data = ds.createVariable(
        subds,
        "f8",
        ("time", "lat", "lon"),
        zlib=True,
        fill_value=-1,
    )
    data[0, :, :] = np.arange(100)

print(ds)
# for an in-memory dataset, close() returns a buffer holding the file contents
nc_buf = ds.close()
with open("dump_ds.nc", "wb") as f:
    f.write(nc_buf)

print(gdal.Info("dump_ds.nc"))

In print(ds) the subdatasets are still listed in creation order:

root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): lon(100), lat(100), time(1)
variables(dimensions): float64 first_subdataset(time, lat, lon), float64 c_subdataset(time, lat, lon), float64 b_subdataset(time, lat, lon)
groups:

Printing gdal.Info after dumping the nc file shows a different order:

Subdatasets:
SUBDATASET_1_NAME=NETCDF:"dump_ds.nc":b_subdataset
SUBDATASET_1_DESC=[1x100x100] b_subdataset (64-bit floating-point)
SUBDATASET_2_NAME=NETCDF:"dump_ds.nc":c_subdataset
SUBDATASET_2_DESC=[1x100x100] c_subdataset (64-bit floating-point)
SUBDATASET_3_NAME=NETCDF:"dump_ds.nc":first_subdataset
SUBDATASET_3_DESC=[1x100x100] first_subdataset (64-bit floating-point)

jswhit · 2024-11-17T23:18:00Z

I don't know how gdal chooses to order the variables - maybe alphabetical? I don't believe this is a bug in netcdf4-python.
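
The alphabetical guess can be checked against the names in the report without running netCDF4 or gdal. The following is a minimal sketch: the two lists are copied by hand from the reproduction script and from the gdal.Info output above, and is_alphabetical is a hypothetical helper introduced only for this illustration. With just three names a match is consistent with alphabetical ordering but does not prove it.

```python
# Names in the order they were created in the reproduction script.
created = ["first_subdataset", "c_subdataset", "b_subdataset"]

# Names in the order reported by gdal.Info (SUBDATASET_1 to SUBDATASET_3),
# copied by hand from the output in the report.
reported = ["b_subdataset", "c_subdataset", "first_subdataset"]


def is_alphabetical(names):
    """Return True if the names are already in ascending sorted order.

    Hypothetical helper for this illustration, not part of netCDF4 or gdal.
    """
    return list(names) == sorted(names)


print("creation order preserved:", reported == created)
print("reported order is alphabetical:", is_alphabetical(reported))
```

Both observations fit the guess: the reported order differs from the creation order and equals the sorted list of names.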

jswhit · 2024-11-17T23:28:42Z

Looks like the order of the variables does change when the memory buffer is written out and re-read (ncdump shows the same thing as gdal). I don't know if the order should be preserved - perhaps @DennisHeimbigner would know.
