Possible bug relating to the setting of Variable chunksizes #1323

davidhassell · 2024-06-03T11:13:07Z

Hello,

I have found it impossible (at v1.6.5) to get netCDF4 to write out a file with the default chunking strategy - it either writes contiguous, or with explicitly set chunksizes, but never with the default chunks.

To test this I used the following function:

import netCDF4
import numpy as np

def write(**kwargs):
    nc = netCDF4.Dataset('chunk.nc', 'w')
    x = nc.createDimension('x', 80000)
    y = nc.createDimension('y', 4000)
    tas = nc.createVariable('tas', 'f8', ('y', 'x'), **kwargs)
    tas[...] = np.random.random(320000000).reshape(4000, 80000)
    print(tas.chunking())
    nc.close()

and ran it as follows:

In [2]: write()  # Not as expected - expected default chunking
contiguous
In [3]: !ncdump -sh chunk.nc | grep tas:
		tas:_Storage = "contiguous" ;
		tas:_Endianness = "little" ;

In [4]: write(contiguous=False)  # Not as expected - expected default chunking
contiguous
In [5]: !ncdump -sh chunk.nc | grep tas:
		tas:_Storage = "contiguous" ;
		tas:_Endianness = "little" ;

In [6]: write(contiguous=True)  # As expected 
contiguous
In [7]: !ncdump -sh chunk.nc | grep tas:
		tas:_Storage = "contiguous" ;
		tas:_Endianness = "little" ;

In [8]: write(chunksizes=(400, 8000))  # As expected 
[400, 8000]
In [9]: !ncdump -sh chunk.nc | grep tas:
		tas:_Storage = "chunked" ;
		tas:_ChunkSizes = 400, 8000 ;
		tas:_Endianness = "little" ;

Surely it's the case that if contiguous=False, chunksizes=None then the netCDF default chunking strategy should be used?

I found that if I changed line https://github.com/Unidata/netcdf4-python/blob/v1.6.5rel/src/netCDF4/_netCDF4.pyx#L4307 to read:

                    if chunksizes is not None or not contiguous:  # was: if chunksizes is not None or contiguous

then I could get the default chunking to work as expected:

In [2]: write()  # With modified code
[308, 6154]
In [3]: !ncdump -sh chunk.nc | grep tas:
		tas:_Storage = "chunked" ;
		tas:_ChunkSizes = 308, 6154 ;
		tas:_Endianness = "little" ;

In [4]: write(contiguous=False)  # With modified code
[308, 6154]
In [5]: !ncdump -sh chunk.nc | grep tas:
		tas:_Storage = "chunked" ;
		tas:_ChunkSizes = 308, 6154 ;

In [6]: write(contiguous=True) # With modified code 
contiguous
In [7]: !ncdump -sh chunk.nc | grep tas:
		tas:_Storage = "contiguous" ;
		tas:_Endianness = "little" ;

In [8]: write(chunksizes=(400, 8000))  # With modified code
[400, 8000]
In [9]: !ncdump -sh chunk.nc | grep tas:
		tas:_Storage = "chunked" ;
		tas:_ChunkSizes = 400, 8000 ;
		tas:_Endianness = "little" ;

However, this might not be the best way to do things - what do you think?

Many thanks,
David

>>> netCDF4.__version__
1.6.5

jswhit · 2024-06-04T21:36:18Z

The current code will not call nc_def_var_chunking at all if chunksizes=None and contiguous=False, which I would think would result in the library default chunking strategy.

jswhit · 2024-06-04T21:43:26Z

I think chunking is only used by default if there is an unlimited dimension. Try this:

import netCDF4
import numpy as np

def write(**kwargs):
    nc = netCDF4.Dataset('chunk.nc', 'w')
    x = nc.createDimension('x', 8000)
    y = nc.createDimension('y', 400)
    z = nc.createDimension('z', None)
    tas = nc.createVariable('tas', 'f8', ('z','y', 'x'), **kwargs)
    tas[0:10,:,:] = np.random.random(32000000).reshape(10,400, 8000)
    print(tas.chunking())
    nc.close()

write()

[1, 200, 4000]

so even if you specify contiguous=False you won't get chunking by default unless there is an unlimited dimension. If there is no unlimited dimension you have to specify the chunksize to get chunking.

I can see how this can be confusing, since the default for the contiguous kwarg is False, yet the library defaults to contiguous storage unless there is an unlimited dimension. The netcdf4-python docs do say this, though: "Fixed size variables (with no unlimited dimension) with no compression filters are contiguous by default."
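The behaviour described in this comment can be summarised as a small decision rule. The sketch below is only a reading of this thread, not code from netcdf4-python or the C library; the function name `expected_storage` and its parameters are hypothetical and exist purely for illustration.

```python
def expected_storage(has_unlimited_dim, chunksizes=None, filtered=False):
    """Predict the storage layout, as described in this thread.

    Hypothetical helper: it mirrors the discussion, not the library source.
    `filtered` stands for any compression or checksum filter being enabled.
    """
    if chunksizes is not None:
        # Explicit chunk sizes are always honoured.
        return "chunked (explicit sizes)"
    if has_unlimited_dim or filtered:
        # The C library picks its own default chunk sizes.
        return "chunked (library default sizes)"
    # Fixed-size, unfiltered variables are written contiguously,
    # even when contiguous=False is passed to createVariable.
    return "contiguous"


print(expected_storage(has_unlimited_dim=False))
print(expected_storage(has_unlimited_dim=True))
print(expected_storage(has_unlimited_dim=False, chunksizes=(400, 8000)))
```

The first call corresponds to the original `write()` test (contiguous), the second to the example with the unlimited `z` dimension, and the third to `write(chunksizes=(400, 8000))`.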

DennisHeimbigner · 2024-06-04T22:34:16Z

As near as I can tell, when a variable is created, it has default chunksizes computed automatically.
Then, if nc_def_var_chunking is called later, those default sizes should get overwritten.

davidhassell · 2024-06-05T07:41:19Z

Thanks for the background, @jswhit and @DennisHeimbigner - it's very useful.

So, not a bug then, but maybe a feature request! Would it be possible to get netCDF4-python to write a variable that has no unlimited dimensions with the default chunking strategy? I guess that you don't want to change the existing API, so perhaps that could be controlled by a new keyword to createVariable?

Thanks,
David

jswhit · 2024-06-05T20:13:13Z

@davidhassell it is already being reported - variables with no unlimited dimension are not chunked by default (they are contiguous).

davidhassell · 2024-06-06T07:22:26Z

Hi @jswhit, I see that what I wrote was ambiguous - sorry! I'll try again:

I would like to create chunked variables, chunked with the netCDF default chunk sizes, that have no unlimited dimensions. As far as I can tell this is not currently possible, but would you be open to creating this option?

jswhit · 2024-06-06T17:23:59Z

@davidhassell thanks for clarifying, I understand now. Since the python interface doesn't have access to the default chunking algorithm in the C library, I don't know how this would be done. I'm open to suggestions though.

jswhit · 2024-06-06T17:30:52Z

A potential workaround that doesn't require having an unlimited dimension is to turn on compression (zlib=True, complevel=1) or the Fletcher32 checksum algorithm (fletcher32=True).
