
Blosc HDF5 codec sometimes fails via nc_def_var_blosc() and nccopy #2458

Closed
czender opened this issue Jul 8, 2022 · 13 comments
@czender
Contributor

czender commented Jul 8, 2022

All of this was done with today's main trunk of netcdf-c on the latest macOS. The input file in.nc and the successful output file foo.nc are attached as CDL text (because I could not figure out how to attach .nc files directly) as in.txt and foo.txt, respectively.

I get mixed results with the HDF5 Blosc filter in netCDF 4.9.X. First, nc_def_var_[deflate,zstd,bzip2] work fine in the same framework. Yay! Blosc is more complex and I'm not sure if the problems are due to my NCO invocation of the filter or possibly something else.

The symptoms are that Blosc often works fine for me with one or a few variables on small test datasets, yet always fails on more complex datasets. For example, this invokes the default Blosc subcompressor at level 1 using the nc_def_var_blosc() API:

zender@spectral:~$ ncks --log=0 -O -4 -C -v three_dmn_rec_var --cmp=blosc_lz,1 ~/in.nc ~/foo.nc
zender@spectral:~$ ncks -m --hdn ~/foo.nc
netcdf foo {
  dimensions:
    lat = 2 ;
    lon = 4 ;
    time = UNLIMITED ; // (10 currently)

  variables:
    float three_dmn_rec_var(time,lat,lon) ;
      three_dmn_rec_var:long_name = "three dimensional record variable" ;
      three_dmn_rec_var:units = "watt meter-2" ;
      three_dmn_rec_var:_FillValue = -99.f ;
      three_dmn_rec_var:_Storage = "chunked" ;
      three_dmn_rec_var:_ChunkSizes = 1145, 2, 4 ;
      three_dmn_rec_var:_Filter = "32001,2,2,4,36640,1,0,0" ;
      three_dmn_rec_var:_Endianness = "little" ;
} // group /

The output file (attached, along with input file) appears to be valid and the data are good. I will use this output file below to show inexplicable behavior with nccopy.
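For reference, here is my reading of that _Filter attribute, assuming the parameter layout used by the H5Zblosc wrapper (filter ID first, then the seven Blosc cd_values); the field names are my interpretation of the convention, not anything authoritative:

```python
# Decode the _Filter string "32001,2,2,4,36640,1,0,0" under an assumed
# H5Zblosc parameter layout (my interpretation, not authoritative).
params = [int(p) for p in "32001,2,2,4,36640,1,0,0".split(",")]
filter_id, fversion, bversion, typesize, nbytes, clevel, shuffle, compcode = params

assert filter_id == 32001  # registered HDF5 filter ID for Blosc
assert typesize == 4       # sizeof(float)
# The chunk-bytes field should equal the product of _ChunkSizes times the type size:
assert nbytes == 1145 * 2 * 4 * typesize  # 36640
print(clevel, shuffle, compcode)  # level 1, shuffle off, compcode 0 (BloscLZ)
```

Notably, under this reading the shuffle flag is 0, i.e. Blosc-internal shuffle is off for this variable.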

Problem #1: I can pick any number of other variables in this same file that NCO fails to compress with Blosc via nc_def_var_blosc(). I do not expect Unidata to debug NCO. Below I'll show some nccopy behavior that results in similar, though not identical, failures. This first problem is intended more to demonstrate the capriciousness of the Blosc filter behavior. Blosc works for me on three_dmn_rec_var (above), so why does it fail on time_bnds?

zender@spectral:~$ ncks --log=0 -O -4 -C -v time_bnds --cmp=blosc_lz,1 ~/in.nc ~/foo_fail.nc
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
  #000: H5F.c line 677 in H5Fflush(): unable to flush file
    major: File accessibility
    minor: Unable to flush data from cache
  #001: H5VLcallback.c line 3769 in H5VL_file_specific(): file specific failed
    major: Virtual Object Layer
    minor: Can't operate on object
  #002: H5VLcallback.c line 3699 in H5VL__file_specific(): file specific failed
    major: Virtual Object Layer
    minor: Can't operate on object
  #003: H5VLnative_file.c line 316 in H5VL__native_file_specific(): unable to flush mounted file hierarchy
    major: File accessibility
    minor: Unable to flush data from cache
  #004: H5Fmount.c line 692 in H5F_flush_mounts(): unable to flush mounted file hierarchy
    major: File accessibility
    minor: Unable to flush data from cache
  #005: H5Fmount.c line 654 in H5F__flush_mounts_recurse(): unable to flush file's cached information
    major: File accessibility
    minor: Unable to flush data from cache
  #006: H5Fint.c line 2283 in H5F__flush(): unable to flush file data
    major: Object cache
    minor: Unable to flush data from cache
  #007: H5Fint.c line 2167 in H5F__flush_phase1(): unable to flush dataset cache
    major: Object cache
    minor: Unable to flush data from cache
  #008: H5Dint.c line 3547 in H5D_flush_all(): unable to flush cached dataset info
    major: Dataset
    minor: Iteration failed
  #009: H5Iint.c line 1374 in H5I_iterate(): iteration failed
    major: Object atom
    minor: Iteration failed
  #010: H5Dint.c line 3520 in H5D__flush_all_cb(): unable to flush cached dataset info
    major: Dataset
    minor: Write failed
  #011: H5Dint.c line 3241 in H5D__flush_real(): unable to flush raw data
    major: Dataset
    minor: Unable to flush data from cache
  #012: H5Dchunk.c line 2804 in H5D__chunk_flush(): unable to flush one or more raw data chunks
    major: Dataset
    minor: Unable to flush data from cache
  #013: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #014: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
ERROR NC_EHDFERR Error at HDF5 layer

Problem #2: nccopy does not work for me with the above _Filter string. Given the complexity of the Blosc filter, I'm not sure it should work with that filter string. Any clarification would be helpful to my debugging this issue.

zender@spectral:~$ nccopy -L0 -4 -V three_dmn_rec_var -F *,32001,2,2,4,36640,1,0,0 ~/in.nc ~/foo_fail.nc
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
...[logging info omitted]...
NetCDF: HDF error
Location: file nccopy.c; line 2145
zender@spectral:~$ 
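One guess about why a verbatim filter string might misbehave: if the fifth number, 36640, encodes the uncompressed chunk size in bytes (my reading of the Blosc filter parameters, not confirmed), then the string is only self-consistent for one particular chunking and type size, and a copy that chooses different output chunk sizes would carry a stale value. A quick sanity check along those lines (the helper name and layout assumption are mine):

```python
from math import prod

def blosc_param_consistent(filter_params, chunk_sizes, typesize):
    """Check that the assumed typesize and chunk-bytes fields of a Blosc
    filter parameter list match the actual chunking (layout is my guess)."""
    # filter_params = [version, blosc_version, typesize, nbytes, clevel, shuffle, compcode]
    return filter_params[2] == typesize and filter_params[3] == prod(chunk_sizes) * typesize

# Matches the original file's chunking (1145, 2, 4) for a 4-byte float:
print(blosc_param_consistent([2, 2, 4, 36640, 1, 0, 0], (1145, 2, 4), 4))  # True
# A different output chunking makes the same parameter string inconsistent:
print(blosc_param_consistent([2, 2, 4, 36640, 1, 0, 0], (1, 2, 4), 4))     # False
```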

Problem #3: nccopy also fails to copy the (apparently valid) output file from above. This time I include the logging info because it mentions some _Quantize attributes that are not employed at all in this output file (perhaps that is a red herring, but I thought it might be relevant):

zender@spectral:~$ nccopy -L0 ~/foo.nc ~/foo2.nc
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitGroomNumberOfSignificantDigits'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeGranularBitRoundNumberOfSignificantDigits'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitRoundNumberOfSignificantBits'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
  #000: H5VLcallback.c line 3769 in H5VL_file_specific(): file specific failed
    major: Virtual Object Layer
    minor: Can't operate on object
  #001: H5VLcallback.c line 3699 in H5VL__file_specific(): file specific failed
    major: Virtual Object Layer
    minor: Can't operate on object
  #002: H5VLnative_file.c line 316 in H5VL__native_file_specific(): unable to flush mounted file hierarchy
    major: File accessibility
    minor: Unable to flush data from cache
  #003: H5Fmount.c line 692 in H5F_flush_mounts(): unable to flush mounted file hierarchy
    major: File accessibility
    minor: Unable to flush data from cache
  #004: H5Fmount.c line 654 in H5F__flush_mounts_recurse(): unable to flush file's cached information
    major: File accessibility
    minor: Unable to flush data from cache
  #005: H5Fint.c line 2283 in H5F__flush(): unable to flush file data
    major: Object cache
    minor: Unable to flush data from cache
  #006: H5Fint.c line 2167 in H5F__flush_phase1(): unable to flush dataset cache
    major: Object cache
    minor: Unable to flush data from cache
  #007: H5Dint.c line 3547 in H5D_flush_all(): unable to flush cached dataset info
    major: Dataset
    minor: Iteration failed
  #008: H5Iint.c line 1374 in H5I_iterate(): iteration failed
    major: Object atom
    minor: Iteration failed
  #009: H5Dint.c line 3520 in H5D__flush_all_cb(): unable to flush cached dataset info
    major: Dataset
    minor: Write failed
  #010: H5Dint.c line 3241 in H5D__flush_real(): unable to flush raw data
    major: Dataset
    minor: Unable to flush data from cache
  #011: H5Dchunk.c line 2804 in H5D__chunk_flush(): unable to flush one or more raw data chunks
    major: Dataset
    minor: Unable to flush data from cache
  #012: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #013: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
  #014: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #015: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
  #016: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #017: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
  #018: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #019: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
  #020: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #021: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
  #022: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #023: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
  #024: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #025: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
  #026: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #027: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
  #028: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #029: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
  #030: H5Dchunk.c line 3393 in H5D__chunk_flush_entry(): output pipeline failed
    major: Dataset
    minor: Filter operation failed
  #031: H5Z.c line 1442 in H5Z_pipeline(): filter returned failure
    major: Data filters
    minor: Write failed
NetCDF: HDF error
Location: file nccopy.c; line 2145
zender@spectral:~$ 

That's enough to start this thread. My immediate goal is to help isolate whether the problem(s) are in my code, my understanding of how to invoke Blosc, and/or in the netCDF-C implementation. Any guidance welcome on any of these three problems. Thanks for reading this far!

@DennisHeimbigner
Collaborator

It would be better if you attached the .nc files (if they are not too large). You can do that either by renaming them to e.g. in.nc.txt, or by zipping or gzipping them.

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Jul 8, 2022

Also, there should be an alternative blosc filter in the plugins directory.
You might try that one, although it depends on the filter parameters being
the same.
Never mind, I thought you were using a CCR implementation.

@czender
Contributor Author

czender commented Jul 8, 2022

Yes, Dennis, all of this is done with netCDF 4.9.X, not with CCR. The files are both quite small...in.nc and foo.nc. It would be great to hear what you think about all this.

@DennisHeimbigner
Collaborator

This is very odd. Apparently the chunksize values are being passed in an inconsistent way. I thought it might be the unlimited dimension so I converted it to a fixed-size dimension, but it still fails in the same way. Will continue to look at it.

@DennisHeimbigner
Collaborator

I figured out the problem, but fixing that produced this error, which I have
never before seen:

HDF5: infinite loop closing library

@DennisHeimbigner
Collaborator

Ok, this is weird. After fixing some things, it turns out that it fails depending on the choice of sub-compressor. For some reason, the LZ compressors claim the data is incompressible. Other sub-compressors such as zlib or zstandard appear to work. I suspect that either there is something wrong with the c-blosc LZ implementations or we are passing bad parameters for them.

@DennisHeimbigner
Collaborator

Further notes:

I tried it using libblosc directly, and it reports that the data is incompressible.
The parameters were:

  • compressor: any LZ compressor (LZ4, BLOSCLZ, etc.)
  • shuffle: off
  • level: 1
  • data: 1.0...80.0

I could get it to compress if I turned on shuffle.
So this is apparently a limitation in the LZ compressors.
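For what it's worth, the effect of shuffle on exactly this data can be illustrated without libblosc (this is just the byte-shuffle transform, re-implemented in Python for illustration): for the floats 1.0...80.0, grouping the i-th byte of every element together turns the mostly-zero low mantissa bytes into long runs that an LZ-style matcher can exploit:

```python
import struct
import zlib

# Pack the floats 1.0..80.0 the way the test data is laid out in memory.
raw = struct.pack("<80f", *[float(i) for i in range(1, 81)])

# Blosc-style byte shuffle for typesize 4: gather byte 0 of every element,
# then byte 1 of every element, and so on.
shuffled = bytes(raw[j] for k in range(4) for j in range(k, len(raw), 4))

# Small integer-valued floats have zero low mantissa bytes, so the first
# half of the shuffled stream is one long run of zeros.
assert shuffled[:160] == b"\x00" * 160

# Compare how well a generic LZ-style codec does on the two layouts:
print(len(zlib.compress(raw, 1)), len(zlib.compress(shuffled, 1)))
```

This is only an analogy (zlib, not any of the Blosc LZ sub-compressors), but it shows why shuffle changes the picture so much for this particular sequence.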

@czender
Contributor Author

czender commented Jul 11, 2022

Thanks much for looking into this, Dennis. Always turning on Blosc-shuffle when any Blosc is required is an easy change to make in NCO (though ideally it would not be necessary). Above you mention "fixing some things". Have you made or do you foresee making any changes to netCDF to get Blosc codecs working better? I'm still not sure whether the problems I'm having with Blosc are due to NCO, netCDF, the Blosc filter itself, or some combination.

@DennisHeimbigner
Collaborator

I found and fixed a couple of errors in the HDF5 blosc filter. I will put up a PR for those changes shortly. It appears that shuffle is only required for the blosc LZ sub-compressors; ZLIB and ZSTD seem to work either way. Also, I suspect the problem is highly data dependent. In the example you provided, it was trying to compress the sequence of floats 1..80. It is probably the case that a more random data set would work even without shuffle.

@czender
Contributor Author

czender commented Jul 12, 2022

Great. Looking forward to it. I will re-test for robustness across datasets and sub-compressors when it lands in the main branch.

@DennisHeimbigner
Collaborator

I ran across this also: Blosc/c-blosc#307

@czender
Contributor Author

czender commented Jul 12, 2022

Ahhh. Quite relevant. This suggests that the calling application should skip invoking the Blosc codec for block sizes < 4 KB. Does that sound reasonable?
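A sketch of what such a guard could look like on the calling-application side; the 4096-byte threshold comes from the linked c-blosc discussion, and the helper name is made up for illustration:

```python
MIN_BLOSC_CHUNK_BYTES = 4096  # threshold suggested in the c-blosc discussion

def should_use_blosc(chunk_sizes, typesize, min_bytes=MIN_BLOSC_CHUNK_BYTES):
    """Return True if a chunk is large enough that invoking the Blosc
    codec is worthwhile; fall back to another codec (or none) otherwise."""
    nbytes = typesize
    for n in chunk_sizes:
        nbytes *= n
    return nbytes >= min_bytes

print(should_use_blosc((1145, 2, 4), 4))  # 36640 bytes -> True
print(should_use_blosc((1, 2, 4), 8))     # 64 bytes    -> False
```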

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Jul 12, 2022
re: Issue Unidata#2458

The above Github Issue revealed some bugs in the file netcdf-c/plugins/H5Zblosc.c. Fixed and added a testcase. Also discovered that the Blosc LZ sub-compressors do not work well with small datasets.

Misc. other change(s): I noticed that the file "dap4_test/baselinethredds/GOES16_CONUS_20170821_020218_0.47_1km_33.3N_91.4W.nc4.thredds" is still causing tar errors during "make distcheck", so I made some changes to do the rename at test time.
@DennisHeimbigner
Collaborator

Fixed by #2461
