Nccopy ignores chunk spec when input is netcdf-4 contiguous #725
Comments
Thanks for the report, taking a look now. |
Duplicated and am observing the same behavior. Trying to narrow down where this is happening. Also, seeing this with hdf5libversion=1.8.19 and the |
@WardF, thanks for trying an alternate hdf5 version. This seems to me like a relatively simple bug in nccopy code, rather than a deep support library problem. This might be related to unresolved #391, "Using nccopy, setting deflate level to 0 ignores chunking specification". You can mark this as low priority as far as I am concerned. |
The alternate HDF5 version was incidental; it's what was on hand in my dev environment, and I made a note of it so that I wouldn't forget. Thanks! Hoping it is as simple as it seems, going to try to knock it out in short order. |
I'm trying to re-chunk some files that have |
The primary problem is that the code in nccopy.c is incorrect. |
Temporary workaround:
|
In my case the input file is already chunked in time-slices ( |
Do you have an example file (that is not too large)? |
Sure! Take this example. It's a small See the following tests:
So it appears |
So part of the problem is this. |
I think A |
I agree, it should report when it ignores the user specified chunking.
|
Let me propose the following set of rules for applying chunking in nccopy:
Notes:
|
For netCDF4->netCDF4, when no chunking is explicitly specified on the command line, NCO maintains the input chunksizes, if any, in the output file. If compression is specified, then NCO uses the input chunksizes, if any, else the default chunking algorithm is applied. I think this makes the most sense because there are fewer surprises, e.g., input chunksizes are not ignored during compression. The other behaviors proposed above seem good to me. |
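The NCO behavior described in the comment above can be read as a small decision function. The following is only a sketch of that description, not NCO's code: the function name, the `DEFAULT` marker, and the final fallback (no chunking at all when nothing asks for it) are assumptions made for illustration.

```python
# Illustrative sketch only; names and the last fallback are assumptions,
# not taken from the NCO or nccopy source.
DEFAULT = "default-algorithm"  # hypothetical marker for "apply the default chunking algorithm"


def output_chunking(cmdline_chunks, input_chunks, compress):
    """Pick output chunk sizes for one variable (netCDF4 -> netCDF4).

    cmdline_chunks: chunk sizes given explicitly by the user, or None
    input_chunks:   chunk sizes of the input variable, or None if contiguous
    compress:       True if compression was requested
    """
    if cmdline_chunks is not None:
        # Explicit user request always wins.
        return cmdline_chunks
    if input_chunks is not None:
        # No explicit request: keep the input chunking, compressed or not.
        return input_chunks
    if compress:
        # Contiguous input but compression requested: default algorithm.
        return DEFAULT
    # Assumption: contiguous input, nothing requested -> stay contiguous.
    return None


print(output_chunking(None, [1, 180, 360], compress=True))
```

The point of the ordering is the "fewer surprises" argument made above: input chunk sizes are never silently dropped just because compression was turned on.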
Forgot that case. Edited comment to add. |
Here is an alternate proposal for nccopy chunking rules. I think the command line should be considered first, rather than the input format. SUMMARY: Preserve all chunking properties from the input file, except when changed on the command line. These rules apply only when the selected output format supports chunking, i.e. for the netcdf-4 variants. Apply in the following order, independently for each variable to copy:
|
FYI that's pretty much what NCO does, AFAICT without re-reading the code. |
The above proposals all make sense to me. What matters most is that this is well documented in the manual and that, if any option passed by the user is ignored, a warning message explains why. |
Belatedly, it has occurred to me that part of the problem is that in the netcdf-C library |
After a long discussion, I implemented the rules at the end of that issue. They are documented in nccopy.1.
Additionally, I added a new, per-variable, -c flag that allows for the direct setting of the chunking parameters for a variable. The form is -c var:c1,c2,...ck where var is the name of the variable (possibly a fully qualified name) and the ci are the chunksizes for that variable. It must be the case that the rank of the variable is k. If the new form is used as well as the old form, then the new form overrides the old form for the specified variable. Note that multiple occurrences of the new form -c flag may be specified.
Misc. Other fixes
1. Added -M <size> option to nccopy to specify the minimum allowable chunksize.
2. Removed the unused variables from bigmeta.c (Issue #1079)
3. Fixed failure of nc_test4/tst_filter.sh by using the new -M flag (#1) to allow filter test on a small chunk size.
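To make the per-variable form concrete, here is a rough Python sketch of how a `var:c1,c2,...ck` spec could be parsed, rank-checked, and given priority over the older per-dimension form. It is not the nccopy implementation (which is in C). The function names, the dictionary layout, and the use of `None` for dimensions the old form does not mention are all assumptions for illustration.

```python
# Illustrative sketch only; not nccopy's code. All names are hypothetical.

def parse_var_chunkspec(spec):
    """Split 'var:c1,c2,...,ck' into (var, [c1, ..., ck])."""
    name, sep, sizes = spec.rpartition(":")
    if not sep or not name or not sizes:
        raise ValueError(f"expected var:c1,c2,...,ck, got {spec!r}")
    chunks = [int(s) for s in sizes.split(",")]
    if any(c <= 0 for c in chunks):
        raise ValueError(f"chunk sizes must be positive in {spec!r}")
    return name, chunks


def resolve_chunks(var_name, var_dims, dim_spec, var_specs):
    """Return one chunk size per dimension of the variable.

    var_dims:  dimension names of the variable, in order
    dim_spec:  old form, mapping dimension name -> chunk size
    var_specs: new form, mapping variable name -> list of chunk sizes
    """
    if var_name in var_specs:
        chunks = var_specs[var_name]
        # The new form must supply exactly one size per dimension (rank k).
        if len(chunks) != len(var_dims):
            raise ValueError(
                f"{var_name}: got {len(chunks)} chunk sizes, rank is {len(var_dims)}"
            )
        # New form overrides the old form for this variable.
        return chunks
    # Old form only. Assumption: None marks a dimension left to the tool.
    return [dim_spec.get(d) for d in var_dims]


name, chunks = parse_var_chunkspec("temp:1,64,128")
print(resolve_chunks("temp", ["time", "lat", "lon"], {"time": 10}, {name: chunks}))
```

Since the new form may be given several times, collecting each parsed spec into a dictionary keyed by variable name is one simple way to hold them.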
Environment
Linux and Mac, 64-bit
Version tested: nccopy with netcdf-C 4.4.1.1, hdf5-1.10.1
Summary
When using nccopy to convert a netcdf-4 file from contiguous to chunked, the chunk spec on the command line is ignored, and the output file contains invented chunk sizes. IMO, nccopy is not working as advertised in this case.
Remarkably, when the input file is chunked rather than contiguous, the command line chunk spec is respected. (Example not shown.)
Steps to reproduce
Test input file, 5.2 Mbytes: test31.contig.nc.gz
Run this command:
Input file header:
Expected output chunk sizes:
Actual result with unexpected chunk sizes: