nccopy bug with user defined types #1956
The issue is not actually with nccopy. It is an issue with the way that the netcdf library … Fixing this is likely to break other code, so it may take some time to figure out.
re: GitHub issue Unidata#1956

The function NC_compare_nc_types in libdispatch/dcopy.c uses an incorrect algorithm to search for types. The core of this is the function NC_rec_find_nc_type in libdispatch/dcopy.c, which currently searches the current group and its subtree.

Additionally, the function NC4_inq_typeid in libsrc4/nc4internal.c has been extended to handle fully qualified names. It was originally designed to do this, but for some reason this was never completed.

The NC_rec_find_nc_type algorithm has been altered to match the algorithm used by NC4_inq_typeid. It operates as follows. Given a file F, a group G and a type T, it searches file F2, group G2, for another type T2 that is equivalent to T. The search order is:
1. Search G2 for a type T2 equivalent to T.
2. Search upwards in the ancestor groups of G2 for a type T2 equivalent to T.
3. Search the complete group tree of F2 in pre-order, breadth-first order to locate a T2 equivalent to T.

Also adds a test case to validate the algorithm: ncdump/test_scope.sh.

Note: this change may cause compatibility problems, though that is unlikely, because a dataset rarely contains two different but equivalent type declarations.
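The three-step search order above can be sketched on a toy in-memory group tree. Everything in this sketch (the Group struct, add_child, add_type, defines, find_equivalent, and treating two types as equivalent when their signature strings match) is hypothetical and is not the netCDF-C API; it only illustrates the order in which groups are visited.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILDREN 8
#define MAX_TYPES 8
#define MAX_QUEUE 64

/* Hypothetical toy model of a group tree (not the netCDF API).
   A type is represented by a signature string; equal strings stand in
   for the structural comparison done by NC_compare_nc_types. */
typedef struct Group {
    const char *name;
    struct Group *parent;                  /* NULL for the root group */
    struct Group *children[MAX_CHILDREN];
    int nchildren;
    const char *types[MAX_TYPES];          /* type signatures defined here */
    int ntypes;
} Group;

static void add_child(Group *parent, Group *child) {
    assert(parent->nchildren < MAX_CHILDREN);
    parent->children[parent->nchildren++] = child;
    child->parent = parent;
}

static void add_type(Group *g, const char *sig) {
    assert(g->ntypes < MAX_TYPES);
    g->types[g->ntypes++] = sig;
}

/* Does group g itself define a type equivalent to sig? */
static int defines(const Group *g, const char *sig) {
    for (int i = 0; i < g->ntypes; i++)
        if (strcmp(g->types[i], sig) == 0)
            return 1;
    return 0;
}

/* 1. G2 itself; 2. the ancestors of G2 up to the root;
   3. the whole tree, breadth-first starting at the root. */
static const Group *find_equivalent(const Group *g2, const char *sig) {
    const Group *root = g2;
    for (const Group *g = g2; g != NULL; g = g->parent) {  /* steps 1 and 2 */
        if (defines(g, sig))
            return g;
        root = g;                          /* topmost group seen so far */
    }
    const Group *queue[MAX_QUEUE];         /* step 3: breadth-first traversal */
    int head = 0, tail = 0;
    queue[tail++] = root;
    while (head < tail) {
        const Group *g = queue[head++];
        if (defines(g, sig))
            return g;
        for (int i = 0; i < g->nchildren; i++) {
            assert(tail < MAX_QUEUE);      /* toy bound; real code would grow the queue */
            queue[tail++] = g->children[i];
        }
    }
    return NULL;                           /* no equivalent type anywhere */
}
```

With this order, a definition in the group itself or in the nearest ancestor wins over one that lives elsewhere in the tree; step 3 only matters when no ancestor defines the type.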
I have a fix, I think. See #1959
Since there are multiple approaches to looking for a user-defined type (i.e. up or down the group hierarchy, or even both at the same time), why not make them all available via an extra argument to nccopy (and possibly to other applications too)? By default (i.e. when this extra argument is not set explicitly), nccopy could still use the current approach (i.e. looking down the group hierarchy). This way nccopy would stay backwards compatible while gaining functionality.
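The proposal above amounts to a search routine that takes the direction as a parameter. This is a hypothetical illustration only: SearchDir, find_type and the toy Group tree are invented names, not an existing nccopy option or netCDF-C function, and types are compared by signature string for simplicity.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILDREN 8
#define MAX_TYPES 8

/* Hypothetical toy group tree (not the netCDF API). */
typedef struct Group {
    const char *name;
    struct Group *parent;                  /* NULL for the root group */
    struct Group *children[MAX_CHILDREN];
    int nchildren;
    const char *types[MAX_TYPES];          /* type signatures defined here */
    int ntypes;
} Group;

static void add_child(Group *parent, Group *child) {
    assert(parent->nchildren < MAX_CHILDREN);
    parent->children[parent->nchildren++] = child;
    child->parent = parent;
}

static void add_type(Group *g, const char *sig) {
    assert(g->ntypes < MAX_TYPES);
    g->types[g->ntypes++] = sig;
}

static int defines(const Group *g, const char *sig) {
    for (int i = 0; i < g->ntypes; i++)
        if (strcmp(g->types[i], sig) == 0)
            return 1;
    return 0;
}

/* Hypothetical direction option; not an existing nccopy flag. */
typedef enum { SEARCH_DOWN, SEARCH_UP, SEARCH_BOTH } SearchDir;

/* Depth-first search of g and its subtree (the direction used today). */
static const Group *search_down(const Group *g, const char *sig) {
    if (defines(g, sig))
        return g;
    for (int i = 0; i < g->nchildren; i++) {
        const Group *hit = search_down(g->children[i], sig);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}

/* Search g, then each ancestor up to the root. */
static const Group *search_up(const Group *g, const char *sig) {
    for (; g != NULL; g = g->parent)
        if (defines(g, sig))
            return g;
    return NULL;
}

static const Group *find_type(const Group *g, const char *sig, SearchDir dir) {
    switch (dir) {
    case SEARCH_DOWN:
        return search_down(g, sig);
    case SEARCH_UP:
        return search_up(g, sig);
    case SEARCH_BOTH: {
        /* Arbitrary choice for this sketch: subtree first, then ancestors. */
        const Group *hit = search_down(g, sig);
        return hit != NULL ? hit : search_up(g, sig);
    }
    }
    return NULL;
}
```

Making SEARCH_DOWN the default value of the option would preserve the existing behavior, which is the backwards-compatibility point made above.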
Would you be in favor of doing this for dimension name search as well?
I was under the impression that it is not even possible to define dimensions down the group hierarchy from where they are used. Personally, I do not see any reason to do so, nor for enumerations. But I understand that for enumerations this sometimes happens, and there are important users who could benefit from a search down the group hierarchy. Maybe the same is true for dimensions.
Running the above code will fail like this:
I guess I have always assumed that the current ability to specify a fully qualified name …
This fails the same way:
Attn: @DennisHeimbigner
Issue
Under certain circumstances, nccopy fails to determine the type of attributes that have a user-defined type.
Informal description
Here is how nccopy fails in this regard:
o first, the type is searched for in the parent group of the variable
o if it is found there, then nccopy works just fine
o else the search continues "recursively" in the subgroups of the variable's parent group; this is the first issue, since nccopy should instead look in the ancestor groups, up to the root group; because of this, copying some of the enumerated-type variables fails with a "Not a valid data type or _FillValue type mismatch" error
o for other enumerated type variables we see a Segmentation fault instead
o this second issue arises because the recursive search mentioned above is probably not what the implementer intended: instead of performing a depth-first search, it gets stuck in an infinite recursive call (from the variable's parent group back to itself); after enough recursive calls this overflows the stack, and since Linux apparently does not report stack overflow as such, the process eventually ends with a segmentation fault. We see this for every enumerated-type variable in a group that does not itself contain the enumerated type definition but has subgroups into which the search tries to recurse, and fails to do so for the reason described above.
create_and_copy_all_examples.sh will call:
These scripts will create the following minimal example products:
respectively.
create_and_copy_all_examples.sh then attempts to nccopy these three products with the following results:
NetCDF: Not a valid data type or _FillValue type mismatch
Location: file /tmp/build/80754af9/libnetcdf_1582139698333/work/ncdump/nccopy.c; line 1326
(also producing the partial copy product: type_defined_in_ancestor_group_with_no_subgroups_copy.nc)
Segmentation fault (core dumped)
(also producing the corrupted product: type_defined_in_ancestor_group_with_subgroups_copy.nc)
Source code: https://github.com/Unidata/netcdf-c/tree/v4.7.4
Build like this: https://www.unidata.ucar.edu/software/netcdf/docs/getting_and_building_netcdf.html#netCDF-CMake
Source files involved:
Reversed stack trace:
The search for the type should recurse up the group hierarchy, not down (or in both directions). In the current implementation, the type is looked for in the subgroups of the variable's parent group, but it should also be looked for in the supergroups (or even exclusively in the supergroups).
This is the cause of the "Not a valid data type or _FillValue type mismatch" error when the user-defined type is defined in a supergroup of the variable's parent group. We use this layout when we reuse a type in multiple subgroups (to avoid repeating the same definition).
In order to go up the group hierarchy, the nc_inq_grps calls here https://github.com/Unidata/netcdf-c/blob/v4.7.4/libdispatch/dcopy.c#L200 and here https://github.com/Unidata/netcdf-c/blob/v4.7.4/libdispatch/dcopy.c#L206 should be replaced by calls to a function that determines the ancestor groups on the path to the root, and the recursive call to NC_rec_find_nc_type should be removed (i.e. https://github.com/Unidata/netcdf-c/blob/v4.7.4/libdispatch/dcopy.c#L213).
A more efficient approach would determine just the parent group and let the recursive call to NC_rec_find_nc_type handle the remaining ancestors. This should resolve the issue.
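The recursive upward search suggested above can be sketched on a toy group tree. The Group struct, find_upward and the signature-string comparison are hypothetical. In netCDF-C the parent lookup would presumably use nc_inq_grp_parent, which returns NC_ENOGRP for the root group; this sketch uses a NULL parent pointer for the same purpose.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILDREN 8
#define MAX_TYPES 8

/* Hypothetical toy group tree (not the netCDF API). */
typedef struct Group {
    const char *name;
    struct Group *parent;                  /* NULL for the root group */
    struct Group *children[MAX_CHILDREN];
    int nchildren;
    const char *types[MAX_TYPES];          /* type signatures defined here */
    int ntypes;
} Group;

static void add_child(Group *parent, Group *child) {
    assert(parent->nchildren < MAX_CHILDREN);
    parent->children[parent->nchildren++] = child;
    child->parent = parent;
}

static void add_type(Group *g, const char *sig) {
    assert(g->ntypes < MAX_TYPES);
    g->types[g->ntypes++] = sig;
}

static int defines(const Group *g, const char *sig) {
    for (int i = 0; i < g->ntypes; i++)
        if (strcmp(g->types[i], sig) == 0)
            return 1;
    return 0;
}

/* Check g, then recurse on its parent only; each call needs just one
   parent lookup, and the recursion ends after the root group. */
static const Group *find_upward(const Group *g, const char *sig) {
    if (g == NULL)
        return NULL;                       /* walked past the root: not found */
    if (defines(g, sig))
        return g;
    return find_upward(g->parent, sig);
}
```

Because the recursion depth is bounded by the depth of the group hierarchy, this direction of search cannot loop, and sibling groups are never consulted.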
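Additionally, and independently of the main issue (the overly restrictive direction of the search in the group hierarchy), the recursive call in the current implementation never terminates: at every level the subgroups are listed from ncid1, but ncid1 is passed unchanged to the recursive call, so each level recurses into the same subgroups again.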
This probably causes a stack overflow (which sometimes causes a segmentation fault on Linux systems).
This second issue can be fixed in either of two ways:
1. Change the arguments of the recursive call, e.g. replace https://github.com/Unidata/netcdf-c/blob/v4.7.4/libdispatch/dcopy.c#L213
with ret = NC_rec_find_nc_type(ids[i], tid1, ids[i], tid2);
2. Change the group whose subgroups are listed, e.g. replace https://github.com/Unidata/netcdf-c/blob/v4.7.4/libdispatch/dcopy.c#L200
with if ((ret = nc_inq_grps(ncid2, &nids, NULL)))
and replace https://github.com/Unidata/netcdf-c/blob/v4.7.4/libdispatch/dcopy.c#L206
with if ((ret = nc_inq_grps(ncid2, &nids, ids)))
Solution 2 will only work if the input and output files have identical group IDs (whether that is the case still needs to be confirmed).
These fixes will not solve the first issue (regarding the direction of the search). They'll just avoid the stack overflow.
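The difference between the non-terminating recursion and a terminating depth-first search can be reproduced on a toy group tree. This is a simplified, hypothetical model (one tree instead of separate input and output files, types compared by signature string, and an explicit DEPTH_LIMIT standing in for the finite stack), not the dcopy.c code itself.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILDREN 8
#define MAX_TYPES 8
#define DEPTH_LIMIT 1000                   /* stands in for the finite stack */

/* Hypothetical toy group tree (not the netCDF API). */
typedef struct Group {
    const char *name;
    struct Group *parent;                  /* NULL for the root group */
    struct Group *children[MAX_CHILDREN];
    int nchildren;
    const char *types[MAX_TYPES];          /* type signatures defined here */
    int ntypes;
} Group;

static void add_child(Group *parent, Group *child) {
    assert(parent->nchildren < MAX_CHILDREN);
    parent->children[parent->nchildren++] = child;
    child->parent = parent;
}

static void add_type(Group *g, const char *sig) {
    assert(g->ntypes < MAX_TYPES);
    g->types[g->ntypes++] = sig;
}

static int defines(const Group *g, const char *sig) {
    for (int i = 0; i < g->ntypes; i++)
        if (strcmp(g->types[i], sig) == 0)
            return 1;
    return 0;
}

/* Same shape as the reported bug: the loop lists the subgroups of g1, but g1
   is passed unchanged to the recursive call, so every level sees the same
   subgroups again. Returns 1 if found, 0 if not found, -1 if DEPTH_LIMIT was
   exceeded (the real code has no limit and overflows the stack instead). */
static int buggy_search(const Group *g1, const Group *g2, const char *sig, int depth) {
    if (depth > DEPTH_LIMIT)
        return -1;
    if (defines(g2, sig))
        return 1;
    for (int i = 0; i < g1->nchildren; i++) {
        int r = buggy_search(g1, g1->children[i], sig, depth + 1);
        if (r != 0)
            return r;
    }
    return 0;
}

/* Terminating depth-first search: recurse into the subgroups of the group
   being searched, so the recursion stops at the leaves. */
static int fixed_search(const Group *g2, const char *sig) {
    if (defines(g2, sig))
        return 1;
    for (int i = 0; i < g2->nchildren; i++)
        if (fixed_search(g2->children[i], sig))
            return 1;
    return 0;
}
```

The model also reproduces the two symptoms from the issue: a group without subgroups yields a plain "not found" (the error case), while a group with subgroups runs away (the segmentation fault case).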