-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_FillValue attribute creation error for NC_GLOBAL with netCDF 4.5.x? #458
Comments
Thanks I will track this down! |
Thank you. FYI 4.5.x also causes a less clear regression with ncpdq, which now fails to pack variables that have a _FillValue attribute. This is a major regression, since people do this all the time. I will try to narrow down what it happening further when I have time. |
Do I have this right? If in.nc was an empty netcdf file, the ncatted command |
Yes, except the _FillValue would be a plain int not int64. |
I now understand that this error is due to #388. FWIW it has always been (or seemed) legal to have a _FillValue defined for a NC_GLOBAL, and NCO supported this. A consequence is that there may be files in the wild with _FillValue defined for root and groups. #388 may make copying/manipulating such files difficult, since an error code will be returned. Does this backwards incompatibility merit revisiting #388? |
@czender Absolutely worth revisiting and a discussion; the files can still be read correctly? We need to avoid breaking the backwards compatibility, and depending on the actual harm done we may just revert #388 and instead update the documentation. @DennisHeimbigner any thoughts? We can discuss this at our meeting if not here. |
I patched the latest NCO snapshot to emit a warning when it receives an NC_EGLOBAL error from nc_put_att(NC_GLOBAL,"_FilllValue"), but otherwise proceed normally. This helps to mitigate the issue. |
In order to keep #388 |
I don't think silently ignoring is a good idea; if data isn't written it should be reported somehow, or else we set ourselves up for future problems when people think they are doing one thing but are really doing another. Maybe we could have a hidden attribute in the file that is used to store warning messages, etc? There's probably a problem with that, I haven't really thought it through yet. |
It occurs to me to ask: did a global _FillValue actually get used in defining |
I'm not sure if Dennis' question was for me so let me clarify that currently/formerly people could have groups with a _FillValue with no error. Some of them may have workflows that cause their variables to "inherit" the _FillValue of the group they are in. I don't know of any specific examples. |
I guess I was trying to figure out what behavior needs to be preserved. |
NCO exposes the full attribute API, i.e., creating, deleting, copying, renaming attributes. If the full attribute API is allowed to continue to work with _FillValue on NC_GLOBAL then nothing changes from the user's point of view, so that would be back-compatible. Eliminating any of those could potentially cause grief to some users. |
NCO is a vital suite of tools; I'm convinced that we need to find a solution (such as Dennis suggests) that does not break the tools or user workflows which adhere to what has become a de-facto standard. As far as I can tell, the global |
Sounds right to me. |
I've created a temporary branch, |
There is another, though possibly related, issue caused by changed treatment of _FillValue in 4.5.x that is, IMHO, both less straightforward and a bigger deal than the NC_GLOBAL issue discussed here (in #458) and in #388. I'm uncertain precisely what netCDF change caused this (possibly #383 and/or #384 ???). As background, NCO allows the _FillValue type to be other than the variable type, and ncpdq does not try to change the _FillValue type to the packed type. 4.5.x no longer allows this, and now emits NC_EBADTYPE (-45): "NetCDF: Not a valid data type or _FillValue type mismatch" from ncpdq's nc_put_var() that writes the packed data to the variable that already has a _FillValue (of a different type) defined. Many people (including me) rely on this feature, although it is not a best practice (I wrote ncpdq in 2004, when missing_value was still the norm, and possibly before _FillValue best practices were defined). So, I consider this issue is partly due to an NCO design flaw (being lazy and not always making typeof(_FillValue) == typeof(variable)). Still, the current algorithm works pretty well when _FillValue is in the range of the packed type (otherwise ncpdq does print a warning). Modifying ncpdq is possible, but could take months. In the meantime, many workflows with ncpdq will break if/when it's linked to 4.5.x. I'm unsure how to proceed. |
My first thought off the top of my head is to back out the change implemented re: different types and instead phase it in slowly while flagging the behavior as deprecated, and work together towards not locking this behavior out altogether until NCO has been updated. Although depending on the scope of the problems it could cause, I am hesitant to do something that breaks widespread workflows built around behavior that was permitted, even if it was documented as not a best practice. Even if it's not a best practice, unless it was explicitly forbidden, i dont know how easily we can/should change the behavior at this point. It will merit more discussion once I'm done with this symposium, but I am inclined to balance the damage caused by enforcing this vs. the hypothetical damage prevented. Let me refer back to those other pull requests when I can, to refamiliarize myself with the motivation for making the changes. |
Can you elaborate on what you mean by the term "packed"? |
Just created a pull request #464 |
Yes, apparently the NC_NOFILL on line 763 prevents ncpdq packing mismatched _FillValue from failing with netCDF < 4.5.x. When I remove that line it fails with NC_EBADTYPE. Thanks for digging-in to this. I have not tested your patch, though it seems promising. |
Thank you very much @wkliao for providing your insight and the patch; thank you also @czender for working with us to figure this out, and reporting it in the first place! I've just tested #464 with the provided test, and so long as ERANGE_FILL is not enabled (default behavior), the test passes. @czender, will this work for you? I suppose there is a possibility of issues being reported if somebody goes out of their way to turn on ERANGE_FILL, but since it's not the default behavior I feel comfortable calling this fixed. I've pushed a new branch, |
Please ping me when @wkliao 's patch is in master and I'll test. Assuming it passes, then this is a fine solution for ncpdq users. Thank you Wei-Keng, for producing this so quickly. |
@czender This is in |
This thread discusses two distinct changes in _FillValue behavior in 4.5.x, each of which broke one NCO regression test that passed with 4.4.x and earlier libraries. Today's master still causes an error when attempting to create a _FillValue for NC_GLOBAL. I mentioned that I patched NCO to continue (not abort) despite this error:
Personally I can live with the current behavior because I don't use/need this feature. As discussed above, Unidata needs to decide whether to allow this feature to work become some users may rely on it, and the new behavior would reduce backwards compatibility for them. As for the second issue, which was diagnosed as causing ncpdq to fail, the current snapshot does fix that issue (thanks to @wkliao for the patch). This will prevent massive breakage in many workflows so this is an important patch to keep. In the future I would like to alter ncpdq so it does not rely on this patch, but that would be a big project that I don't have time for right now. |
I think we are going to allow it; it has been allowed this long and has become permissible, so I think the documentation needs to be updated. I thought I had reverted this behavior already but apparently not. I will look into it. Actually, yes I've reverted it in a branch but have not met with @DennisHeimbigner yet to discuss/confirm. And yes, thank you @wkliao for the patch with ncpdq. @czender - NCO is a very important tool for our community, and breaking it is not something I want to do. I've built a pretty robust automated regression/continuous integration framework based on docker, but the weakness is that it's automated; if a test fails and returns 0, it will not catch it. I'd like to figure out a way to catch some of these netcdf tests that only run via |
Please note that as early as version 3.3 released in 1997, checking type mismatch has already appeared in libsrc/putget.m4. This checking happens during nc_enddef and when fill mode is on.
|
I agree that we should continue to allow _FillValue with NC_GLOBAL. |
FYI. I found the following statement from netcdf man page of v 3.3, libsrc/netcdf.3
|
FYI, this command fails with 4.5.0-rc3 and succeeds in 4.4.x. I thought Unidata intended to continue to let users write _FillValue as group metadata in 4.5.0. The current release candidate does not allow that.
|
That as the intent, and I thought I had reversed the pull request disallowing this. I even thought I had added a test for this. Let me go double-check. This should be allowed. |
This may also require a change to ncgen; I will investigate separately. |
Please re-read my message of August 24. Nothing seems to have changed since then. |
Thanks @czender we will get this in! |
Ok, looking at this we had corrected the initial behavior using an ncpdq-based test. Looking at this ncatted issue to see what's going on. |
It looks like there is a check in libdispatch/dattput.c |
I've got this taken care of and am adjusting a test. I'll roll this in soon today and get ball rolling on the 4.5.0 full release. |
The fix is merged into v4.5.0-release-branch. Once travis-ci completes, I
will merge into master.
…On Thu, Oct 5, 2017 at 4:25 PM, Dennis Heimbigner ***@***.***> wrote:
It looks like there is a check in libdispatch/dattput.c
in (at least) each of the nc_put_att_XXX routines,
where XXX is some type (e.g. int or float or...).
You should be able to find all of them by looking for
occurrences of NC_EGLOBAL.
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#458 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEH-UriHXwBrmYg1ayDQMmhKvUkPuyU4ks5spVdGgaJpZM4OyxAr>
.
|
I believe we have this addressed for 4.5.0. Tagging to revisit for 4.5.1. |
It appears this is fixed, closing out as part of issue cleanup, will reopen if incorrect. |
netCDF library behavior of _FillValue seems to have changed in a backwardly incompatible way. With the 20170804 netCDF snapshot this command fails to create a global _FillValue attribute in a netCDF3 file, whereas the same command succeeds with netCDF 4.4.x and earlier versions. Is this new behavior intended?
The text was updated successfully, but these errors were encountered: