:FillValue numbers correct? #102

Closed
emmerbodc opened this issue Aug 24, 2022 · 8 comments

@emmerbodc
Collaborator

Hello,

I was wondering if the fill values are correct?

LATITUDE_GPS:FillValue = -9999.9;
LATITUDE:FillValue = -9999.9;
TIME:FillValue = -1.0 ;

CF conventions seem to refer to -999.9f

Thanks,
Emma

@castelao
Member

Hi Emma. I think that the _FillValue should be free for the user to choose. I don't think CF imposes a value; those were just examples, see 2.5.1. The only restriction is that it must have the same data type as the variable. For instance, we couldn't use a string for latitude, which could create incompatibilities depending on the programming language being used. Do you think that we should enforce one specific value for _FillValue?

Extending your point, I personally use NaN for float/double types, and I think that we should suggest (suggest only, not impose) using that instead of the 99... pattern. I think that all modern languages can handle it. It is a valid IEEE floating-point value and it is clear what it means. What is your opinion?

Thanks!

@castelao
Member

I just realized that this issue is related to #70. Let's converge the discussions here, and whenever this is closed, let's remember to close that issue as well.

I think that the main point over there was to make it easy to aggregate chunks (note, it could be as simple as aggregating multiple dives from the same deployment). Otherwise, aggregation would require a prior check on the _FillValue being used. That would be a good argument for enforcing the value to be used, but we never know when a user might have a legitimate need to use something else.

A note to be considered: some apps have a default _FillValue that is neither the 99.. pattern nor NaN. If we impose a value, we might create trouble for someone who does not define it explicitly.

To be clear, my suggestion is to recommend using NaN for float/double, but the user is allowed to choose any value with the same data type as the variable. What do you all think?
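
The aggregation concern can be sketched as follows. This is a minimal illustration in plain Python, not part of the OG format: the chunk layout (a dict with `values` and `fill_value` keys) and the function names `normalize_fill` and `aggregate` are assumptions made for the example. Each chunk's own fill value is mapped to NaN before the chunks are concatenated, which is the "prior check" mentioned above.

```python
import math

def normalize_fill(values, fill_value):
    """Return a copy of values with fill_value replaced by NaN.

    A NaN fill value needs math.isnan because NaN != NaN.
    """
    def is_fill(v):
        if math.isnan(fill_value):
            return math.isnan(v)
        return v == fill_value
    return [math.nan if is_fill(v) else v for v in values]

def aggregate(chunks):
    """Concatenate chunks after mapping each chunk's fill value to NaN."""
    merged = []
    for chunk in chunks:
        merged.extend(normalize_fill(chunk["values"], chunk["fill_value"]))
    return merged

# Hypothetical dives from the same deployment, written with different fill values.
dive_1 = {"values": [44.1, -9999.9, 44.3], "fill_value": -9999.9}
dive_2 = {"values": [44.4, math.nan, 44.6], "fill_value": math.nan}

latitude = aggregate([dive_1, dive_2])
# Keep only real measurements, e.g. before plotting.
valid = [v for v in latitude if not math.isnan(v)]
```

If every file already used the same fill value, the per-chunk lookup of `fill_value` would not be needed, which is the argument for prescribing one.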

@vturpin
Member

vturpin commented Sep 21, 2022

If this is stated clearly in the format, that is fine with me.

vturpin added a commit that referenced this issue Sep 21, 2022
Following recommendation of #102 to clarify how to manage fix values
@vturpin vturpin mentioned this issue Sep 21, 2022
12 tasks
@vturpin vturpin added the 1.0 For stuff that must be resolved before we are able to release 1.0 label Sep 22, 2022
@emmerbodc
Collaborator Author

@castelao @vturpin Sorry, I must have missed this notification in my emails.

The complication in having multiple values for _FillValue is when you are looking at multiple files that have originated from different organisations. You would need to do extra work to inspect each file to see the _FillValue they have chosen.

I do think it's important to prescribe a _FillValue, particularly when other tools or organisations (e.g. SeaDataNet) harvest these data files. The tools that we or others develop can then work on all files to filter these values out, for example when plotting.

In terms of what the value is, I don't have a strong opinion; we just need to make sure it could never be an actual value. I think we ended up having this problem with the engineering cycle number in the EGO file, and it meant the EGO checker failed.

For LAT, LON and DEPTH these will be easy to prescribe. For parameter values, we may need to separate out a value for science parameters and one for engineering parameters.
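
The "could never be an actual value" requirement can be checked mechanically. The sketch below is only an illustration: the function name `fill_is_unambiguous` and the counter range are assumptions, while the latitude bounds of -90 to 90 degrees are physical.

```python
import math

def fill_is_unambiguous(fill_value, valid_min, valid_max):
    """True if fill_value can never be mistaken for a real measurement."""
    if math.isnan(fill_value):
        return True  # NaN is never inside a numeric range
    return not (valid_min <= fill_value <= valid_max)

# Latitude is physically bounded to [-90, 90] degrees.
lat_ok = fill_is_unambiguous(-9999.9, -90.0, 90.0)

# Hypothetical counter whose valid range includes 0 (the range is an assumption).
counter_clash = fill_is_unambiguous(0, 0, 10000)
```

Here `lat_ok` is `True`, while `counter_clash` is `False` because 0 falls inside the assumed valid range of the counter.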

If we go with NaN, how is this stored in the NetCDF files? Are we sure this is consistent across all NetCDF software and library versions?

@castelao castelao added this to the OG-1.0 milestone Sep 24, 2022
@justinbuck
Collaborator

Hi, fill values will need to differ by observed property. The convention in other networks is that they are harmonised across the network. Effectively, there would need to be a code table with the recommended fill value for each OG1 term. This approach makes it easier for developers of tools and services that use the OG data. There are times when the recommended fill value needs to be updated, as happened recently when the output from backscatter sensors changed.

@castelao
Member

NaN is a special floating-point encoding, so it should be stored in a NetCDF file as that specific f32/f64 value, i.e., a valid float with the same number of bits, like +Inf or -Inf. This seems to be a standard defined back in 1985, so any library that still can't handle it should probably be considered buggy.
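
To illustrate the point about the encoding, here is a small sketch using only the Python standard library. It does not touch any NetCDF library; it only inspects the 32-bit pattern that a float32 NaN occupies. The helper name `float32_fields` is made up for this example.

```python
import math
import struct

def float32_fields(value):
    """Split the float32 encoding of value into (sign, exponent, mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

_, nan_exp, nan_mant = float32_fields(math.nan)
_, inf_exp, inf_mant = float32_fields(math.inf)

# NaN and Inf share the all-ones exponent; only the mantissa differs.
nan_is_special = nan_exp == 0xFF and nan_mant != 0
inf_is_special = inf_exp == 0xFF and inf_mant == 0

# A round trip through the 4-byte encoding preserves NaN.
(restored,) = struct.unpack(">f", struct.pack(">f", math.nan))
```

Because NaN is an ordinary 4-byte (or 8-byte) pattern, writing it to a file needs no special handling beyond what any other float requires; whether every reader treats it as missing data is a separate question.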

I still think that using NaN is convenient for developers and users. If someone has to check in a table that 9999.9999 was used for missing latitude, how different would it be to look up in the manual that NaN was used here, or to check a table full of NaNs if one wants to be consistent with other systems?

If anyone has a strong opinion about defining tables, we need a volunteer to write those tables and think about where to keep them. Hopefully in a form that is easy for machine-to-machine communication.
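
As a rough idea of what a machine-readable table could look like, here is a sketch in Python that reads a JSON document. Everything about the layout is an assumption: the `dtype` and `fill_value` keys, the `float64` types, and the function name `recommended_fill` are invented for illustration, and the two fill values are simply the ones quoted at the top of this issue, not agreed recommendations.

```python
import json

# Placeholder table: term names and values are taken from this thread for
# illustration only; the layout and dtypes are assumptions.
FILL_TABLE_JSON = """
{
  "LATITUDE": {"dtype": "float64", "fill_value": -9999.9},
  "TIME": {"dtype": "float64", "fill_value": -1.0}
}
"""

def recommended_fill(table, term):
    """Return the recommended fill value for term, or None if it is not listed."""
    entry = table.get(term)
    return None if entry is None else entry["fill_value"]

table = json.loads(FILL_TABLE_JSON)
lat_fill = recommended_fill(table, "LATITUDE")
unknown_fill = recommended_fill(table, "NOT_A_TERM")
```

A tool could fall back to the file's own _FillValue attribute whenever the lookup returns `None`.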

@castelao
Member

If we don't have a volunteer to create such a reference table, we will need to allow a free value (once more, as long as it has the same data type as the variable) to be able to move forward. Note that there is no restriction on the CF side, so a free value is a legitimate option, and hence it should be the job of any CF-compliant app to normalize the FillValue before aggregating.

@vturpin
Member

vturpin commented Apr 24, 2023

Also related to #70

castelao added a commit that referenced this issue Apr 24, 2023
* FillValue for PARAM

Following recommendation of #102 to clarify how to manage fix values

* Update OG_Format.adoc

A decision from meeting 2023-04-24.

* Update OG_Format.adoc

New suggestion from Jenn

---------

Co-authored-by: Guilherme Castelão <guilherme@castelao.net>
@vturpin vturpin closed this as completed Apr 25, 2023