Meaning of the number of significant digits #2389
Comments
So your proposal is that we add these attributes when quantization is used? How would we calculate them?
Adding attributes and calculating them is not a big deal; formulating them is the bigger challenge. My proposal for now is to think and to share opinions, views and concerns. Then we might come to some weighted solution for delivering the uncertainty that is understandable, unambiguous and not too destructive for users.

At the moment we have a bunch of terms and concepts whose meaning is too vague, or even misleading, to be of any use in communication. For instance, the meaning of NSD in NetCDF is different for every specific method, and very different from what people think it means. I bet that in a year there will be perhaps a couple of people in the world capable of interpreting the corresponding attributes properly without googling through a pile of inconsistent documents and even more diverse opinions in forums. So, I guess, the term NSD has already been irreversibly spoiled. Besides that, there are quite a few other misleading terms around: "precision-preserving compression" (also my fault), which preserves precision in exactly the same sense in which shopping preserves money: one trades some precision for something one considers more valuable; a "statistically accurate method" that introduces unlimited errors in two-point statistics; etc. So some tedious work lies ahead to clean up the mess.

The correspondence between the margins and the way precision is trimmed is specified in my slides at EGU2022 (slide 3) and in my GMD paper (2021). I prefer to start from the error margins (defined by the data and the applications); from those one can absolutely unequivocally select the best method, number of bits, etc.
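To make the margin-to-bits relation concrete, here is a minimal sketch of my own (not taken from the paper or from netcdf-c). It assumes IEEE 754 binary floats and round-to-nearest bit rounding, where keeping k explicit mantissa bits bounds the relative error by 2^-(k+1):

```c
/* Sketch: choose the number of explicit mantissa bits to keep so that
 * round-to-nearest bit rounding stays within a requested relative error
 * margin.  Assumes IEEE 754 binary floats, where keeping k mantissa bits
 * bounds the relative error by 2^-(k+1). */
#include <math.h>
#include <stdio.h>

static int keepbits_for_rel_margin(double rel_margin)
{
    /* 2^-(k+1) <= rel_margin  =>  k >= log2(1/rel_margin) - 1 */
    int k = (int)ceil(log2(1.0 / rel_margin)) - 1;
    if (k < 1) k = 1;          /* keep at least one mantissa bit */
    if (k > 23) k = 23;        /* single precision has 23 explicit mantissa bits */
    return k;
}

int main(void)
{
    double margin = 1e-3;      /* example application-defined relative margin */
    printf("rel margin %.1e -> keep %d mantissa bits\n",
           margin, keepbits_for_rel_margin(margin));
    return 0;
}
```

Under these assumptions a requested relative margin of 1e-3 maps to 9 kept mantissa bits, since 2^-10 ≈ 9.8e-4.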
OK, I think a good starting point is going to be to move the discussion of error from filters to quantize. Perhaps at the upcoming CF meeting a consensus will be hammered out on how best to express this information...
I'm going to convert this over to a discussion, as that feels more appropriate for the (anticipated) long-form discussion we'll be having around this. Thanks!
This issue was moved to a discussion. You can continue the conversation there.
I believe this is an important topic that should be openly discussed. Please let me know if there is a better place for it; it can then be moved there.
@edhartnett wrote in his comment on #2369:
I am afraid that this point of view is shared by many in this community. I see two issues here that have been causing problems and will continue to cause them.
I have run a small poll among qualified researchers around me, who work with data and have quite impressive scientific merit and publication records. The question was:
So far I have received 11 replies:
The one who answered 5% was a person with whom I had previously discussed at length the distortions originating from GranularBitRound, recently introduced in NCO.
I believe it is far too much to demand that every scientist or engineer learn all the details of precision trimming in IEEE 754 numbers, the various ways to interpret NSD, and all the surrounding terminology. Therefore I propose a method- and system-agnostic (binary, decimal, etc.) means to convey the magnitude of the distortion introduced by a precision-trimming procedure.
The variable attributes `storage_abs_error_margin` (in the units of the variable) and `storage_rel_error_margin` (a dimensionless fraction) could serve the purpose. They should be clearly distinct from the actual error margins, which can be much larger. To avoid ambiguity and round-off errors, the rounding algorithm itself can be fed two integers: the number of keep-bits and the binary logarithm of the value of the least-significant bit kept. Eight bits should be sufficient for each of them. (@edhartnett, I hope this answers your question.)

I would be happy to hear other opinions on the subject and on the best ways to implement it. Thank you to everyone who got this far.
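As a concrete illustration (my own sketch, not an existing convention), such attributes could be written with the current netcdf-c API as shown below. The margin attribute names follow the proposal above; `storage_keep_bits` and `storage_lsb_log2` are hypothetical names I use here for the two integers, and error handling is abbreviated.

```c
/* Sketch: attach the proposed storage-error attributes to a variable using
 * the standard netcdf-c attribute calls.  The attribute names are proposals,
 * not an agreed convention. */
#include <netcdf.h>

int annotate_storage_margins(int ncid, int varid,
                             double abs_margin, double rel_margin,
                             int keepbits, int lsb_log2)
{
    int ret;

    /* Error margins guaranteed by the precision-trimming step. */
    if ((ret = nc_put_att_double(ncid, varid, "storage_abs_error_margin",
                                 NC_DOUBLE, 1, &abs_margin)))
        return ret;
    if ((ret = nc_put_att_double(ncid, varid, "storage_rel_error_margin",
                                 NC_DOUBLE, 1, &rel_margin)))
        return ret;

    /* The two integers proposed to drive the rounding itself: number of
     * kept mantissa bits and log2 of the least-significant bit kept.
     * Hypothetical attribute names, for illustration only. */
    if ((ret = nc_put_att_int(ncid, varid, "storage_keep_bits",
                              NC_INT, 1, &keepbits)))
        return ret;
    if ((ret = nc_put_att_int(ncid, varid, "storage_lsb_log2",
                              NC_INT, 1, &lsb_log2)))
        return ret;

    return NC_NOERR;
}
```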