Change smallest_power_of_ten to -64 for floats. #167
Thanks !!! |
Merge conflict resolved. |
`18446744073709551615e-65` rounds to `0`, and it is the largest possible value with exponent `-65`. Therefore, we can assume the minimal exponent for which we don't round to zero actually is `-64`.
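The claim above can be checked with exact rational arithmetic. The following is a minimal sketch, not code from this repository: the helper names `rounds_to_zero` and `smallest_power_of_ten` are hypothetical, and it assumes round-to-nearest-even with the smallest positive subnormal equal to `2**-149` for 32-bit floats.

```python
from fractions import Fraction

W_MAX = 2**64 - 1  # 18446744073709551615, the largest 64-bit significand


def rounds_to_zero(w, q, min_subnormal_exp):
    """Return True if w * 10**q rounds to zero under round-to-nearest-even.

    min_subnormal_exp is e such that 2**e is the smallest positive
    subnormal (assumed: -149 for 32-bit floats, -1074 for 64-bit floats).
    """
    value = Fraction(w) * Fraction(10) ** q
    half_min = Fraction(2) ** (min_subnormal_exp - 1)
    # Anything at or below half the smallest subnormal rounds to zero
    # (an exact tie goes to the even neighbour, which is zero).
    return value <= half_min


def smallest_power_of_ten(min_subnormal_exp):
    """Smallest q for which the largest significand does not round to zero."""
    q = 0
    while not rounds_to_zero(W_MAX, q, min_subnormal_exp):
        q -= 1
    return q + 1


print(rounds_to_zero(W_MAX, -65, -149))  # True
print(rounds_to_zero(W_MAX, -64, -149))  # False
print(smallest_power_of_ten(-149))       # -64
```

Under the same assumptions, `smallest_power_of_ten(-1074)` returns `-342`, which matches the existing 64-bit constant.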
I rebased on top of the right branch, sorry for the confusion. Is it good to merge? If not, what's missing? |
@deadalnix I don't think that there was necessarily a problem with the PR... it did not seem necessary to sync. I do need to carefully check the changes to make sure that we don't introduce bugs. I expect to eventually merge this PR. |
@lemire where do these values come from? I am trying to figure out a way to determine them programmatically.
My calculations do seem to align with http://weitz.de/ieee/ from a few spot checks, e.g. 1e-45 is representable as an f32 (0x1), but 1e-46 gets rounded to 0. Edit: but there seems to be some kind of ripple effect in the algorithm, even though the math looks right at a surface level. Testing in the Rust stdlib, changing
Changing My test branch of generic-ifying things https://github.com/tgross35/rust/blob/c0d27fa108a2215bdbcdb2fe1e5d89a62ded51ac/library/core/src/num/dec2flt/float.rs#L89-L100, setting the |
Sure. Please see the papers where this is discussed in detail.
|
The paper does say:
But it doesn't mention the significance of these numbers, at least that I can find. 1074 is bias + mantissa bits, and 64 seems to be the same constant for both f32 and f64. My guess is that it is there to account for a possible shift within the internal u64 exponent representation. Going off of that:
Which gives the following:
I have not tested against the f16 or bf16 algorithms. Does what is proposed here seem accurate? (cc @jakubjelinek from #148, how did you come up with the values there? And is it possible that -26 and -59 could be the correct min powers of 10?) |
I am not sure what you mean by "doesn't mention the significance of these numbers". Let me be explicit:
[Two images; content not preserved.]
Can you please explicitly relate your comments to @deadalnix's PR? |
I do see the numbers in the paper, but I do not see a derivation or relationship with The problem: ceil(log10(2^-1074)) is ~-324, which should be the "smallest power of 10". However, I am trying to explain mathematically why this is specifically -342, not -324 or -341 or -343 or anything else. "10e-341" rounds to 0, as does everything smaller until ~"10e-324". So I am working backwards to try to understand this. As I quoted above from "Fast Number Parsing":
This allows us to arrive at
The relationship to this PR is that it should be trivial to verify the suggested change if such a derivation of the magic number exists. My above formula can produce the number in this PR, saying the change should be correct; I am asking if my formula is correct. The relation to everything else is aiding in applying this algorithm to new float types that are not specified in the paper. E.g. if my above formula is indeed correct, then GCC may have an incorrect (but still functional) |
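For readers following the -342 versus -324 question, here is one way to see the gap numerically. This is a sketch under stated assumptions, not the paper's derivation and not the formula proposed in the comment above: it assumes the decimal significand is an integer of at most `2**64 - 1` and that a value rounds to zero when it does not exceed half the smallest subnormal. Both function names are hypothetical.

```python
import math


def naive_bound(min_subnormal_exp):
    """floor(log10(2**e)): the bound when the significand is a single digit."""
    return math.floor(min_subnormal_exp * math.log10(2))


def bound_with_u64_significand(min_subnormal_exp):
    """Smallest q with (2**64 - 1) * 10**q > 2**(e - 1).

    Solving for q gives q > (e - 1) * log10(2) - log10(2**64 - 1),
    so the smallest integer q is floor(right-hand side) + 1.
    """
    log_threshold = (min_subnormal_exp - 1) * math.log10(2) - math.log10(2**64 - 1)
    return math.floor(log_threshold) + 1


print(naive_bound(-1074))                 # -324
print(bound_with_u64_significand(-1074))  # -342
print(bound_with_u64_significand(-149))   # -64
```

The difference between the two bounds comes from the roughly 19 decimal digits that a 64-bit significand can carry in front of the power of ten.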
@tgross35 If you believe that there is a bug in GCC, please report the issue with GCC. At this time, your comments are misplaced. Please take them where they belong. Let me be clear. This PR is not about 16-bit floats and we are not currently supporting 16-bit floats. GCC does support 16-bit floats, but this is the wrong place to discuss issues with GCC. Please raise the issue where it belongs. |
@deadalnix Merged. Thanks. |