
Change smallest_power_of_ten to -64 for floats. #167


Merged
merged 1 commit into fastfloat:main from smallest_power_of_ten on Jun 27, 2024

Conversation

deadalnix
Contributor

18446744073709551615e-65 rounds to 0, and it is the largest possible value with exponent -65. Therefore, we can assume the minimal exponent for which we don't round to zero actually is -64.
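(For illustration only, not part of the PR: a standalone Python check of this claim, round-tripping through binary32 via `struct`. The helper name is made up here, and going through binary64 first is a shortcut; both values are far from the rounding boundary, so the outcome is the same.)

```python
import struct

def to_binary32(x: float) -> float:
    # Round a Python float (binary64) to binary32 and back.
    return struct.unpack('f', struct.pack('f', x))[0]

w = 18446744073709551615  # 2**64 - 1, the largest 64-bit decimal significand

print(to_binary32(w * 10.0**-65))  # 0.0: exponent -65 always underflows to zero
print(to_binary32(w * 10.0**-64))  # ~1.4e-45: rounds to the smallest binary32 subnormal, not zero
```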

@lemire
Member

lemire commented Jan 10, 2023

Thanks !!!

@deadalnix deadalnix force-pushed the smallest_power_of_ten branch from 0add241 to 4643ec1 on July 10, 2023 12:06
@deadalnix
Contributor Author

Merge conflict resolved.

`18446744073709551615e-65` rounds to `0`, and it is the largest possible value with exponent `-65`. Therefore, we can assume the minimal exponent for which we don't round to zero actually is `-64`.
@deadalnix deadalnix force-pushed the smallest_power_of_ten branch from 4643ec1 to 5167a50 on July 12, 2023 14:27
@deadalnix
Contributor Author

I rebased on top of the right branch, sorry for the confusion. Is it good to merge? If not, what's missing?

@lemire
Member

lemire commented Jul 12, 2023

@deadalnix I don't think that there was necessarily a problem with the PR... it did not seem necessary to sync. I do need to carefully check the changes to make sure that we don't introduce bugs. I expect to eventually merge this PR.

@tgross35

tgross35 commented Jun 27, 2024

@lemire where do these values come from? I am trying to figure out a way to determine them programmatically.

largest_power_of_ten seems to be floor(exponent_bias / log2(10)), which returns 38 for f32 and 308 for f64. This makes sense (can't represent an exponent larger than the max binary exponent) and does line up with the literals in this repo.

smallest_power_of_ten makes less sense. I think it is -(exponent_bias + mantissa_bits)/log2(10) (possibly +/- an off-by-one error somewhere), which seems to make sense because the smallest positive exponent should be the minimum exponent plus the additional places gained by representing a subnormal, converted from base 2 to base 10. However, this gives values of -45 and -323, way off from the -65 and -342 present in this repository.

My calculations do seem to align with http://weitz.de/ieee/ from a few spot checks, e.g. 1e-45 is representable as an f32 (0x1), but 1e-46 gets rounded to 0.
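(A minimal standalone sketch of that spot check in Python, using a binary64-to-binary32 round-trip as a stand-in for a true decimal-to-binary32 parse; the helper name is made up:)

```python
import struct

def to_f32(x: float) -> float:
    # Round a Python float (binary64) to binary32 and back.
    return struct.unpack('f', struct.pack('f', x))[0]

print(to_f32(1e-45))  # ~1.401e-45: the smallest positive binary32 subnormal, nonzero
print(to_f32(1e-46))  # 0.0: below half the smallest subnormal, so it rounds to zero
```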


Edit: but there seems to be some kind of ripple effect in the algorithm, even though the math looks right at a surface level. Testing in the Rust stdlib, changing SMALLEST_POWER_OF_TEN to `-(((Self::EXPONENT_BIAS + Self::MANTISSA_BITS) as f64) / LOG2_10) as i32 - 1` produces the constants -46 (f32) and -324 (f128), which seems like it would be correct based on the direct input->output relationship, but must have some side effect based on the actual output. Failures after changing:

  • f64 from "2.225073858507201136057409796709131975934819546351645648023426109724822222021076945516529523908135087914149158913039621106870086438694594645527657207407820621743379988141063267329253552286881372149012981122451451889849057222307285255133155755015914397476397983411801999323962548289017107081850690630666655994938275772572015763062690663332647565300009245888316433037779791869612049497390377829704905051080609940730262937128958950003583799967207254304360284078895771796150945516748243471030702609144621572289880258182545180325707018860872113128079512233426288368622321503775666622503982534335974568884423900265498198385487948292206894721689831099698365846814022854243330660339850886445804001034933970427567186443383770486037861622771738545623065874679014086723327636718749999999999999999999999999999999999999e-308". should be 0x0001999999999999 but rounds to 0. Note that -308 > -324 so it shouldn't be hitting the zero fast path, but is rounding to 0 for some reason.
  • f32 from "1.175494140627517859246175898662808184331245864732796240031385942718174675986064769972472277004271745681762695312500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e-38". Should be 0x007FFFFE but rounds to 0. Again, -38 < -45 so it shouldn't round, but it hits a zero fast path somewhere

Changing LARGEST_POWER_OF_TEN to the expression I mentioned above seems to work okay; it is only changing the smallest that breaks things.

My test branch of making things generic is https://github.com/tgross35/rust/blob/c0d27fa108a2215bdbcdb2fe1e5d89a62ded51ac/library/core/src/num/dec2flt/float.rs#L89-L100; setting the SMALLEST_POWER_OF_TEN values back to the hard-coded constants makes things work.

@lemire
Member

lemire commented Jun 27, 2024

@tgross35

where do these values come from?

Sure. Please see the papers where this is discussed in detail.

The Python script is part of this repository.

@tgross35

The paper does say:

We have that w × 10^-343 < 2^-1074 for all w < 2^64. Thus, if the decimal exponent is smaller than -342, then the number must be considered to be zero. If the decimal exponent is greater than 308, the result must be infinite (beyond the range).

But it doesn't mention the significance of these numbers, at least that I can find. 1074 is bias + mantissa bits - 1, and 64 seems to be the same constant for both f32 and f64. My guess is that it accounts for a possible shift within the internal u64 significand representation.

Going off of that:

  • q_min,shifted = -⌊(exponent_bias + mantissa_bits + internal_rep_bits) / log2(10)⌋
  • q_max,shifted = ⌊(exponent_bias + 1) / log2(10)⌋

Which gives the following:

  • -26..4 for f16. Largest (4) matches #148 (std::float16_t/std::bfloat16_t support), but the smallest value there (-27) is one less than the calculated smallest (-26)
  • -59..38 for bf16. Largest (38) matches #148, but the smallest value there (-60) is one less than the calculated smallest (-59)
  • -64..38 for f32. Largest (38) matches what is in this repo. Smallest (-64) matches what is in this PR
  • -342..308 for f64. Largest (308) and smallest (-342) both match what is in this repo; see the sketch below this list.
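(A small Python sketch, not the repository's generation script, with constant and function names of my own choosing, that reproduces the values listed above:)

```python
import math

LOG2_10 = math.log2(10)  # ≈ 3.3219

def q_bounds(exponent_bias: int, mantissa_bits: int) -> tuple[int, int]:
    # Candidate (smallest, largest) powers of ten per the shifted formulas above.
    q_min = -math.floor((exponent_bias + mantissa_bits + 64) / LOG2_10)
    q_max = math.floor((exponent_bias + 1) / LOG2_10)
    return q_min, q_max

print(q_bounds(15, 10))    # f16  -> (-26, 4)
print(q_bounds(127, 7))    # bf16 -> (-59, 38)
print(q_bounds(127, 23))   # f32  -> (-64, 38)
print(q_bounds(1023, 52))  # f64  -> (-342, 308)
```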

I have not tested against the f16 or bf16 algorithms. Does what is proposed here seem accurate?

(cc @jakubjelinek from #148, how did you come up with the values there? And is it possible that -26 and -59 could be the correct min powers of 10?)

@lemire
Member

lemire commented Jun 27, 2024

We have that w × 10^-343 < 2^-1074 for all w < 2^64. Thus, if the decimal exponent is smaller than -342, then the number must be considered to be zero. If the decimal exponent is greater than 308, the result must be infinite (beyond the range).

But it doesn't mention the significance of these numbers, at least that I can find.

I am not sure what you mean by "doesn't mention the significance of these numbers".

Let me be explicit:

  • 10**309 is beyond the range of binary64 and is thus infinite. From the paper: (screenshot of the relevant passage)
  • 2**-1074 is effectively zero in binary64. From the paper: (screenshot of the relevant passage)

Does what is proposed here seem accurate?

Can you please explicitly relate your comments to @deadalnix's PR?

@tgross35

I do see the numbers in the paper, but I do not see a derivation of, or relationship with, {largest,smallest}_power_of_ten, or a way to calculate those numbers from the type's properties. The bit you cite says that the smallest positive value for binary64 is 2^-1074 = 2^-(bias + mantissa_bits - 1); that part makes sense.

The problem: ceil(log10(2^-1074)) is ~-324, which should be the "smallest power of 10". However, smallest_power_of_ten in this repository, the Python file, and the paper is the magic number -342, which I have not seen explained.

I am trying to explain mathematically why this is specifically -342, not -324 or -341 or -343 or anything else. "10e-341" rounds to 0, and so does everything from roughly "10e-324" on down.

So I am working backwards to try to understand this. As I quoted above from "Fast Number Parsing":

We have that w × 10^-343 < 2^-1074 for all w < 2^64. Thus, if the decimal exponent is smaller than -342, then the number must be considered to be zero. If the decimal exponent is greater than 308, the result must be infinite (beyond the range).
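(For what it's worth, the paper's inequality, and the fact that it fails one exponent higher, can be checked exactly with Python's `fractions`; this is just an illustrative check, not anything from the repository:)

```python
from fractions import Fraction

w_max = 2**64 - 1            # largest possible 64-bit decimal significand
tiny = Fraction(2) ** -1074  # smallest positive binary64 value (subnormal)

# w * 10^-343 < 2^-1074 for every w < 2^64, so exponent -343 always yields zero...
assert Fraction(w_max) * Fraction(10) ** -343 < tiny
# ...but at exponent -342 a large significand can still reach a nonzero subnormal,
# which is why -342 is the smallest decimal exponent the algorithm must handle.
assert Fraction(w_max) * Fraction(10) ** -342 >= tiny

print(float("18446744073709551615e-343"))  # 0.0
print(float("18446744073709551615e-342"))  # a tiny nonzero subnormal (~2e-323)
```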

This allows us to arrive at -⌊(exponent_bias + mantissa_bits + magic_number_64) / log2(10)⌋. Because magic_number_64 needs to be the same for both f32 and f64 (i.e. it is not the type's bit width; plugging in "32" gives an inaccurate answer for f32), I am suggesting that magic_number_64 may be the width of the internal 64-bit significand representation (the `uint64_t w` in `adjusted_mantissa compute_float(int64_t q, uint64_t w) noexcept`).


The relationship to this PR is that it should be trivial to verify the suggested change if such a derivation of the magic number exists. My formula above produces the number in this PR, which suggests the change is correct; I am asking whether my formula is correct.

The relation to everything else is that it aids in applying this algorithm to new float types that are not specified in the paper. E.g. if my formula above is indeed correct, then GCC may have an incorrect (but still functional) smallest_power_of_ten for f16 and bf16.

@lemire
Member

lemire commented Jun 27, 2024

@tgross35 If you believe that there is a bug in GCC, please report the issue with GCC. At this time, fast_float does not yet support 16-bit floats. This pull request by @deadalnix is not related to 16-bit floats.

Your comments are misplaced. Please take them where they belong.

Let me be clear. This PR is not about 16-bit floats and we are not currently supporting 16-bit floats.

GCC does support 16-bit floats, but this PR is the wrong place to discuss issues with GCC. Please raise the issue where it belongs.

@lemire lemire merged commit c8b3ca0 into fastfloat:main Jun 27, 2024
@lemire
Member

lemire commented Jun 27, 2024

@deadalnix Merged. Thanks.

@fastfloat fastfloat locked as resolved and limited conversation to collaborators Jun 27, 2024