Provide the full set of IEEE Operations/Behaviors required for compliance #27204

tannergooding opened this issue Aug 23, 2018 · 18 comments
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Numerics



tannergooding commented Aug 23, 2018


The IEEE 754:2008 and IEEE 754:2019 specs define a number of operations (some required, some recommended, and some optional). The .NET Framework exposes a good number of them, but not all.

It would be ideal to ensure that all required operations are properly exposed and that they are compliant with the IEEE spec. Where a recommended or optional operation is also exposed, we should ensure that it is also compliant. Where a behavior/operation/feature is explicitly not implemented, we should ensure that it is properly documented with any reasoning as to why it is not supported.

For the purposes of this proposal, this document only covers the binary operations. The IEEE spec also defines a format and set of operations for decimal values, but those are not supported by .NET in any form today and should be covered separately.

Additionally, this general proposal is meant to track the overall status; sub-proposals should be opened, where applicable, to cover specific behaviors/operations. This proposal does not go into detail on any special behaviors or handling of individual operations, which should be checked and validated separately.


Support for all formats and representations is language/implementation-defined. .NET currently supports binary32 and binary64. There have also been requests for binary16, binary128, and arbitrarily sized binary{k}.

The basic encodings defined are:

  • binary32 - System.Single
  • binary64 - System.Double
  • binary128

The extended encodings defined are:

  • binary16 - Proposed as System.Half
  • binary{k}, Where k >= 128 and is a multiple of 32


Several attributes that control the semantics of a "block" are given. For any exposed attribute, you must provide a means of statically setting the default and each supported value. Providing a means to dynamically set the attributes is recommended.


  • Rounding Direction
    • Required: roundTiesToEven - Default, System.MidpointRounding.ToEven
    • Optional: roundTiesToAway - System.MidpointRounding.AwayFromZero
    • Required: roundTowardPositive - Not Supported
    • Required: roundTowardNegative - Not Supported
    • Required: roundTowardZero - Not Supported


  • Alternate Exception Handling - Not Supported (currently out of scope)


  • Preferred Width - Not Supported
  • Value Changing Optimizations - Not Supported
  • Reproducibility - Not Supported

If setting the attributes dynamically is supported, the following are required

  • binaryRoundingDirection getBinaryRoundingDirection() - Not Supported
  • void setBinaryRoundingDirection(binaryRoundingDirection) - Not Supported
  • modeGroup saveModes() - Not Supported
  • void restoreModes(modeGroup) - Not Supported
  • void defaultModes() - Not Supported

Required Operations

All operations listed below are considered required

General (T is a floating-point format)

  • T roundToIntegralTiesToEven(T) - Math.Round, Math.Round w/ MidpointRounding.ToEven
  • T roundToIntegralTiesToAway(T) - Math.Round w/ MidpointRounding.AwayFromZero
  • T roundToIntegralTowardZero(T) - Math.Round w/ MidpointRounding.ToZero
  • T roundToIntegralTowardPositive(T) - Math.Round w/ MidpointRounding.ToPositiveInfinity
  • T roundToIntegralTowardNegative(T) - Math.Round w/ MidpointRounding.ToNegativeInfinity
  • T roundToIntegralExact(T) - Not supported, meant to use the current Rounding Direction attribute
  • T nextUp(T) - Math.BitIncrement
  • T nextDown(T) - Math.BitDecrement
  • T remainder(T, T) - Math.IEEERemainder
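
As a rough illustration of the mapping in the list above, the sketch below calls Math.Round with each MidpointRounding mode and steps to the adjacent representable values with Math.BitIncrement/Math.BitDecrement. It assumes .NET Core 3.0 or later (where the directed rounding modes exist); the variable names are illustrative only.

```csharp
using System;

// roundToIntegral* operations expressed through Math.Round and a MidpointRounding mode.
double tiesToEven     = Math.Round(2.5, MidpointRounding.ToEven);               // 2
double tiesToAway     = Math.Round(2.5, MidpointRounding.AwayFromZero);         // 3
double towardZero     = Math.Round(-2.7, MidpointRounding.ToZero);              // -2
double towardPositive = Math.Round(2.1, MidpointRounding.ToPositiveInfinity);   // 3
double towardNegative = Math.Round(-2.1, MidpointRounding.ToNegativeInfinity);  // -3

// nextUp / nextDown: move to the neighbouring representable double.
double up   = Math.BitIncrement(1.0);  // 1 + 2^-52
double down = Math.BitDecrement(1.0);  // 1 - 2^-53

Console.WriteLine($"{tiesToEven} {tiesToAway} {towardZero} {towardPositive} {towardNegative}");
Console.WriteLine($"{up:R} {down:R}");
```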

logBFormat operations (T is a floating-point format, U can be an integral format or the same type as T)

  • T scaleB(T, U) - Math.ScaleB
  • U logB(T) - Math.ILogB
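
A minimal sketch of how the two logBFormat operations fit together, assuming the integer-returning form in System.Math (Math.ILogB) alongside Math.ScaleB: the exponent extracted by one can be used by the other to split a value into significand and exponent and to rebuild it. Variable names are illustrative.

```csharp
using System;

double value = 12.0;                                     // 12 = 1.5 * 2^3
int exponent = Math.ILogB(value);                        // unbiased base-2 exponent: 3
double significand = Math.ScaleB(value, -exponent);      // 1.5, scaled into [1, 2)
double rebuilt = Math.ScaleB(significand, exponent);     // exactly 12 again

Console.WriteLine($"{significand} * 2^{exponent} = {rebuilt}");
```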


  • T addition(T x, T y) - x + y
  • T subtraction(T x, T y) - x - y
  • T multiplication(T x, T y) - x * y
  • T division(T x, T y) - x / y
  • T squareRoot(T) - Math.Sqrt
  • T fusedMultiplyAdd(T, T, T) - Math.FusedMultiplyAdd
  • T convertFromInt(integerFormat x) - (T)x
  • integerFormat convertToIntegerTiesToEven(T) - Not Supported
  • integerFormat convertToIntegerTowardZero(T x) - (integerFormat)x, Not supported as an explicit operation
  • integerFormat convertToIntegerTowardPositive(T) - Not Supported
  • integerFormat convertToIntegerTowardNegative(T) - Not Supported
  • integerFormat convertToIntegerTiesToAway(T) - Not Supported
  • integerFormat convertToIntegerExactTiesToEven(T) - Not supported, Requires floating-point exception handling
  • integerFormat convertToIntegerExactTowardZero(T) - Not Supported, Requires floating-point exception handling
  • integerFormat convertToIntegerExactTowardPositive(T) - Not Supported, Requires floating-point exception handling
  • integerFormat convertToIntegerExactTowardNegative(T) - Not Supported, Requires floating-point exception handling
  • integerFormat convertToIntegerExactTiesToAway(T) - Not Supported, Requires floating-point exception handling
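
Until the convertToInteger variants exist as explicit operations, one way to approximate them is to round in floating point and then convert. The helper below is a hypothetical sketch (ConvertToInteger is not a .NET API); it throws OverflowException as a stand-in for the invalid-operation exception the spec describes, and it does not cover the "Exact" variants, which need inexact signaling.

```csharp
using System;

// Hypothetical helper: round with an explicit direction, then convert to long.
static long ConvertToInteger(double x, MidpointRounding mode)
{
    double rounded = Math.Round(x, mode);

    // 2^63 is exactly representable as a double; values outside [-2^63, 2^63) do not fit in a long.
    if (double.IsNaN(rounded) || rounded < -9223372036854775808.0 || rounded >= 9223372036854775808.0)
        throw new OverflowException("Value is NaN or out of range for Int64.");

    return (long)rounded;
}

Console.WriteLine(ConvertToInteger(2.5, MidpointRounding.ToEven));              // 2
Console.WriteLine(ConvertToInteger(-2.5, MidpointRounding.AwayFromZero));       // -3
Console.WriteLine(ConvertToInteger(2.1, MidpointRounding.ToPositiveInfinity));  // 3
```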


  • U convertFormat(T x) - (U)x where U is another floating-point format
  • T convertFromDecimalCharacter(decimalCharacterSequence) - Double.Parse
  • decimalCharacterSequence convertToDecimalCharacter(T, conversionSpecification) - Double.ToString, Double.ToString w/ format
  • T convertFromHexCharacter(hexCharacterSequence) - Not Supported
  • hexCharacterSequence convertToHexCharacter(T, conversionSpecification) - Not Supported

Sign Bit

  • T copy(T x) - var y = x
  • T negate(T x) - -x
  • T abs(T) - Math.Abs
  • T copySign(T, T) - Math.CopySign


  • boolean compareQuietEqual(T x, T y) - x == y
  • boolean compareQuietNotEqual(T x, T y) - x != y
  • boolean compareSignalingEqual(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareSignalingGreater(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareSignalingGreaterEqual(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareSignalingLess(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareSignalingLessEqual(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareSignalingNotEqual(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareSignalingNotGreater(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareSignalingLessUnordered(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareSignalingNotLess(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareSignalingGreaterUnordered(T x, T y) - Not Supported, Requires floating-point exception handling
  • boolean compareQuietGreater(T x, T y) - x > y
  • boolean compareQuietGreaterEqual(T x, T y) - x >= y
  • boolean compareQuietLess(T x, T y) - x < y
  • boolean compareQuietLessEqual(T x, T y) - x <= y
  • boolean compareQuietUnordered(T x, T y) - Not Supported
  • boolean compareQuietNotGreater(T x, T y) - Not Supported
  • boolean compareQuietLessUnordered(T x, T y) - Not Supported
  • boolean compareQuietNotLess(T x, T y) - Not Supported
  • boolean compareQuietGreaterUnordered(T x, T y) - Not Supported
  • boolean compareQuietOrdered(T x, T y) - Not Supported
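
For reference, the quiet comparisons that map to C# operators all return false when either operand is NaN (except !=, which returns true). The unordered-aware predicates listed as Not Supported can be sketched from those operators; the helper names below are hypothetical and not part of .NET.

```csharp
using System;

double one = 1.0, nan = double.NaN, otherNan = double.NaN;

bool eq = nan == otherNan;  // false: NaN is never equal to anything
bool ne = nan != otherNan;  // true
bool lt = one < nan;        // false: unordered
bool ge = one >= nan;       // false: unordered

// Hypothetical helpers built on the existing operators.
static bool CompareQuietUnordered(double x, double y) => double.IsNaN(x) || double.IsNaN(y);
static bool CompareQuietOrdered(double x, double y) => !CompareQuietUnordered(x, y);
static bool CompareQuietNotGreater(double x, double y) => !(x > y);      // true when x <= y or unordered
static bool CompareQuietLessUnordered(double x, double y) => !(x >= y);  // true when x < y or unordered

Console.WriteLine($"{eq} {ne} {lt} {ge}");
Console.WriteLine(CompareQuietUnordered(one, nan));
```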


  • boolean is754version1985() - Not Supported
  • boolean is754version2008() - Not Supported
  • boolean is754version2019() - Not Supported


  • enum class(T) - Not Supported
    • signalingNaN
    • quietNaN
    • negativeInfinity
    • negativeNormal
    • negativeSubnormal
    • negativeZero
    • positiveZero
    • positiveSubnormal
    • positiveNormal
    • positiveInfinity
  • boolean isSignMinus(T) - Double.IsNegative
  • boolean isNormal(T) - Double.IsNormal
  • boolean isFinite(T) - Double.IsFinite
  • boolean isZero(T) - INumberBase.IsZero, but not directly exposed
  • boolean isSubnormal(T) - Double.IsSubnormal
  • boolean isInfinite(T) - Double.IsInfinity
  • boolean isNaN(T) - Double.IsNaN
  • boolean isSignaling(T) - Not Supported
  • boolean isCanonical(T) - INumberBase.IsCanonical, but not directly exposed
  • enum radix(T) - INumberBase.Radix, but not directly exposed
    • 2 for binary
  • boolean totalOrder(T, T) - Not Supported
  • boolean totalOrderMag(T, T) - Not Supported
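
Although class(T) is not exposed, it can be approximated from the predicates that are. The sketch below is a hypothetical Classify helper (not a .NET API) that returns the class name as a string. It assumes the common convention that the most significant significand bit (bit 51 of a binary64) is set for quiet NaNs, which holds on x86 and Arm but is only recommended by the spec.

```csharp
using System;

// Hypothetical helper: returns the IEEE class name for a double.
static string Classify(double x)
{
    if (double.IsNaN(x))
    {
        // Assumption: bit 51 set means quiet NaN (the usual hardware convention).
        long bits = BitConverter.DoubleToInt64Bits(x);
        return (bits & 0x0008_0000_0000_0000L) != 0 ? "quietNaN" : "signalingNaN";
    }

    bool negative = double.IsNegative(x);
    if (double.IsInfinity(x)) return negative ? "negativeInfinity" : "positiveInfinity";
    if (x == 0.0) return negative ? "negativeZero" : "positiveZero";
    if (double.IsSubnormal(x)) return negative ? "negativeSubnormal" : "positiveSubnormal";
    return negative ? "negativeNormal" : "positiveNormal";
}

Console.WriteLine(Classify(-0.0));
Console.WriteLine(Classify(double.Epsilon));
Console.WriteLine(Classify(double.NaN));
```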


  • void lowerFlags(exceptionGroup) - Not Supported
  • void raiseFlags(exceptionGroup) - Not Supported
  • boolean testFlags(exceptionGroup) - Not Supported
  • boolean testSavedFlags(flags, exceptionGroup) - Not Supported
  • void restoreFlags(flags, exceptionGroup) - Not Supported
  • flags saveAllFlags() - Not Supported


Exceptions are not currently supported by the CoreCLR and no mechanism to control them is currently made available. I have considered them "out of scope" for the purposes of this document.

Recommended Operations

Additional Mathematical Operations

  • T exp(T) - Double.Exp
  • T expm1(T) - Double.ExpM1
  • T exp2(T) - Double.Exp2
  • T exp2m1(T) - Double.Exp2M1
  • T exp10(T) - Double.Exp10
  • T exp10m1(T) - Double.Exp10M1
  • T log(T) - Math.Log
  • T log2(T) - Math.Log2
  • T log10(T) - Math.Log10
  • T logp1(T) - Double.LogP1
  • T log2p1(T) - Double.Log2P1
  • T log10p1(T) - Double.Log10P1
  • T hypot(T, T) - Double.Hypot
  • T rSqrt(T) - Not Supported
  • T compound(T, T) - Approved, not implemented yet
  • T rootn(T, T) - Double.RootN
  • T pown(T, T) - Not Supported
  • T pow(T, T) - Math.Pow
  • T powr(T, T) - Not Supported
  • T sin(T) - Math.Sin
  • T cos(T) - Math.Cos
  • T tan(T) - Math.Tan
  • T sinPi(T) - Double.SinPi
  • T cosPi(T) - Double.CosPi
  • T tanPi(T) - Double.TanPi
  • T asin(T) - Math.Asin
  • T acos(T) - Math.Acos
  • T atan(T) - Math.Atan
  • T atan2(T, T) - Math.Atan2
  • T asinPi(T) - Double.AsinPi
  • T acosPi(T) - Double.AcosPi
  • T atanPi(T) - Double.AtanPi
  • T atan2Pi(T, T) - Double.Atan2Pi
  • T sinh(T) - Math.Sinh
  • T cosh(T) - Math.Cosh
  • T tanh(T) - Math.Tanh
  • T asinh(T) - Math.Asinh
  • T acosh(T) - Math.Acosh
  • T atanh(T) - Math.Atanh


  • T sum(T[]) - Not Supported
  • T dot(T[], T[]) - Not Supported
  • T sumSquare(T[]) - Not Supported
  • T sumAbs(T[]) - Not Supported
  • T[] scaledProd(T[]) - Not Supported
  • T[] scaledProdSum(T[], T[]) - Not Supported
  • T[] scaledProdDiff(T[], T[]) - Not Supported

Augmented Arithmetic

  • (T, T) augmentedAddition(T, T)
  • (T, T) augmentedSubtraction(T, T)
  • (T, T) augmentedMultiplication(T, T)

Minimum and Maximum

  • T minimum(T, T) - Math.Min
  • T minimumNumber(T, T) - Double.MinNumber
  • T maximum(T, T) - Math.Max
  • T maximumNumber(T, T) - Double.MaxNumber
  • T minimumMagnitude(T x, T y) - Math.MinMagnitude
  • T minimumMagnitudeNumber(T x, T y) - Double.MinMagnitudeNumber
  • T maximumMagnitude(T x, T y) - Math.MaxMagnitude
  • T maximumMagnitudeNumber(T x, T y) - Double.MaxMagnitudeNumber
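
The practical difference between the plain and the "Number" variants is NaN handling: the plain forms propagate NaN, while the Number forms prefer the numeric operand. A short sketch, assuming .NET 7 or later for Double.MinNumber/Double.MaxNumber:

```csharp
using System;

double nan = double.NaN;

double min       = Math.Min(1.0, nan);           // NaN: minimum propagates NaN
double minNumber = double.MinNumber(1.0, nan);   // 1: minimumNumber returns the number
double maxNumber = double.MaxNumber(nan, 1.0);   // 1
double minMag    = Math.MinMagnitude(-3.0, 2.0); // 2: smaller absolute value
double maxMag    = Math.MaxMagnitude(-3.0, 2.0); // -3: larger absolute value, sign preserved

Console.WriteLine($"{min} {minNumber} {maxNumber} {minMag} {maxMag}");
```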


  • T getPayload(T) - Not Supported
  • T setPayload(T) - Not Supported
  • T setPayloadSignaling(T) - Not Supported
Member Author

CC. @danmosemsft, @eerhardt

I went through the IEEE 754:2008 manual and attempted to track all required/recommended operations above. We should probably open individual issues (where one does not already exist) to track any sub behaviors/features/operations that are not currently exposed.

Also CC. @CarolEidt, who may be interested in some of this.

Member Author

It should be noted that, while I made a good effort to ensure the table is accurate, I may have missed bits (either operations from the spec or missed something already supported by .NET).

Member Author

Logged to track the missing rounding directions.

Member Author

Logged to track several of the missing "required" operations.

Member Author

  • The various convertToInteger with explicit rounding specifiers should likely be proposed together.
  • The convert to/from hex-character sequence operations should be proposed together
  • The various compareSignaling operations, if supported, should be exposed together
    • I would think if exceptions are ever supported, explicitly doing so via special operations might be a good way to do it
  • The is754Version operations should likely be proposed together
    • maybe exposed on a FloatingPointEnv class, which could also contain some of the other related operations
  • The class (classify) and related operations (isZero, isSignaling, etc.) should be proposed together
  • The various "recommended" Math/MathF operations (such as expm1) should be proposed together

Member Author

tannergooding commented Aug 23, 2018

It is also worth noting that for many of these operations, doing a two-step "equivalent" operation is not always equivalent, since individual operations are documented to compute the "correct infinitely precise" result and then round it using the current (or explicitly specified) rounding direction.

As an example, Math.Floor, while doing (effectively) a "convertToInteger" operation, is not the same since it returns a floating-point result, and may lose precision as compared to a result returned precisely as an integer.
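
Fused multiply-add gives a compact illustration of the same point: computing a * a + c in two steps rounds the product before the addition, while Math.FusedMultiplyAdd rounds once. The values below are chosen so that the rounded product cancels exactly against c. This is a sketch that assumes a runtime which does not contract a * a + c into a fused instruction on its own (true of current x64/Arm64 .NET runtimes as far as I know).

```csharp
using System;

double a = 1.0 + Math.ScaleB(1.0, -27);        // 1 + 2^-27
double c = -(1.0 + Math.ScaleB(1.0, -26));     // -(1 + 2^-26)

// Exact a*a is 1 + 2^-26 + 2^-54; the 2^-54 term is lost when the product is rounded.
double twoStep = a * a + c;                    // 0: product rounded, then added
double fused = Math.FusedMultiplyAdd(a, a, c); // 2^-54: single rounding keeps the low term

Console.WriteLine($"{twoStep:R} vs {fused:R}");
```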


@tannergooding - thanks for the great detail on this!
I would love to see input from the community on which capabilities would be of most value.


@tannergooding if we are making a list of discrepancies with IEEE754, we should note that modulo of floating point operands is not compliant hence the existence of alternative Math[F].IEEERemainder methods.
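
To make the discrepancy concrete: the % operator truncates the quotient toward zero, whereas the IEEE remainder rounds the quotient to the nearest integer, so the two can differ in both magnitude and sign.

```csharp
using System;

double truncated = 5.0 % 3.0;                // 2: quotient truncated to 1, 5 - 1*3
double ieee = Math.IEEERemainder(5.0, 3.0);  // -1: quotient rounded to 2, 5 - 2*3

double truncatedNeg = -5.0 % 3.0;                // -2: result takes the sign of the dividend
double ieeeNeg = Math.IEEERemainder(-5.0, 3.0);  // 1

Console.WriteLine($"{truncated} {ieee} {truncatedNeg} {ieeeNeg}");
```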

Member Author

Updated above with some of the things that have been done since the issue was created.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Mar 4, 2020
Member Author

I've updated this to include the changes for IEEE 754:2019 from #1387


Worth adding links where issues exist? E.g. to #936

Member Author

Yes, definitely. I will give this another pass tomorrow.

@tannergooding tannergooding added the untriaged New issue has not been triaged by the area owner label Mar 6, 2020
@tannergooding tannergooding modified the milestones: 5.0.0, Future Jun 23, 2020
@ericstj ericstj removed the untriaged New issue has not been triaged by the area owner label Jun 25, 2020
@krwq krwq added the untriaged New issue has not been triaged by the area owner label Feb 4, 2021
@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Feb 4, 2021

abelbraaksma commented Jul 23, 2022

I would love to see input from the community on which capabilities would be of most value.

About that, I think not many people are very aware of what IEEE 754 (and versions) is, let alone the implication, or what they are missing out on ;). People also have learned that working with floats is kinda dangerous, esp. w.r.t. comparisons and equality and/or in relation to (non-signaling) NaN. Adding (most if not all) of these functionalities might help.

For instance, being able to trap signaling NaN some way may benefit programmers (catch min/max or comparison with NaN, or now-silent things like zero-div-by-zero). Not sure how that could be easily implemented though: a different type or through the use of a set of signaling static functions?

Another source of concern in the current implementation of min/max and sorting is that different parts of .NET use different approaches. I.e., LINQ returns different results depending on whether Min or Max is used (Sharplab link), but Math.Min/Max are both consistent: they return NaN when either or both operands are NaN.

Fixing LINQ may be considered a backward compat issue, but an alternative is to implement totalOrder (preferably under a different name). I'd even prefer it if that approach were adopted in the IComparable implementation. This would put negative NaN (the default) before everything else, and positive NaN after everything else. Currently, CompareTo does not have total order w.r.t. the binary representation of NaN:

nan.CompareTo(Math.CopySign(nan, 1.0))   // yields 0
nan.CompareTo(Math.CopySign(nan, -1.0))   // yields 0
(2.0).CompareTo(Math.CopySign(nan, 1.0))   // yields 1
(2.0).CompareTo(Math.CopySign(nan, -1.0))   // yields 1

I'd assume it behaves the same if I set the signaling bit or anything else of the free bits in NaN, yet totalOrder would allow a simple way to deal with this without having to use bit manipulation magic.
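
For what it's worth, the bit manipulation involved is small. The sketch below is a hypothetical TotalOrder helper (not a .NET API) using the usual trick of mapping the raw bits to a signed key whose integer ordering matches the IEEE totalOrder predicate: negative values have their lower 63 bits flipped so that larger magnitudes sort first.

```csharp
using System;

// Hypothetical helper: map a double to a signed key that sorts in IEEE total order.
static long TotalOrderKey(double x)
{
    long bits = BitConverter.DoubleToInt64Bits(x);
    // For negative values (sign bit set), flip the lower 63 bits; positives are unchanged.
    return bits ^ ((bits >> 63) & long.MaxValue);
}

// totalOrder(x, y) is true when x precedes or equals y in the total order.
static bool TotalOrder(double x, double y) => TotalOrderKey(x) <= TotalOrderKey(y);

double negNaN = Math.CopySign(double.NaN, -1.0);
double posNaN = Math.CopySign(double.NaN, 1.0);

Console.WriteLine(TotalOrder(negNaN, double.NegativeInfinity));  // True: -NaN sorts first
Console.WriteLine(TotalOrder(double.PositiveInfinity, posNaN));  // True: +NaN sorts last
Console.WriteLine(TotalOrder(0.0, -0.0));                        // False: -0 precedes +0
```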

void setBinaryRoundingDirection(binaryRoundingDirection) - Not Supported

Being able to set the default rounding direction for a thread/process or assembly would be of great use to a larger group of users. I often notice in real-world code that not everyone on a team knows the preferred rounding in a code-base, and they deviate from it, leading to issues downstream.

Required: roundTowardPositive - Not Supported
Required: roundTowardNegative - Not Supported
Required: roundTowardZero - Not Supported

For a long time I've personally considered these big misses, as it is often required to round towards zero, or round towards an infinity. Java has had these for a very long time and people often ask about this.

EDIT: I think this issue is a little behind on updates (@tannergooding, what did we miss? ;) ). Apparently, MidpointRounding.ToNegativeInfinity/ToPositiveInfinity/ToZero have been added quite recently (for Core 3.0, I see now, but never in Framework). Great to know and thanks! Not sure if I missed other things in this area ;).

Also many thanks for adding things like CopySign, MaxMagnitude, ReciprocalEstimate, FusedMultiplyAdd, SinCos, ScaleB and many others. A lot of these were very hard to code properly without direct support.

I hope the other mathematical functions in the list make it into .NET 7.0 or soon after. Here's hoping!


Just noticed this excellent historical summary by @tannergooding , showing the evolution of the IEEE functions (and quite a bit more). Pity it’s so buried, it’s an excellent post, worthy of a blog! dotnet/csharplang#2585 (comment)


Updated this tracking issue with some features we added via Generic Math, should be more or less up to date now.


This should not be implemented for now. float128 is not natively supported in hardware, so C and C++ in GCC use software calculation instead. Considering the instruction cycles involved, there are a lot of performance issues.

This asm was generated from C++ with -O3 optimization:

 movdqa xmm1,XMMWORD PTR [rsp+0x10]
 movdqa xmm0,XMMWORD PTR [rsp]
 call   d3 <main+0xd3>
    R_X86_64_PLT32 __addtf3-0x4

Whereas this is how GCC adds two double values:

 movsd  xmm0,QWORD PTR [rsp+0x50]
 addsd  xmm0,QWORD PTR [rsp+0x58]

Member Author

Not all types are designed for performance, many are designed for handling specific edge cases or other needs and are balanced for usability instead.

Member Author

Closing this. We've exposed the full set of required operations now and many optional ones on top.

Additional things can be done via opening individual proposals where appropriate.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 3, 2025