
Implement all System.Math functions for single precision floating point values in addition to double precision #312

Closed
steffalk opened this issue Apr 26, 2018 · 11 comments · Fixed by nanoframework/nf-interpreter#748

@steffalk (Member)

Details:

System.Math functions that are currently only available for double-precision values (such as Sqrt, the trigonometric, logarithmic, and exponential functions, and Min()/Max()) should also be implemented for single-precision values.

Motivation:

Some CPUs (such as the STM32 on Netduino 3 boards) have a floating-point unit capable of computations on single-precision values, but not double precision. We cannot make use of those FPUs if System.Math does not offer single-precision overloads of its functions, and thus have to use double-precision values and computations, with lower performance. This matters for applications where single precision would be sufficient but many computations are needed, or where they must run in IRQ event handlers.

nanoFramework area: Hardware/target board | NuGet packages | Community targets

Detailed repro steps so we can see the same problem

Open the Object Browser window in Visual Studio and navigate to System.Math in mscorlib. Many functions are offered for double, but not for single-precision values.

@josesimoes josesimoes added Type: Feature request, Area: Interpreter, FOR DISCUSSION (Open for discussion. Contributions from the community are welcome/expected.), Area: CL-Core-Library labels Apr 26, 2018
@josesimoes josesimoes added this to the Backlog milestone Apr 26, 2018

josesimoes commented Jun 13, 2018

This is a high-impact change with relevant benefits for image size and a positive, moderate impact on execution performance.

Background

According to STM32 AN4044, we have a single-precision FPU on most F and L series parts and double precision on some F7s and H7s.
So wherever we use double types we force the variable to be 64 bits and any arithmetic operation to be carried out by a software implementation.
If single-precision operations were used instead, the generated code would use instructions that target the FPU directly.

The latter is valid for ESP32 targets too, as it features an SP FPU.

The .NET Math API uses doubles for all its arguments.

Working hypothesis

  • Calling the Math API with floats as the arguments doesn't pose any issue, as both types are interchangeable (from the compiler's perspective, that is).
    This means that we would not need to provide a new Math API with floats instead of doubles, nor offer overloaded methods with float arguments, and we most certainly would not have to change any existing code (see the sketch after this list).

  • The newlib nano library (currently in use) offers SP and DP variants, in both soft and hard versions. Which one gets pulled into the build depends strictly on the arguments passed to the math functions being called.

  • We should expose both as a CMake option, leaving it up to the target architect to decide which one to use, as there could be use cases where one is preferred over the other. Anyway, the default should be the one that is most advantageous for image size and makes the most efficient use of the available resources: the float implementation, with hardware SP when available.
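To make the first point concrete, here is a minimal C# sketch of the interchangeability argument (plain .NET console code, nothing nanoFramework-specific assumed):

```csharp
using System;

public class FloatAgainstDoubleMath
{
    public static void Main()
    {
        float value = 2.0f;

        // Compiles as-is: the compiler widens float to double implicitly,
        // so the existing Math.Sqrt(double) signature accepts a float argument.
        double result = Math.Sqrt(value);

        // Coming back to single precision needs an explicit narrowing cast.
        float resultSp = (float)Math.Sqrt(value);

        Console.WriteLine(result + " / " + resultSp);
    }
}
```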

Discussion

  • Implementing these changes takes about 15 minutes total, so the effort is negligible.

  • For an STM32F429I-DISCO image, built in debug flavor, the image size using SP (floats) is 201518 bytes vs. 210222 bytes for the double version. That's roughly a 4% decrease, and the gain is larger in release flavor.

  • I did some preliminary testing with heavy floating-point calculations. The results: ~1:31.839 total time with doubles vs. ~1:31.216 with floats (that's mm:ss.ms). A hypothetical sketch of this kind of test follows below.
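The actual test code isn't shown in this thread; the sketch below is a hypothetical example of that kind of workload, written as plain desktop C# (System.Diagnostics.Stopwatch and the iteration count are my assumptions). On a real target the timing would be done differently, and the SP path only pays off where the native layer maps the calls to sqrtf/sinf.

```csharp
using System;
using System.Diagnostics;

public class FloatingPointBenchmark
{
    public static void Main()
    {
        var sw = Stopwatch.StartNew();

        // Heavy floating-point workload, double flavor: repeated
        // transcendental calls are where SP vs. DP matters most.
        double accD = 0.0;
        for (int i = 1; i <= 1_000_000; i++)
        {
            accD += Math.Sqrt(i) * Math.Sin(i);
        }
        sw.Stop();
        Console.WriteLine("double: " + sw.Elapsed + " (acc=" + accD + ")");

        sw.Restart();

        // Same workload, float flavor: arguments widen implicitly going in,
        // and the explicit cast narrows the result back to single precision.
        float accF = 0.0f;
        for (int i = 1; i <= 1_000_000; i++)
        {
            accF += (float)(Math.Sqrt((float)i) * Math.Sin((float)i));
        }
        sw.Stop();
        Console.WriteLine("float:  " + sw.Elapsed + " (acc=" + accF + ")");
    }
}
```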

Analysis

  • Pros: implementing this is beneficial from all perspectives: image size and performance.
  • Cons: none that are evident.


ghost commented Jun 13, 2018

Why such a small difference between the float and double tests?! Using the hardware FPU should show a (much) bigger gap, to me.

@josesimoes (Member)

I probably haven't explained myself with enough clarity. All the numbers above are using the FPU. The difference under discussion is between single and double precision.
Single is hardware native; double requires extra processing in software to make the values computable using the FPU.


ghost commented Jun 13, 2018

Then the same test without the FPU enabled would have been useful, although I already know the result (it would be worse).
But it would show the magnitude of the gain from using the FPU. The question about double/single is closed, to me, as there are no real differences. Unless a specific program does really heavy calculations, in which case its developer would try to optimize everything possible anyway.


josesimoes commented Jun 13, 2018

The discussion is around SP and DP. No doubts about the FPU...

On your comment above, aren't you overlooking the image-size gain?
Having a significant decrease in image size along with a performance improvement (granted, a minor one), without any functionality penalty, isn't something to neglect. Is it?!


ghost commented Jun 13, 2018

No, I did not neglect that gain in size. I was just trying to see if there were any "cons" that would void it.
If I were blunt (which I am, in fact), I would say that the question didn't even need to be posted: the gains are way above the potential losses.

@josesimoes (Member)

Oh! 😄 I didn't understand that you were in favor; that's why I insisted. Apologies.

@steffalk (Member, Author)

I'm not sure I fully understand José's suggestion: do you suggest applications should still use double-typed variables and math function calls, and have the function convert them to single precision, pass them to the FPU, and convert the result back from single to double? If so, wouldn't this impose an (unexpected, from the application's point of view) precision loss? Also, wouldn't converting between single and double add a performance penalty that would not exist if we used single-precision variables and function calls in the first place?


josesimoes commented Jun 14, 2018

A couple of clarifications:

  • The double and float C# types are interchangeable because there is an implicit conversion from float to double. So when you call Math.Sqrt(value) it doesn't matter whether value is a double or a float: the compiler is happy with that, and so is the CLR.

  • Considering the above, there is no penalty from some potential "cast" happening in the background. That is simply not happening.

  • As for the use of the FPU: on CPUs that have one with SP only, when you call a floating-point operation with a DP value (a double, as it is now), then, depending on which operations that core provides, there can be extra processing to perform the operation on the 64-bit value using the 32-bit FPU and convert back. For most STM32s, which have an SP FPU, this is what happens now. In our "eagerness" to be perfect and precise by using the double math API, we impose that penalty on ourselves without any gain, unless one is coding a managed app that requires 64-bit floating-point precision for its calculations. (I would say that this is the exception, not the usual use case 😉.)

  • One could argue that the developer could be misled by us not providing a Math API accepting floats instead of doubles, potentially leading them to rely on DP operations when what is happening in the background are SP operations. I agree with this argument, but... the constrained-platform reality comes in! 😲

Alternatives to tackle the latter:

  1. Add a remark to the Math API methods mentioning that the calculations are performed in SP or DP depending on the target platform running the app, urging the developer to verify the image options.

  2. Add float overloads to the Math API and implement the operation in either SP or DP. This would still require a note letting the developer know whether the image was built to provide SP or DP operations.

  3. Provide an alternative mscorlib with a float-only Math API.

I would prefer 1., because of the platform constraints and because it's the most efficient in all respects (class library, native code, usability).
Option 2. wastes more flash: you end up with about the same as 1., but with a larger library carrying a duplicated Math API that offers calls for both types. (A sketch of what option 2's surface might look like follows below.)
Finally, 3. looks more confusing and cumbersome, not to mention that it requires more time and effort to maintain, plus it would mean providing SP and DP versions of all the class libraries.
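For illustration only, option 2.'s overload surface might look like the sketch below. The class name, the managed fallback body, and the wording of the remark are all assumptions, not a committed API:

```csharp
using System;

public static class MathSp
{
    /// <summary>
    /// Returns the square root of the specified single-precision number.
    /// </summary>
    /// <remarks>
    /// Per option 1., a remark like this would state that the computation is
    /// carried out in SP or DP depending on how the target image was built.
    /// </remarks>
    public static float Sqrt(float x)
    {
        // Hypothetical managed fallback; on a real target this would be an
        // internal call into the native math layer (sqrtf or sqrt).
        return (float)Math.Sqrt(x);
    }
}
```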

@steffalk (Member, Author)

It seems I understand your suggestion now ;-) And I would argue against it, for the following reasons:

a) Offering Math.Sqrt(double) (for example) but calculating only with single precision is a lie, and could lead customers into strange bugs they don't understand if they happen to actually need double precision. We can assume this will be rare, but we don't know it.

b) If they discover that Sqrt(double) computes only in SP, they will blame the framework.

c) I can't believe that there is no (even small) processing overhead when passing a double into a single-precision parameter. The types have different sizes and layouts (4 vs. 8 bytes), so even if C# converts implicitly between them, it has to insert some instructions to actually do it. Those will run on the CPU, fast or not.

So I strongly suggest being clear and honest here. I could imagine several ways:

  1. Either have the method signatures be Sqrt(single) and officially declare that we don't support DP overloads in System.Math. Then we may well use the FPU, and everybody knows what she will get.

  2. Add SP overloads to the functions and let them use the FPU if there is one. Clearly document when an FPU will be used and when not, so that dummies like me get an idea of what performance to expect.

If the memory savings from using only SP and the SP FPU are so appealing, couldn't there be an optional NuGet package containing the DP overloads, software-computed if needed, for those of us who need DP? This could be a way forward if the software emulation does not have to be switched on or off during the firmware build.

Anyway, I would say: if I need DP, and there are overloads taking DP parameters and returning DP values, then I should get DP and not SP. If we only support SP, then let the method signatures reflect that by clearly taking SP parameters and returning SP values.


josesimoes commented Jun 15, 2018

@steffalk I understand your concerns, and your point of view on providing a clear and honest implementation of the API. We sure don't want to look like we are hiding obscure details and trying to be "smart" about this! 😄

There is an extra IL instruction to do the implicit conversion from float to double. It's negligible, but it exists. The other way around requires an explicit cast.
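For reference, this is the pattern in question; the C# compiler emits a single conv.r8 instruction for the implicit widening and a conv.r4 for the explicit narrowing cast:

```csharp
public class ConversionCost
{
    public static void Main()
    {
        float f = 1.5f;

        // Implicit widening: the compiler inserts one IL instruction (conv.r8).
        double d = f;

        // The reverse direction needs an explicit cast, which emits conv.r4.
        float g = (float)d;

        System.Console.WriteLine(d + " / " + g);
    }
}
```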

As for providing mscorlib with either a float or a double Math, as I've pointed out above, it would also require duplicating ALL the other class libraries (because they reference it). That's something I would rather stay away from, for obvious reasons! 😓

Trying to wrap this up:

  1. It seems we are better off providing overloaded methods for the System.Math API. This takes care of the managed end.
  2. At the firmware level there would be options to use the FPU if available, and to use SP or DP if and when available. Math API calls that are not implemented will throw a NotImplementedException.
  3. We should expose this information through the DeviceInformation structure.
  4. We should have DeviceInformation available in a class library (possibly nanoFramework.Runtime.Native) so a managed app can have all the details about the platform running beneath it. This seems quite obvious, and providing it makes sense to me whether or not this proposal goes forward (see the sketch below).
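As a sketch of item 4., the managed surface might look something like the following. The enum values, class, and property names are assumptions for illustration, not the actual nanoFramework.Runtime.Native API:

```csharp
namespace nanoFramework.Runtime.Native
{
    // Hypothetical enum describing the target's floating-point capability.
    public enum FloatingPoint
    {
        None,
        SinglePrecisionSoftware,
        SinglePrecisionHardware,
        DoublePrecisionSoftware,
        DoublePrecisionHardware
    }

    public static class SystemInfo
    {
        // Hypothetical property a managed app could query to learn whether
        // Math calls run in SP or DP, in hardware or in software.
        public static FloatingPoint FloatingPointSupport { get; }
    }
}
```

A managed app could then branch on SystemInfo.FloatingPointSupport before choosing between a fast SP code path and a software-emulated DP one.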
