[libc][math] Optimize generic nearest integer functions #98483

Merged (1 commit) on Jul 11, 2024


overmighty
Member

No description provided.

@overmighty
Member Author

overmighty commented Jul 11, 2024

Based on #98472.

cc @lntue

@overmighty
Member Author
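
A note on reading the benchmark output in this comment: the three figures in each block appear to be related by simple arithmetic. "Ops per second" is consistent with 1e9 divided by "Average runtime", and "Mine / Other's" is consistent with the ratio of the two average runtimes, so a value above 1 means "My function" was slower than "Other function" for that input range. The sketch below reproduces those relations. The `Timing` struct and the helper names are hypothetical, made up for illustration, and are not part of the LLVM libc benchmark harness.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration only; it mirrors two of the
// fields printed in each benchmark block.
struct Timing {
  std::uint64_t total_ns; // "Total time"
  double avg_ns_per_op;   // "Average runtime"
};

// "Ops per second": reciprocal of the average runtime, scaled from ns to s.
inline double ops_per_second(const Timing &t) {
  assert(t.avg_ns_per_op > 0.0);
  return 1e9 / t.avg_ns_per_op;
}

// Number of calls implied by the total time and the per-call average.
inline double implied_iterations(const Timing &t) {
  assert(t.avg_ns_per_op > 0.0);
  return static_cast<double>(t.total_ns) / t.avg_ns_per_op;
}

// "Mine / Other's": a value above 1 means "My function" is slower.
inline double runtime_ratio(const Timing &mine, const Timing &other) {
  assert(other.avg_ns_per_op > 0.0);
  return mine.avg_ns_per_op / other.avg_ns_per_op;
}
```

For example, feeding in the first `ceilf` block (2.49237 ns/op against 1.34358 ns/op) gives a ratio of roughly 1.855, which matches the reported value up to rounding of the printed averages.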

Before:

  • Intel Core i7-13700H, Clang 18
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 70750929 ns 
           Average runtime : 2.49237 ns/op 
           Ops per second  : 401224978 op/s 
      -- Other function --
           Total time      : 38140369 ns 
           Average runtime : 1.34358 ns/op 
           Ops per second  : 744278064 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.85501 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 266044931 ns
      Average runtime : 1.63639 ns/op
      Ops per second : 611100987 op/s
      -- Other function --
      Total time : 222220790 ns
      Average runtime : 1.36684 ns/op
      Ops per second : 731616155 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.19721

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 193127438 ns
      Average runtime : 1.41202 ns/op
      Ops per second : 708205532 op/s
      -- Other function --
      Total time : 189577459 ns
      Average runtime : 1.38606 ns/op
      Ops per second : 721467207 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.01873

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1618317865 ns
      Average runtime : 2.41148 ns/op
      Ops per second : 414682785 op/s
      -- Other function --
      Total time : 898882410 ns
      Average runtime : 1.33944 ns/op
      Ops per second : 746581034 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.80037

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 533237290 ns
      Average runtime : 1.58917 ns/op
      Ops per second : 629258842 op/s
      -- Other function --
      Total time : 500883249 ns
      Average runtime : 1.49275 ns/op
      Ops per second : 669905173 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.06459

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 674273422 ns 
           Average runtime : 8.32436 ns/op 
           Ops per second  : 120129308 op/s 
      -- Other function --
           Total time      : 92629274 ns 
           Average runtime : 1.14357 ns/op 
           Ops per second  : 874453577 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 7.27927 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 479496514 ns
      Average runtime : 3.80553 ns/op
      Ops per second : 262775633 op/s
      -- Other function --
      Total time : 140149736 ns
      Average runtime : 1.1123 ns/op
      Ops per second : 899038439 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.42132

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 92792338 ns
      Average runtime : 1.47289 ns/op
      Ops per second : 678935366 op/s
      -- Other function --
      Total time : 73640687 ns
      Average runtime : 1.1689 ns/op
      Ops per second : 855505326 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.26007

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 680070839 ns
      Average runtime : 16.6195 ns/op
      Ops per second : 60170202 op/s
      -- Other function --
      Total time : 43574324 ns
      Average runtime : 1.06487 ns/op
      Ops per second : 939085136 op/s
      -- Average runtime ratio --
      Mine / Other's : 15.6071

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 77634082 ns
      Average runtime : 3.79443 ns/op
      Ops per second : 263544044 op/s
      -- Other function --
      Total time : 21819130 ns
      Average runtime : 1.06643 ns/op
      Ops per second : 937709248 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.55807

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 93092765 ns 
           Average runtime : 3.27941 ns/op 
           Ops per second  : 304932826 op/s 
      -- Other function --
           Total time      : 59353335 ns 
           Average runtime : 2.09086 ns/op 
           Ops per second  : 478272029 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.56845 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 314608122 ns
      Average runtime : 1.93509 ns/op
      Ops per second : 516770892 op/s
      -- Other function --
      Total time : 308309417 ns
      Average runtime : 1.89635 ns/op
      Ops per second : 527328427 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.02043

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 190342819 ns
      Average runtime : 1.39166 ns/op
      Ops per second : 718566220 op/s
      -- Other function --
      Total time : 297937324 ns
      Average runtime : 2.17832 ns/op
      Ops per second : 459069438 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.638869

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1984734676 ns
      Average runtime : 2.95749 ns/op
      Ops per second : 338125074 op/s
      -- Other function --
      Total time : 1086674803 ns
      Average runtime : 1.61927 ns/op
      Ops per second : 617561535 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.82643

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 627913946 ns
      Average runtime : 1.87133 ns/op
      Ops per second : 534379403 op/s
      -- Other function --
      Total time : 618657053 ns
      Average runtime : 1.84374 ns/op
      Ops per second : 542375260 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.01496

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 654322315 ns 
           Average runtime : 8.07805 ns/op 
           Ops per second  : 123792201 op/s 
      -- Other function --
           Total time      : 92611262 ns 
           Average runtime : 1.14335 ns/op 
           Ops per second  : 874623649 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 7.06526 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 322778678 ns
      Average runtime : 2.56174 ns/op
      Ops per second : 390360357 op/s
      -- Other function --
      Total time : 140106192 ns
      Average runtime : 1.11195 ns/op
      Ops per second : 899317854 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.30381

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 106860024 ns
      Average runtime : 1.69619 ns/op
      Ops per second : 589556296 op/s
      -- Other function --
      Total time : 73616598 ns
      Average runtime : 1.16852 ns/op
      Ops per second : 855785267 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.45158

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 517069372 ns
      Average runtime : 12.6361 ns/op
      Ops per second : 79138317 op/s
      -- Other function --
      Total time : 43561175 ns
      Average runtime : 1.06454 ns/op
      Ops per second : 939368600 op/s
      -- Average runtime ratio --
      Mine / Other's : 11.87

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 52330339 ns
      Average runtime : 2.55769 ns/op
      Ops per second : 390977784 op/s
      -- Other function --
      Total time : 21780612 ns
      Average runtime : 1.06455 ns/op
      Ops per second : 939367543 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.40261

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 75614794 ns 
           Average runtime : 2.66371 ns/op 
           Ops per second  : 375416482 op/s 
      -- Other function --
           Total time      : 38131510 ns 
           Average runtime : 1.34327 ns/op 
           Ops per second  : 744450980 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.983 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 269643033 ns
      Average runtime : 1.65852 ns/op
      Ops per second : 602946488 op/s
      -- Other function --
      Total time : 231081183 ns
      Average runtime : 1.42134 ns/op
      Ops per second : 703563647 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.16688

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 226359312 ns
      Average runtime : 1.65499 ns/op
      Ops per second : 604233679 op/s
      -- Other function --
      Total time : 195512952 ns
      Average runtime : 1.42946 ns/op
      Ops per second : 699564497 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.15777

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1985345483 ns
      Average runtime : 2.9584 ns/op
      Ops per second : 338021047 op/s
      -- Other function --
      Total time : 887333509 ns
      Average runtime : 1.32223 ns/op
      Ops per second : 756298002 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.23743

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 543037247 ns
      Average runtime : 1.61838 ns/op
      Ops per second : 617902882 op/s
      -- Other function --
      Total time : 455375583 ns
      Average runtime : 1.35713 ns/op
      Ops per second : 736851716 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.1925

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 579775585 ns 
           Average runtime : 7.15772 ns/op 
           Ops per second  : 139709229 op/s 
      -- Other function --
           Total time      : 92620845 ns 
           Average runtime : 1.14347 ns/op 
           Ops per second  : 874533157 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 6.25967 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 462531174 ns
      Average runtime : 3.67088 ns/op
      Ops per second : 272414070 op/s
      -- Other function --
      Total time : 140118694 ns
      Average runtime : 1.11205 ns/op
      Ops per second : 899237613 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.301

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 109735077 ns
      Average runtime : 1.74183 ns/op
      Ops per second : 574109953 op/s
      -- Other function --
      Total time : 73623625 ns
      Average runtime : 1.16863 ns/op
      Ops per second : 855703586 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.49049

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 511212017 ns
      Average runtime : 12.493 ns/op
      Ops per second : 80045066 op/s
      -- Other function --
      Total time : 43560860 ns
      Average runtime : 1.06454 ns/op
      Ops per second : 939375393 op/s
      -- Average runtime ratio --
      Mine / Other's : 11.7356

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 73806173 ns
      Average runtime : 3.60734 ns/op
      Ops per second : 277212584 op/s
      -- Other function --
      Total time : 21792744 ns
      Average runtime : 1.06514 ns/op
      Ops per second : 938844598 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.38673

  • Intel Core i7-13700H, Clang 18, -march=native
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 70562193 ns 
           Average runtime : 2.48572 ns/op 
           Ops per second  : 402298154 op/s 
      -- Other function --
           Total time      : 30653028 ns 
           Average runtime : 1.07982 ns/op 
           Ops per second  : 926076210 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.30196 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 233485662 ns
      Average runtime : 1.43612 ns/op
      Ops per second : 696318217 op/s
      -- Other function --
      Total time : 181146278 ns
      Average runtime : 1.1142 ns/op
      Ops per second : 897508476 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.28893

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 191688756 ns
      Average runtime : 1.4015 ns/op
      Ops per second : 713520828 op/s
      -- Other function --
      Total time : 151863004 ns
      Average runtime : 1.11032 ns/op
      Ops per second : 900640158 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.26225

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1314099125 ns
      Average runtime : 1.95816 ns/op
      Ops per second : 510683362 op/s
      -- Other function --
      Total time : 708380629 ns
      Average runtime : 1.05557 ns/op
      Ops per second : 947355888 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.85507

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 451824067 ns
      Average runtime : 1.34654 ns/op
      Ops per second : 742643662 op/s
      -- Other function --
      Total time : 354146792 ns
      Average runtime : 1.05544 ns/op
      Ops per second : 947472312 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.27581

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 400481387 ns 
           Average runtime : 4.94421 ns/op 
           Ops per second  : 202256590 op/s 
      -- Other function --
           Total time      : 92906358 ns 
           Average runtime : 1.14699 ns/op 
           Ops per second  : 871845606 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 4.31059 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 479878016 ns
      Average runtime : 3.80856 ns/op
      Ops per second : 262566726 op/s
      -- Other function --
      Total time : 140554735 ns
      Average runtime : 1.11551 ns/op
      Ops per second : 896447921 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.41417

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 123842741 ns
      Average runtime : 1.96576 ns/op
      Ops per second : 508709670 op/s
      -- Other function --
      Total time : 73836421 ns
      Average runtime : 1.17201 ns/op
      Ops per second : 853237455 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.67726

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 486842946 ns
      Average runtime : 11.8974 ns/op
      Ops per second : 84051746 op/s
      -- Other function --
      Total time : 43613644 ns
      Average runtime : 1.06583 ns/op
      Ops per second : 938238501 op/s
      -- Average runtime ratio --
      Mine / Other's : 11.1626

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 80535064 ns
      Average runtime : 3.93622 ns/op
      Ops per second : 254050831 op/s
      -- Other function --
      Total time : 21773902 ns
      Average runtime : 1.06422 ns/op
      Ops per second : 939657026 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.6987

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 62573381 ns 
           Average runtime : 2.20429 ns/op 
           Ops per second  : 453659999 op/s 
      -- Other function --
           Total time      : 57189374 ns 
           Average runtime : 2.01463 ns/op 
           Ops per second  : 496369133 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.09414 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 269658978 ns
      Average runtime : 1.65862 ns/op
      Ops per second : 602910836 op/s
      -- Other function --
      Total time : 265856429 ns
      Average runtime : 1.63523 ns/op
      Ops per second : 611534280 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.0143

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 162013177 ns
      Average runtime : 1.18453 ns/op
      Ops per second : 844214788 op/s
      -- Other function --
      Total time : 285219834 ns
      Average runtime : 2.08534 ns/op
      Ops per second : 479538600 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.568029

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1777297523 ns
      Average runtime : 2.64838 ns/op
      Ops per second : 377589318 op/s
      -- Other function --
      Total time : 1086540688 ns
      Average runtime : 1.61907 ns/op
      Ops per second : 617637763 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.63574

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 540391408 ns
      Average runtime : 1.61049 ns/op
      Ops per second : 620928229 op/s
      -- Other function --
      Total time : 531246386 ns
      Average runtime : 1.58324 ns/op
      Ops per second : 631617059 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.01721

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 218499719 ns 
           Average runtime : 2.69753 ns/op 
           Ops per second  : 370709858 op/s 
      -- Other function --
           Total time      : 92905217 ns 
           Average runtime : 1.14698 ns/op 
           Ops per second  : 871856313 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.35186 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 312089755 ns
      Average runtime : 2.4769 ns/op
      Ops per second : 403730010 op/s
      -- Other function --
      Total time : 140534067 ns
      Average runtime : 1.11535 ns/op
      Ops per second : 896579759 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.22074

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 125867891 ns
      Average runtime : 1.9979 ns/op
      Ops per second : 500524792 op/s
      -- Other function --
      Total time : 73878540 ns
      Average runtime : 1.17268 ns/op
      Ops per second : 852751015 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.70371

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 354263972 ns
      Average runtime : 8.65748 ns/op
      Ops per second : 115507088 op/s
      -- Other function --
      Total time : 43589009 ns
      Average runtime : 1.06523 ns/op
      Ops per second : 938768761 op/s
      -- Average runtime ratio --
      Mine / Other's : 8.12737

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 49744510 ns
      Average runtime : 2.43131 ns/op
      Ops per second : 411301669 op/s
      -- Other function --
      Total time : 21794194 ns
      Average runtime : 1.06521 ns/op
      Ops per second : 938782136 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.28247

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 68886590 ns 
           Average runtime : 2.42669 ns/op 
           Ops per second  : 412083687 op/s 
      -- Other function --
           Total time      : 30644535 ns 
           Average runtime : 1.07953 ns/op 
           Ops per second  : 926332868 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.24792 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 254584291 ns
      Average runtime : 1.5659 ns/op
      Ops per second : 638610965 op/s
      -- Other function --
      Total time : 182953790 ns
      Average runtime : 1.12531 ns/op
      Ops per second : 888641443 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.39152

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 192144006 ns
      Average runtime : 1.40483 ns/op
      Ops per second : 711830271 op/s
      -- Other function --
      Total time : 155715252 ns
      Average runtime : 1.13849 ns/op
      Ops per second : 878359173 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.23394

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1548275403 ns
      Average runtime : 2.30711 ns/op
      Ops per second : 433442628 op/s
      -- Other function --
      Total time : 711953607 ns
      Average runtime : 1.06089 ns/op
      Ops per second : 942601531 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.17469

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 541407358 ns
      Average runtime : 1.61352 ns/op
      Ops per second : 619763058 op/s
      -- Other function --
      Total time : 356017062 ns
      Average runtime : 1.06101 ns/op
      Ops per second : 942494941 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.52073

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 218604121 ns 
           Average runtime : 2.69882 ns/op 
           Ops per second  : 370532813 op/s 
      -- Other function --
           Total time      : 92908523 ns 
           Average runtime : 1.14702 ns/op 
           Ops per second  : 871825289 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.3529 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 466298885 ns
      Average runtime : 3.70078 ns/op
      Ops per second : 270212955 op/s
      -- Other function --
      Total time : 140496888 ns
      Average runtime : 1.11505 ns/op
      Ops per second : 896817017 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.31893

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 126352650 ns
      Average runtime : 2.0056 ns/op
      Ops per second : 498604500 op/s
      -- Other function --
      Total time : 73838118 ns
      Average runtime : 1.17203 ns/op
      Ops per second : 853217846 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.71121

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 357035703 ns
      Average runtime : 8.72521 ns/op
      Ops per second : 114610386 op/s
      -- Other function --
      Total time : 43579606 ns
      Average runtime : 1.065 ns/op
      Ops per second : 938971316 op/s
      -- Average runtime ratio --
      Mine / Other's : 8.19272

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 74230150 ns
      Average runtime : 3.62806 ns/op
      Ops per second : 275629242 op/s
      -- Other function --
      Total time : 21799282 ns
      Average runtime : 1.06546 ns/op
      Ops per second : 938563022 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.40516

  • Intel Core i7-13700H, GCC 14
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 79235665 ns 
           Average runtime : 2.79126 ns/op 
           Ops per second  : 358260891 op/s 
      -- Other function --
           Total time      : 30980260 ns 
           Average runtime : 1.09135 ns/op 
           Ops per second  : 916294440 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.55762 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 265892007 ns
      Average runtime : 1.63545 ns/op
      Ops per second : 611452453 op/s
      -- Other function --
      Total time : 178742810 ns
      Average runtime : 1.09941 ns/op
      Ops per second : 909576838 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.48757

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 192297481 ns
      Average runtime : 1.40595 ns/op
      Ops per second : 711262151 op/s
      -- Other function --
      Total time : 152622397 ns
      Average runtime : 1.11587 ns/op
      Ops per second : 896158903 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.25996

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1569986223 ns
      Average runtime : 2.33946 ns/op
      Ops per second : 427448693 op/s
      -- Other function --
      Total time : 708320848 ns
      Average runtime : 1.05548 ns/op
      Ops per second : 947435843 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.21649

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 449039558 ns
      Average runtime : 1.33824 ns/op
      Ops per second : 747248820 op/s
      -- Other function --
      Total time : 354143807 ns
      Average runtime : 1.05543 ns/op
      Ops per second : 947480298 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.26796

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 412820002 ns 
           Average runtime : 5.09654 ns/op 
           Ops per second  : 196211422 op/s 
      -- Other function --
           Total time      : 92642798 ns 
           Average runtime : 1.14374 ns/op 
           Ops per second  : 874325924 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 4.45604 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 270330514 ns
      Average runtime : 2.14548 ns/op
      Ops per second : 466096106 op/s
      -- Other function --
      Total time : 171002491 ns
      Average runtime : 1.35716 ns/op
      Ops per second : 736831371 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.58086

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 88478482 ns
      Average runtime : 1.40442 ns/op
      Ops per second : 712037532 op/s
      -- Other function --
      Total time : 73672354 ns
      Average runtime : 1.1694 ns/op
      Ops per second : 855137600 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.20097

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 410560254 ns
      Average runtime : 10.0332 ns/op
      Ops per second : 99668683 op/s
      -- Other function --
      Total time : 54234612 ns
      Average runtime : 1.32538 ns/op
      Ops per second : 754499727 op/s
      -- Average runtime ratio --
      Mine / Other's : 7.57008

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 43335152 ns
      Average runtime : 2.11804 ns/op
      Ops per second : 472134031 op/s
      -- Other function --
      Total time : 27109088 ns
      Average runtime : 1.32498 ns/op
      Ops per second : 754728451 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.59855

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 77587274 ns 
           Average runtime : 2.73319 ns/op 
           Ops per second  : 365872372 op/s 
      -- Other function --
           Total time      : 55777674 ns 
           Average runtime : 1.9649 ns/op 
           Ops per second  : 508931942 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.39101 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 269861096 ns
      Average runtime : 1.65986 ns/op
      Ops per second : 602459274 op/s
      -- Other function --
      Total time : 265817093 ns
      Average runtime : 1.63499 ns/op
      Ops per second : 611624776 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.01521

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 191108888 ns
      Average runtime : 1.39726 ns/op
      Ops per second : 715685813 op/s
      -- Other function --
      Total time : 293324330 ns
      Average runtime : 2.14459 ns/op
      Ops per second : 466289039 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.651528

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1716754585 ns
      Average runtime : 2.55816 ns/op
      Ops per second : 390905354 op/s
      -- Other function --
      Total time : 1086812529 ns
      Average runtime : 1.61948 ns/op
      Ops per second : 617483275 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.57962

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 451923431 ns
      Average runtime : 1.34684 ns/op
      Ops per second : 742480378 op/s
      -- Other function --
      Total time : 531260612 ns
      Average runtime : 1.58328 ns/op
      Ops per second : 631600145 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.850662

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 413499755 ns 
           Average runtime : 5.10494 ns/op 
           Ops per second  : 195888870 op/s 
      -- Other function --
           Total time      : 92621102 ns 
           Average runtime : 1.14347 ns/op 
           Ops per second  : 874530730 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 4.46442 
      

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 244894286 ns
      Average runtime : 1.94361 ns/op
      Ops per second : 514507717 op/s
      -- Other function --
      Total time : 170985347 ns
      Average runtime : 1.35703 ns/op
      Ops per second : 736905250 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.43225

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 102115907 ns
      Average runtime : 1.62089 ns/op
      Ops per second : 616945996 op/s
      -- Other function --
      Total time : 73625623 ns
      Average runtime : 1.16866 ns/op
      Ops per second : 855680365 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.38696

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 309419599 ns
      Average runtime : 7.56157 ns/op
      Ops per second : 132247602 op/s
      -- Other function --
      Total time : 54225272 ns
      Average runtime : 1.32515 ns/op
      Ops per second : 754629686 op/s
      -- Average runtime ratio --
      Mine / Other's : 5.70619

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 37964283 ns
      Average runtime : 1.85554 ns/op
      Ops per second : 538927602 op/s
      -- Other function --
      Total time : 27108952 ns
      Average runtime : 1.32497 ns/op
      Ops per second : 754732237 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.40043

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 76503986 ns 
           Average runtime : 2.69503 ns/op 
           Ops per second  : 371053084 op/s 
      -- Other function --
           Total time      : 31063145 ns 
           Average runtime : 1.09427 ns/op 
           Ops per second  : 913849515 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.46285 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 394039940 ns
      Average runtime : 2.42366 ns/op
      Ops per second : 412598580 op/s
      -- Other function --
      Total time : 183213917 ns
      Average runtime : 1.12691 ns/op
      Ops per second : 887379750 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.15071

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 203525110 ns
      Average runtime : 1.48804 ns/op
      Ops per second : 672024793 op/s
      -- Other function --
      Total time : 156369227 ns
      Average runtime : 1.14327 ns/op
      Ops per second : 874685656 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.30157

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 2043352175 ns
      Average runtime : 3.04483 ns/op
      Ops per second : 328425304 op/s
      -- Other function --
      Total time : 708366210 ns
      Average runtime : 1.05555 ns/op
      Ops per second : 947375171 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.8846

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 727321058 ns
      Average runtime : 2.16759 ns/op
      Ops per second : 461342726 op/s
      -- Other function --
      Total time : 354177139 ns
      Average runtime : 1.05553 ns/op
      Ops per second : 947391130 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.05355

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 433535124 ns 
           Average runtime : 5.35229 ns/op 
           Ops per second  : 186836072 op/s 
      -- Other function --
           Total time      : 92611836 ns 
           Average runtime : 1.14336 ns/op 
           Ops per second  : 874618229 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 4.68121 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 276561605 ns
      Average runtime : 2.19493 ns/op
      Ops per second : 455594694 op/s
      -- Other function --
      Total time : 171018677 ns
      Average runtime : 1.35729 ns/op
      Ops per second : 736761634 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.61714

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 109238229 ns
      Average runtime : 1.73394 ns/op
      Ops per second : 576721176 op/s
      -- Other function --
      Total time : 73614651 ns
      Average runtime : 1.16849 ns/op
      Ops per second : 855807901 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.48392

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 338207723 ns
      Average runtime : 8.2651 ns/op
      Ops per second : 120990732 op/s
      -- Other function --
      Total time : 54218110 ns
      Average runtime : 1.32498 ns/op
      Ops per second : 754729369 op/s
      -- Average runtime ratio --
      Mine / Other's : 6.23791

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 45638258 ns
      Average runtime : 2.23061 ns/op
      Ops per second : 448308083 op/s
      -- Other function --
      Total time : 27109077 ns
      Average runtime : 1.32498 ns/op
      Ops per second : 754728757 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.6835

  • Intel Core i7-13700H, GCC 14, -march=native
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 69774440 ns 
           Average runtime : 2.45797 ns/op 
           Ops per second  : 406840097 op/s 
      -- Other function --
           Total time      : 30980024 ns 
           Average runtime : 1.09134 ns/op 
           Ops per second  : 916301420 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.25224 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 268904430 ns
      Average runtime : 1.65398 ns/op
      Ops per second : 604602609 op/s
      -- Other function --
      Total time : 186371254 ns
      Average runtime : 1.14633 ns/op
      Ops per second : 872346547 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.44284

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 192708086 ns
      Average runtime : 1.40895 ns/op
      Ops per second : 709746657 op/s
      -- Other function --
      Total time : 152697853 ns
      Average runtime : 1.11643 ns/op
      Ops per second : 895716064 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.26202

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1396765547 ns
      Average runtime : 2.08134 ns/op
      Ops per second : 480458987 op/s
      -- Other function --
      Total time : 708368982 ns
      Average runtime : 1.05555 ns/op
      Ops per second : 947371464 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.97181

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 449041565 ns
      Average runtime : 1.33825 ns/op
      Ops per second : 747245480 op/s
      -- Other function --
      Total time : 354160746 ns
      Average runtime : 1.05548 ns/op
      Ops per second : 947434981 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.2679

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 399259882 ns 
           Average runtime : 4.92913 ns/op 
           Ops per second  : 202875379 op/s 
      -- Other function --
           Total time      : 92636679 ns 
           Average runtime : 1.14366 ns/op 
           Ops per second  : 874383676 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 4.30995 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 211942428 ns
      Average runtime : 1.68208 ns/op
      Ops per second : 594501068 op/s
      -- Other function --
      Total time : 171019129 ns
      Average runtime : 1.35729 ns/op
      Ops per second : 736759687 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.23929

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 103063918 ns
      Average runtime : 1.63594 ns/op
      Ops per second : 611271153 op/s
      -- Other function --
      Total time : 73632140 ns
      Average runtime : 1.16876 ns/op
      Ops per second : 855604631 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.39971

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 202448442 ns
      Average runtime : 4.94742 ns/op
      Ops per second : 202125536 op/s
      -- Other function --
      Total time : 54238882 ns
      Average runtime : 1.32549 ns/op
      Ops per second : 754440329 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.73253

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 32829212 ns
      Average runtime : 1.60456 ns/op
      Ops per second : 623225437 op/s
      -- Other function --
      Total time : 27202338 ns
      Average runtime : 1.32954 ns/op
      Ops per second : 752141231 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.20685

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 72939786 ns 
           Average runtime : 2.56947 ns/op 
           Ops per second  : 389184580 op/s 
      -- Other function --
           Total time      : 55593796 ns 
           Average runtime : 1.95842 ns/op 
           Ops per second  : 510615249 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.31201 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 270362286 ns
      Average runtime : 1.66295 ns/op
      Ops per second : 601342452 op/s
      -- Other function --
      Total time : 265734503 ns
      Average runtime : 1.63448 ns/op
      Ops per second : 611814868 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.01742

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 191061468 ns
      Average runtime : 1.39691 ns/op
      Ops per second : 715863441 op/s
      -- Other function --
      Total time : 273614502 ns
      Average runtime : 2.00049 ns/op
      Ops per second : 499878182 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.698287

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1533700178 ns
      Average runtime : 2.28539 ns/op
      Ops per second : 437561767 op/s
      -- Other function --
      Total time : 1095378603 ns
      Average runtime : 1.63224 ns/op
      Ops per second : 612654435 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.40016

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 451946350 ns
      Average runtime : 1.34691 ns/op
      Ops per second : 742442725 op/s
      -- Other function --
      Total time : 531236602 ns
      Average runtime : 1.58321 ns/op
      Ops per second : 631628691 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.850744

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 396555419 ns 
           Average runtime : 4.89575 ns/op 
           Ops per second  : 204258966 op/s 
      -- Other function --
           Total time      : 92658341 ns 
           Average runtime : 1.14393 ns/op 
           Ops per second  : 874179260 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 4.27976 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 269136129 ns
      Average runtime : 2.136 ns/op
      Ops per second : 468164569 op/s
      -- Other function --
      Total time : 170973624 ns
      Average runtime : 1.35693 ns/op
      Ops per second : 736955777 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.57414

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 102119948 ns
      Average runtime : 1.62095 ns/op
      Ops per second : 616921583 op/s
      -- Other function --
      Total time : 73614145 ns
      Average runtime : 1.16848 ns/op
      Ops per second : 855813784 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.38723

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 211990906 ns
      Average runtime : 5.18062 ns/op
      Ops per second : 193027148 op/s
      -- Other function --
      Total time : 54391024 ns
      Average runtime : 1.3292 ns/op
      Ops per second : 752330016 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.89753

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 40982714 ns
      Average runtime : 2.00307 ns/op
      Ops per second : 499234872 op/s
      -- Other function --
      Total time : 27165564 ns
      Average runtime : 1.32774 ns/op
      Ops per second : 753159404 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.50863

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 70444843 ns 
           Average runtime : 2.48158 ns/op 
           Ops per second  : 402968319 op/s 
      -- Other function --
           Total time      : 31028846 ns 
           Average runtime : 1.09306 ns/op 
           Ops per second  : 914859676 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.2703 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 396241928 ns
      Average runtime : 2.43721 ns/op
      Ops per second : 410305695 op/s
      -- Other function --
      Total time : 182994993 ns
      Average runtime : 1.12557 ns/op
      Ops per second : 888441357 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.16532

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 225859002 ns
      Average runtime : 1.65133 ns/op
      Ops per second : 605572143 op/s
      -- Other function --
      Total time : 156358375 ns
      Average runtime : 1.14319 ns/op
      Ops per second : 874746363 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.4445

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1608208391 ns
      Average runtime : 2.39642 ns/op
      Ops per second : 417289552 op/s
      -- Other function --
      Total time : 708322939 ns
      Average runtime : 1.05548 ns/op
      Ops per second : 947433046 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.27045

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 759816146 ns
      Average runtime : 2.26443 ns/op
      Ops per second : 441612463 op/s
      -- Other function --
      Total time : 356115685 ns
      Average runtime : 1.06131 ns/op
      Ops per second : 942233926 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.13362

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 399679925 ns 
           Average runtime : 4.93432 ns/op 
           Ops per second  : 202662167 op/s 
      -- Other function --
           Total time      : 92623252 ns 
           Average runtime : 1.1435 ns/op 
           Ops per second  : 874510430 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 4.31511 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 253570453 ns
      Average runtime : 2.01246 ns/op
      Ops per second : 496903320 op/s
      -- Other function --
      Total time : 170998677 ns
      Average runtime : 1.35713 ns/op
      Ops per second : 736847806 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.48288

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 106868868 ns
      Average runtime : 1.69633 ns/op
      Ops per second : 589507507 op/s
      -- Other function --
      Total time : 73613678 ns
      Average runtime : 1.16847 ns/op
      Ops per second : 855819213 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.45175

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 224933145 ns
      Average runtime : 5.4969 ns/op
      Ops per second : 181920721 op/s
      -- Other function --
      Total time : 54227806 ns
      Average runtime : 1.32522 ns/op
      Ops per second : 754594423 op/s
      -- Average runtime ratio --
      Mine / Other's : 4.14793

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 42401509 ns
      Average runtime : 2.07241 ns/op
      Ops per second : 482529996 op/s
      -- Other function --
      Total time : 27197467 ns
      Average runtime : 1.3293 ns/op
      Ops per second : 752275938 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.55902

  • Google Tensor G3, Clang 17
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 20761312 ns 
           Average runtime : 0.731366 ns/op 
           Ops per second  : 1367304725 op/s 
      -- Other function --
           Total time      : 20598185 ns 
           Average runtime : 0.725619 ns/op 
           Ops per second  : 1378133073 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.00792 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 118769938 ns
      Average runtime : 0.730531 ns/op
      Ops per second : 1368867600 op/s
      -- Other function --
      Total time : 118865682 ns
      Average runtime : 0.73112 ns/op
      Ops per second : 1367765003 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.999195

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 100606934 ns
      Average runtime : 0.735571 ns/op
      Ops per second : 1359488005 op/s
      -- Other function --
      Total time : 100431559 ns
      Average runtime : 0.734289 ns/op
      Ops per second : 1361861962 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.00175

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 461369344 ns
      Average runtime : 0.687494 ns/op
      Ops per second : 1454558194 op/s
      -- Other function --
      Total time : 460846111 ns
      Average runtime : 0.686714 ns/op
      Ops per second : 1456209663 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.00114

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 230412923 ns
      Average runtime : 0.686684 ns/op
      Ops per second : 1456273700 op/s
      -- Other function --
      Total time : 230302816 ns
      Average runtime : 0.686356 ns/op
      Ops per second : 1456969939 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.00048

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 148635173 ns 
           Average runtime : 1.835 ns/op 
           Ops per second  : 544958493 op/s 
      -- Other function --
           Total time      : 61964884 ns 
           Average runtime : 0.764999 ns/op 
           Ops per second  : 1307191989 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.3987 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 182265218 ns
      Average runtime : 1.44655 ns/op
      Ops per second : 691300300 op/s
      -- Other function --
      Total time : 93509806 ns
      Average runtime : 0.742141 ns/op
      Ops per second : 1347452266 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.94916

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 68846476 ns
      Average runtime : 1.0928 ns/op
      Ops per second : 915079516 op/s
      -- Other function --
      Total time : 49488078 ns
      Average runtime : 0.785525 ns/op
      Ops per second : 1273033881 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.39117

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 85558919 ns
      Average runtime : 2.09088 ns/op
      Ops per second : 478266912 op/s
      -- Other function --
      Total time : 28625936 ns
      Average runtime : 0.699559 ns/op
      Ops per second : 1429472908 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.98886

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 28222819 ns
      Average runtime : 1.37941 ns/op
      Ops per second : 724945300 op/s
      -- Other function --
      Total time : 14267049 ns
      Average runtime : 0.697314 ns/op
      Ops per second : 1434073717 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.97818

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 30577799 ns 
           Average runtime : 1.07717 ns/op 
           Ops per second  : 928354588 op/s 
      -- Other function --
           Total time      : 20682943 ns 
           Average runtime : 0.728605 ns/op 
           Ops per second  : 1372485530 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.47841 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 172827922 ns
      Average runtime : 1.06303 ns/op
      Ops per second : 940706328 op/s
      -- Other function --
      Total time : 118377075 ns
      Average runtime : 0.728114 ns/op
      Ops per second : 1373410518 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.45998

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 146390381 ns
      Average runtime : 1.07031 ns/op
      Ops per second : 934309474 op/s
      -- Other function --
      Total time : 100645590 ns
      Average runtime : 0.735854 ns/op
      Ops per second : 1358965852 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.45451

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 690906291 ns
      Average runtime : 1.02953 ns/op
      Ops per second : 971316325 op/s
      -- Other function --
      Total time : 460659954 ns
      Average runtime : 0.686437 ns/op
      Ops per second : 1456798130 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.49982

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 345550538 ns
      Average runtime : 1.02982 ns/op
      Ops per second : 971042562 op/s
      -- Other function --
      Total time : 230436727 ns
      Average runtime : 0.686755 ns/op
      Ops per second : 1456123268 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.49955

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 176421143 ns 
           Average runtime : 2.17804 ns/op 
           Ops per second  : 459128643 op/s 
      -- Other function --
           Total time      : 62396566 ns 
           Average runtime : 0.770328 ns/op 
           Ops per second  : 1298148362 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.82742 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 223069865 ns
      Average runtime : 1.7704 ns/op
      Ops per second : 564845457 op/s
      -- Other function --
      Total time : 93863525 ns
      Average runtime : 0.744949 ns/op
      Ops per second : 1342374474 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.37653

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 71912679 ns
      Average runtime : 1.14147 ns/op
      Ops per second : 876062481 op/s
      -- Other function --
      Total time : 49532063 ns
      Average runtime : 0.786223 ns/op
      Ops per second : 1271903413 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.45184

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 100788168 ns
      Average runtime : 2.46305 ns/op
      Ops per second : 406000037 op/s
      -- Other function --
      Total time : 29107829 ns
      Average runtime : 0.711335 ns/op
      Ops per second : 1405807351 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.46258

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 35284668 ns
      Average runtime : 1.72457 ns/op
      Ops per second : 579855250 op/s
      -- Other function --
      Total time : 14239746 ns
      Average runtime : 0.69598 ns/op
      Ops per second : 1436823381 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.4779

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 44285848 ns 
           Average runtime : 1.56007 ns/op 
           Ops per second  : 640995742 op/s 
      -- Other function --
           Total time      : 30807902 ns 
           Average runtime : 1.08528 ns/op 
           Ops per second  : 921420744 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.43748 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 235301799 ns
      Average runtime : 1.4473 ns/op
      Ops per second : 690943803 op/s
      -- Other function --
      Total time : 174150838 ns
      Average runtime : 1.07117 ns/op
      Ops per second : 933560365 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.35114

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 152473918 ns
      Average runtime : 1.11479 ns/op
      Ops per second : 897031582 op/s
      -- Other function --
      Total time : 146988769 ns
      Average runtime : 1.07468 ns/op
      Ops per second : 930505921 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.03732

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1481693442 ns
      Average runtime : 2.2079 ns/op
      Ops per second : 452919977 op/s
      -- Other function --
      Total time : 691564494 ns
      Average runtime : 1.03051 ns/op
      Ops per second : 970391866 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.14252

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 472099202 ns
      Average runtime : 1.40697 ns/op
      Ops per second : 710749517 op/s
      -- Other function --
      Total time : 345915487 ns
      Average runtime : 1.03091 ns/op
      Ops per second : 970018089 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.36478

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 150420939 ns 
           Average runtime : 1.85705 ns/op 
           Ops per second  : 538488860 op/s 
      -- Other function --
           Total time      : 62955404 ns 
           Average runtime : 0.777227 ns/op 
           Ops per second  : 1286625052 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.38933 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 185886149 ns
      Average runtime : 1.47529 ns/op
      Ops per second : 677834258 op/s
      -- Other function --
      Total time : 94339396 ns
      Average runtime : 0.748725 ns/op
      Ops per second : 1335603208 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.9704

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 74434978 ns
      Average runtime : 1.18151 ns/op
      Ops per second : 846376282 op/s
      -- Other function --
      Total time : 49554932 ns
      Average runtime : 0.786586 ns/op
      Ops per second : 1271316445 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.50207

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 101967855 ns
      Average runtime : 2.49188 ns/op
      Ops per second : 401302940 op/s
      -- Other function --
      Total time : 29097738 ns
      Average runtime : 0.711088 ns/op
      Ops per second : 1406294881 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.50432

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 30650065 ns
      Average runtime : 1.49805 ns/op
      Ops per second : 667535289 op/s
      -- Other function --
      Total time : 14244792 ns
      Average runtime : 0.696226 ns/op
      Ops per second : 1436314408 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.15167

After:
  • Intel Core i7-13700H, Clang 18
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 78571381 ns 
           Average runtime : 2.76786 ns/op 
           Ops per second  : 361289818 op/s 
      -- Other function --
           Total time      : 38129528 ns 
           Average runtime : 1.3432 ns/op 
           Ops per second  : 744489677 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.06064 

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 266036570 ns
      Average runtime : 1.63634 ns/op
      Ops per second : 611120192 op/s
      -- Other function --
      Total time : 223657074 ns
      Average runtime : 1.37567 ns/op
      Ops per second : 726917852 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.18948

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 191935789 ns
      Average runtime : 1.40331 ns/op
      Ops per second : 712602483 op/s
      -- Other function --
      Total time : 189649851 ns
      Average runtime : 1.38659 ns/op
      Ops per second : 721191813 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.01205

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1422720381 ns
      Average runtime : 2.12002 ns/op
      Ops per second : 471693924 op/s
      -- Other function --
      Total time : 886143590 ns
      Average runtime : 1.32046 ns/op
      Ops per second : 757313563 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.60552

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 532733251 ns
      Average runtime : 1.58767 ns/op
      Ops per second : 629854208 op/s
      -- Other function --
      Total time : 442513024 ns
      Average runtime : 1.31879 ns/op
      Ops per second : 758269840 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.20388

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 165658318 ns 
           Average runtime : 2.04516 ns/op 
           Ops per second  : 488958242 op/s 
      -- Other function --
           Total time      : 92627789 ns 
           Average runtime : 1.14355 ns/op 
           Ops per second  : 874467596 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.78843

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 481695417 ns
      Average runtime : 3.82298 ns/op
      Ops per second : 261576082 op/s
      -- Other function --
      Total time : 140144097 ns
      Average runtime : 1.11225 ns/op
      Ops per second : 899074614 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.43714

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 92632762 ns
      Average runtime : 1.47036 ns/op
      Ops per second : 680104950 op/s
      -- Other function --
      Total time : 73617068 ns
      Average runtime : 1.16852 ns/op
      Ops per second : 855779803 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.25831

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 562782869 ns
      Average runtime : 13.7532 ns/op
      Ops per second : 72710102 op/s
      -- Other function --
      Total time : 43564033 ns
      Average runtime : 1.06461 ns/op
      Ops per second : 939306973 op/s
      -- Average runtime ratio --
      Mine / Other's : 12.9185

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 78301265 ns
      Average runtime : 3.82704 ns/op
      Ops per second : 261298460 op/s
      -- Other function --
      Total time : 21789420 ns
      Average runtime : 1.06498 ns/op
      Ops per second : 938987820 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.59355

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 67540602 ns 
           Average runtime : 2.37928 ns/op 
           Ops per second  : 420295928 op/s 
      -- Other function --
           Total time      : 59943494 ns 
           Average runtime : 2.11165 ns/op 
           Ops per second  : 473563319 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.12674

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 235124844 ns
      Average runtime : 1.44621 ns/op
      Ops per second : 691463808 op/s
      -- Other function --
      Total time : 307233291 ns
      Average runtime : 1.88973 ns/op
      Ops per second : 529175466 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.765297

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 190301307 ns
      Average runtime : 1.39136 ns/op
      Ops per second : 718722967 op/s
      -- Other function --
      Total time : 299811651 ns
      Average runtime : 2.19202 ns/op
      Ops per second : 456199482 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.634736

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1671680786 ns
      Average runtime : 2.491 ns/op
      Ops per second : 401445398 op/s
      -- Other function --
      Total time : 1087919244 ns
      Average runtime : 1.62113 ns/op
      Ops per second : 616855123 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.53659

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 450608728 ns
      Average runtime : 1.34292 ns/op
      Ops per second : 744646650 op/s
      -- Other function --
      Total time : 613525352 ns
      Average runtime : 1.82845 ns/op
      Ops per second : 546911841 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.734458

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 233572560 ns 
           Average runtime : 2.88361 ns/op 
           Ops per second  : 346787310 op/s 
      -- Other function --
           Total time      : 92615591 ns 
           Average runtime : 1.1434 ns/op 
           Ops per second  : 874582768 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.52196

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 311244364 ns
      Average runtime : 2.47019 ns/op
      Ops per second : 404826607 op/s
      -- Other function --
      Total time : 140158833 ns
      Average runtime : 1.11237 ns/op
      Ops per second : 898980087 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.22065

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 92630743 ns
      Average runtime : 1.47033 ns/op
      Ops per second : 680119774 op/s
      -- Other function --
      Total time : 73615051 ns
      Average runtime : 1.16849 ns/op
      Ops per second : 855803251 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.25831

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 391587815 ns
      Average runtime : 9.56959 ns/op
      Ops per second : 104497633 op/s
      -- Other function --
      Total time : 43573953 ns
      Average runtime : 1.06486 ns/op
      Ops per second : 939093132 op/s
      -- Average runtime ratio --
      Mine / Other's : 8.98674

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 49800043 ns
      Average runtime : 2.43402 ns/op
      Ops per second : 410843018 op/s
      -- Other function --
      Total time : 21800888 ns
      Average runtime : 1.06554 ns/op
      Ops per second : 938493881 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.28431

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 65096482 ns 
           Average runtime : 2.29318 ns/op 
           Ops per second  : 436076407 op/s 
      -- Other function --
           Total time      : 38128721 ns 
           Average runtime : 1.34317 ns/op 
           Ops per second  : 744505434 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.70728

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 307344698 ns
      Average runtime : 1.89042 ns/op
      Ops per second : 528983649 op/s
      -- Other function --
      Total time : 229073204 ns
      Average runtime : 1.40898 ns/op
      Ops per second : 709730850 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.34169

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 190593877 ns
      Average runtime : 1.3935 ns/op
      Ops per second : 717619695 op/s
      -- Other function --
      Total time : 193650418 ns
      Average runtime : 1.41584 ns/op
      Ops per second : 706292924 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.984216

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1829018577 ns
      Average runtime : 2.72545 ns/op
      Ops per second : 366911833 op/s
      -- Other function --
      Total time : 913756290 ns
      Average runtime : 1.3616 ns/op
      Ops per second : 734428388 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.00165

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 608920602 ns
      Average runtime : 1.81473 ns/op
      Ops per second : 551047671 op/s
      -- Other function --
      Total time : 461948348 ns
      Average runtime : 1.37671 ns/op
      Ops per second : 726367528 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.31816

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 187582208 ns 
           Average runtime : 2.31583 ns/op 
           Ops per second  : 431810675 op/s 
      -- Other function --
           Total time      : 92610904 ns 
           Average runtime : 1.14334 ns/op 
           Ops per second  : 874627030 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.02549

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 443690358 ns
      Average runtime : 3.52135 ns/op
      Ops per second : 283981830 op/s
      -- Other function --
      Total time : 140113685 ns
      Average runtime : 1.11201 ns/op
      Ops per second : 899269760 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.16665

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 92611834 ns
      Average runtime : 1.47003 ns/op
      Ops per second : 680258637 op/s
      -- Other function --
      Total time : 73624928 ns
      Average runtime : 1.16865 ns/op
      Ops per second : 855688442 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.25789

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 374983209 ns
      Average runtime : 9.16381 ns/op
      Ops per second : 109124886 op/s
      -- Other function --
      Total time : 43564526 ns
      Average runtime : 1.06463 ns/op
      Ops per second : 939296344 op/s
      -- Average runtime ratio --
      Mine / Other's : 8.60754

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 71755637 ns
      Average runtime : 3.50712 ns/op
      Ops per second : 285134392 op/s
      -- Other function --
      Total time : 21782415 ns
      Average runtime : 1.06463 ns/op
      Ops per second : 939289789 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.2942

  • Intel Core i7-13700H, Clang 18, -march=native
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 53350832 ns 
           Average runtime : 1.87941 ns/op 
           Ops per second  : 532082423 op/s 
      -- Other function --
           Total time      : 30646169 ns 
           Average runtime : 1.07958 ns/op 
           Ops per second  : 926283477 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.74086

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 232077970 ns
      Average runtime : 1.42747 ns/op
      Ops per second : 700541804 op/s
      -- Other function --
      Total time : 180460993 ns
      Average runtime : 1.10998 ns/op
      Ops per second : 900916687 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.28603

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 160973896 ns
      Average runtime : 1.17693 ns/op
      Ops per second : 849665215 op/s
      -- Other function --
      Total time : 151502124 ns
      Average runtime : 1.10768 ns/op
      Ops per second : 902785494 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.06252

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1173212415 ns
      Average runtime : 1.74822 ns/op
      Ops per second : 572009425 op/s
      -- Other function --
      Total time : 708337948 ns
      Average runtime : 1.05551 ns/op
      Ops per second : 947412971 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.65629

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 452503246 ns
      Average runtime : 1.34856 ns/op
      Ops per second : 741529001 op/s
      -- Other function --
      Total time : 354168366 ns
      Average runtime : 1.0555 ns/op
      Ops per second : 947414597 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.27765

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 176993294 ns 
           Average runtime : 2.1851 ns/op 
           Ops per second  : 457644457 op/s 
      -- Other function --
           Total time      : 92909915 ns 
           Average runtime : 1.14704 ns/op 
           Ops per second  : 871812228 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.905

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 479880994 ns
      Average runtime : 3.80858 ns/op
      Ops per second : 262565097 op/s
      -- Other function --
      Total time : 141358088 ns
      Average runtime : 1.12189 ns/op
      Ops per second : 891353312 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.39479

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 97382980 ns
      Average runtime : 1.54576 ns/op
      Ops per second : 646930295 op/s
      -- Other function --
      Total time : 73854753 ns
      Average runtime : 1.1723 ns/op
      Ops per second : 853025667 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.31857

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 374760373 ns
      Average runtime : 9.15837 ns/op
      Ops per second : 109189772 op/s
      -- Other function --
      Total time : 43612486 ns
      Average runtime : 1.0658 ns/op
      Ops per second : 938263413 op/s
      -- Average runtime ratio --
      Mine / Other's : 8.59296

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 79335309 ns
      Average runtime : 3.87758 ns/op
      Ops per second : 257892737 op/s
      -- Other function --
      Total time : 21787595 ns
      Average runtime : 1.06489 ns/op
      Ops per second : 939066473 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.64131

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 60957441 ns 
           Average runtime : 2.14737 ns/op 
           Ops per second  : 465686215 op/s 
      -- Other function --
           Total time      : 56892253 ns 
           Average runtime : 2.00416 ns/op 
           Ops per second  : 498961431 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.07145

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 271502856 ns
      Average runtime : 1.66996 ns/op
      Ops per second : 598816242 op/s
      -- Other function --
      Total time : 265844550 ns
      Average runtime : 1.63516 ns/op
      Ops per second : 611561606 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.02128

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 231052497 ns
      Average runtime : 1.6893 ns/op
      Ops per second : 591960363 op/s
      -- Other function --
      Total time : 293389326 ns
      Average runtime : 2.14507 ns/op
      Ops per second : 466185739 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.787529

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1515553786 ns
      Average runtime : 2.25835 ns/op
      Ops per second : 442800886 op/s
      -- Other function --
      Total time : 1086735578 ns
      Average runtime : 1.61936 ns/op
      Ops per second : 617526998 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.39459

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 531233886 ns
      Average runtime : 1.5832 ns/op
      Ops per second : 631631921 op/s
      -- Other function --
      Total time : 531210123 ns
      Average runtime : 1.58313 ns/op
      Ops per second : 631660176 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.00004

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 202064141 ns 
           Average runtime : 2.49462 ns/op 
           Ops per second  : 400862813 op/s 
      -- Other function --
           Total time      : 92890302 ns 
           Average runtime : 1.14679 ns/op 
           Ops per second  : 871996303 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.1753

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 312067943 ns
      Average runtime : 2.47673 ns/op
      Ops per second : 403758229 op/s
      -- Other function --
      Total time : 140528614 ns
      Average runtime : 1.11531 ns/op
      Ops per second : 896614549 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.22067

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 125855505 ns
      Average runtime : 1.99771 ns/op
      Ops per second : 500574051 op/s
      -- Other function --
      Total time : 73820328 ns
      Average runtime : 1.17175 ns/op
      Ops per second : 853423463 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.70489

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 313802342 ns
      Average runtime : 7.66868 ns/op
      Ops per second : 130400556 op/s
      -- Other function --
      Total time : 43567199 ns
      Average runtime : 1.06469 ns/op
      Ops per second : 939238714 op/s
      -- Average runtime ratio --
      Mine / Other's : 7.20272

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 49715191 ns
      Average runtime : 2.42987 ns/op
      Ops per second : 411544230 op/s
      -- Other function --
      Total time : 21795564 ns
      Average runtime : 1.06528 ns/op
      Ops per second : 938723127 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.28098

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 61395386 ns 
           Average runtime : 2.1628 ns/op 
           Ops per second  : 462364386 op/s 
      -- Other function --
           Total time      : 30719575 ns 
           Average runtime : 1.08217 ns/op 
           Ops per second  : 924070075 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.99858

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 297810620 ns
      Average runtime : 1.83178 ns/op
      Ops per second : 545918476 op/s
      -- Other function --
      Total time : 183033140 ns
      Average runtime : 1.1258 ns/op
      Ops per second : 888256192 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.62709

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 161084585 ns
      Average runtime : 1.17774 ns/op
      Ops per second : 849081369 op/s
      -- Other function --
      Total time : 155900422 ns
      Average runtime : 1.13984 ns/op
      Ops per second : 877315906 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.03325

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1468646777 ns
      Average runtime : 2.18845 ns/op
      Ops per second : 456943473 op/s
      -- Other function --
      Total time : 712329217 ns
      Average runtime : 1.06145 ns/op
      Ops per second : 942104498 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.06175

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 619817984 ns
      Average runtime : 1.8472 ns/op
      Ops per second : 541359380 op/s
      -- Other function --
      Total time : 356180360 ns
      Average runtime : 1.0615 ns/op
      Ops per second : 942062835 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.74018

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 177757939 ns 
           Average runtime : 2.19454 ns/op 
           Ops per second  : 455675850 op/s 
      -- Other function --
           Total time      : 92889441 ns 
           Average runtime : 1.14678 ns/op 
           Ops per second  : 872004386 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.91365

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 446068967 ns
      Average runtime : 3.54023 ns/op
      Ops per second : 282467531 op/s
      -- Other function --
      Total time : 140550514 ns
      Average runtime : 1.11548 ns/op
      Ops per second : 896474843 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.17373

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 111658887 ns
      Average runtime : 1.77236 ns/op
      Ops per second : 564218412 op/s
      -- Other function --
      Total time : 73800152 ns
      Average runtime : 1.17143 ns/op
      Ops per second : 853656778 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.51299

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 301251322 ns
      Average runtime : 7.36196 ns/op
      Ops per second : 135833428 op/s
      -- Other function --
      Total time : 43564100 ns
      Average runtime : 1.06462 ns/op
      Ops per second : 939305529 op/s
      -- Average runtime ratio --
      Mine / Other's : 6.91513

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 72140076 ns
      Average runtime : 3.52591 ns/op
      Ops per second : 283614893 op/s
      -- Other function --
      Total time : 21785294 ns
      Average runtime : 1.06477 ns/op
      Ops per second : 939165659 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.31141

  • Intel Core i7-13700H, GCC 14
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 71494030 ns 
           Average runtime : 2.51854 ns/op 
           Ops per second  : 397054691 op/s 
      -- Other function --
           Total time      : 30980516 ns 
           Average runtime : 1.09136 ns/op 
           Ops per second  : 916286868 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.30771

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 265693977 ns
      Average runtime : 1.63423 ns/op
      Ops per second : 611908187 op/s
      -- Other function --
      Total time : 178684097 ns
      Average runtime : 1.09905 ns/op
      Ops per second : 909875712 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.48695

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 192499563 ns
      Average runtime : 1.40743 ns/op
      Ops per second : 710515483 op/s
      -- Other function --
      Total time : 152617516 ns
      Average runtime : 1.11584 ns/op
      Ops per second : 896187564 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.26132

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1351995431 ns
      Average runtime : 2.01463 ns/op
      Ops per second : 496368955 op/s
      -- Other function --
      Total time : 709224891 ns
      Average runtime : 1.05683 ns/op
      Ops per second : 946228154 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.9063

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 449017317 ns
      Average runtime : 1.33818 ns/op
      Ops per second : 747285833 op/s
      -- Other function --
      Total time : 354203535 ns
      Average runtime : 1.05561 ns/op
      Ops per second : 947320528 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.26768

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 175613615 ns 
           Average runtime : 2.16807 ns/op 
           Ops per second  : 461239864 op/s 
      -- Other function --
           Total time      : 92615791 ns 
           Average runtime : 1.1434 ns/op 
           Ops per second  : 874580880 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.89615

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 215064573 ns
      Average runtime : 1.70686 ns/op
      Ops per second : 585870551 op/s
      -- Other function --
      Total time : 170989078 ns
      Average runtime : 1.35706 ns/op
      Ops per second : 736889171 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.25777

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 106460338 ns
      Average runtime : 1.68985 ns/op
      Ops per second : 591769678 op/s
      -- Other function --
      Total time : 73614167 ns
      Average runtime : 1.16848 ns/op
      Ops per second : 855813528 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.44619

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 270284175 ns
      Average runtime : 6.60519 ns/op
      Ops per second : 151396211 op/s
      -- Other function --
      Total time : 54222735 ns
      Average runtime : 1.32509 ns/op
      Ops per second : 754664994 op/s
      -- Average runtime ratio --
      Mine / Other's : 4.9847

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 33804592 ns
      Average runtime : 1.65223 ns/op
      Ops per second : 605243216 op/s
      -- Other function --
      Total time : 27141638 ns
      Average runtime : 1.32657 ns/op
      Ops per second : 753823332 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.24549

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 76497599 ns 
           Average runtime : 2.69481 ns/op 
           Ops per second  : 371084065 op/s 
      -- Other function --
           Total time      : 56918782 ns 
           Average runtime : 2.0051 ns/op 
           Ops per second  : 498728873 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 1.34398

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 269414655 ns
      Average runtime : 1.65712 ns/op
      Ops per second : 603457595 op/s
      -- Other function --
      Total time : 265779164 ns
      Average runtime : 1.63476 ns/op
      Ops per second : 611712060 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.01368

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 191235173 ns
      Average runtime : 1.39818 ns/op
      Ops per second : 715213199 op/s
      -- Other function --
      Total time : 284830736 ns
      Average runtime : 2.08249 ns/op
      Ops per second : 480193682 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.671399

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1676848837 ns
      Average runtime : 2.4987 ns/op
      Ops per second : 400208143 op/s
      -- Other function --
      Total time : 1086663340 ns
      Average runtime : 1.61925 ns/op
      Ops per second : 617568050 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.54312

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 454990655 ns
      Average runtime : 1.35598 ns/op
      Ops per second : 737475102 op/s
      -- Other function --
      Total time : 531259675 ns
      Average runtime : 1.58328 ns/op
      Ops per second : 631601259 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.856437

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 199531194 ns 
           Average runtime : 2.46335 ns/op 
           Ops per second  : 405951562 op/s 
      -- Other function --
           Total time      : 92632308 ns 
           Average runtime : 1.14361 ns/op 
           Ops per second  : 874424936 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.15401

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 185614057 ns
      Average runtime : 1.47313 ns/op
      Ops per second : 678827897 op/s
      -- Other function --
      Total time : 170991106 ns
      Average runtime : 1.35707 ns/op
      Ops per second : 736880431 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.08552

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 92612765 ns
      Average runtime : 1.47004 ns/op
      Ops per second : 680251798 op/s
      -- Other function --
      Total time : 73615451 ns
      Average runtime : 1.1685 ns/op
      Ops per second : 855798601 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.25806

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 202309657 ns
      Average runtime : 4.94403 ns/op
      Ops per second : 202264195 op/s
      -- Other function --
      Total time : 54442085 ns
      Average runtime : 1.33045 ns/op
      Ops per second : 751624409 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.71605

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 30142107 ns
      Average runtime : 1.47322 ns/op
      Ops per second : 678784664 op/s
      -- Other function --
      Total time : 27199770 ns
      Average runtime : 1.32941 ns/op
      Ops per second : 752212242 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.10818

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
           Total time      : 67652298 ns 
           Average runtime : 2.38321 ns/op 
           Ops per second  : 419602006 op/s 
      -- Other function --
           Total time      : 31164790 ns 
           Average runtime : 1.09785 ns/op 
           Ops per second  : 910868964 op/s 
      -- Average runtime ratio --
           Mine / Other's  : 2.17079

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 363817850 ns
      Average runtime : 2.23777 ns/op
      Ops per second : 446872851 op/s
      -- Other function --
      Total time : 183498292 ns
      Average runtime : 1.12866 ns/op
      Ops per second : 886004541 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.98268

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 191053698 ns
      Average runtime : 1.39686 ns/op
      Ops per second : 715892554 op/s
      -- Other function --
      Total time : 156307884 ns
      Average runtime : 1.14282 ns/op
      Ops per second : 875028926 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.22229

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1807076780 ns
      Average runtime : 2.69275 ns/op
      Ops per second : 371366932 op/s
      -- Other function --
      Total time : 711303623 ns
      Average runtime : 1.05993 ns/op
      Ops per second : 943462873 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.54051

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 667598498 ns
      Average runtime : 1.9896 ns/op
      Ops per second : 502613892 op/s
      -- Other function --
      Total time : 355646966 ns
      Average runtime : 1.05991 ns/op
      Ops per second : 943475727 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.87714

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 164858865 ns
      Average runtime : 2.03529 ns/op
      Ops per second : 491329356 op/s
      -- Other function --
      Total time : 92612022 ns
      Average runtime : 1.14336 ns/op
      Ops per second : 874616472 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.7801

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 243572710 ns
      Average runtime : 1.93312 ns/op
      Ops per second : 517299331 op/s
      -- Other function --
      Total time : 171013598 ns
      Average runtime : 1.35725 ns/op
      Ops per second : 736783515 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.42429

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 93458620 ns
      Average runtime : 1.48347 ns/op
      Ops per second : 674095123 op/s
      -- Other function --
      Total time : 73622062 ns
      Average runtime : 1.1686 ns/op
      Ops per second : 855721753 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.26944

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 210626089 ns
      Average runtime : 5.14727 ns/op
      Ops per second : 194277927 op/s
      -- Other function --
      Total time : 54395162 ns
      Average runtime : 1.32931 ns/op
      Ops per second : 752272784 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.87215

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 38035462 ns
      Average runtime : 1.85902 ns/op
      Ops per second : 537919060 op/s
      -- Other function --
      Total time : 27208292 ns
      Average runtime : 1.32983 ns/op
      Ops per second : 751976640 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.39794

  • Intel Core i7-13700H, GCC 14, -march=native
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 63426498 ns
      Average runtime : 2.23435 ns/op
      Ops per second : 447558053 op/s
      -- Other function --
      Total time : 30639955 ns
      Average runtime : 1.07936 ns/op
      Ops per second : 926471334 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.07006

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 269280947 ns
      Average runtime : 1.65629 ns/op
      Ops per second : 603757235 op/s
      -- Other function --
      Total time : 180188636 ns
      Average runtime : 1.10831 ns/op
      Ops per second : 902278432 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.49444

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 223665221 ns
      Average runtime : 1.63529 ns/op
      Ops per second : 611511791 op/s
      -- Other function --
      Total time : 189151485 ns
      Average runtime : 1.38295 ns/op
      Ops per second : 723091970 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.18247

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1173057935 ns
      Average runtime : 1.74799 ns/op
      Ops per second : 572084753 op/s
      -- Other function --
      Total time : 708347011 ns
      Average runtime : 1.05552 ns/op
      Ops per second : 947400849 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.65605

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 524693329 ns
      Average runtime : 1.56371 ns/op
      Ops per second : 639505519 op/s
      -- Other function --
      Total time : 354139037 ns
      Average runtime : 1.05542 ns/op
      Ops per second : 947493060 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.4816

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 153973964 ns
      Average runtime : 1.90091 ns/op
      Ops per second : 526062964 op/s
      -- Other function --
      Total time : 111691432 ns
      Average runtime : 1.37891 ns/op
      Ops per second : 725212297 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.37857

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 187635417 ns
      Average runtime : 1.48917 ns/op
      Ops per second : 671515015 op/s
      -- Other function --
      Total time : 140134410 ns
      Average runtime : 1.11218 ns/op
      Ops per second : 899136764 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.33897

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 111629161 ns
      Average runtime : 1.77189 ns/op
      Ops per second : 564368659 op/s
      -- Other function --
      Total time : 71257173 ns
      Average runtime : 1.13107 ns/op
      Ops per second : 884121518 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.56657

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 84847801 ns
      Average runtime : 2.0735 ns/op
      Ops per second : 482275315 op/s
      -- Other function --
      Total time : 43423625 ns
      Average runtime : 1.06118 ns/op
      Ops per second : 942344173 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.95395

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 32550420 ns
      Average runtime : 1.59093 ns/op
      Ops per second : 628563318 op/s
      -- Other function --
      Total time : 22538792 ns
      Average runtime : 1.1016 ns/op
      Ops per second : 907768260 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.4442

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 69268206 ns
      Average runtime : 2.44013 ns/op
      Ops per second : 409813414 op/s
      -- Other function --
      Total time : 54052421 ns
      Average runtime : 1.90412 ns/op
      Ops per second : 525176106 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.2815

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 268281431 ns
      Average runtime : 1.65015 ns/op
      Ops per second : 606006608 op/s
      -- Other function --
      Total time : 265620204 ns
      Average runtime : 1.63378 ns/op
      Ops per second : 612078138 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.01002

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 231049603 ns
      Average runtime : 1.68928 ns/op
      Ops per second : 591967777 op/s
      -- Other function --
      Total time : 300559749 ns
      Average runtime : 2.19749 ns/op
      Ops per second : 455063994 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.768731

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1409764231 ns
      Average runtime : 2.10071 ns/op
      Ops per second : 476028931 op/s
      -- Other function --
      Total time : 1088017715 ns
      Average runtime : 1.62127 ns/op
      Ops per second : 616799295 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.29572

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 525332340 ns
      Average runtime : 1.56561 ns/op
      Ops per second : 638727629 op/s
      -- Other function --
      Total time : 531271134 ns
      Average runtime : 1.58331 ns/op
      Ops per second : 631587636 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.988822

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 165336580 ns
      Average runtime : 2.04119 ns/op
      Ops per second : 489909734 op/s
      -- Other function --
      Total time : 111609696 ns
      Average runtime : 1.3779 ns/op
      Ops per second : 725743397 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.48138

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 185608964 ns
      Average runtime : 1.47309 ns/op
      Ops per second : 678846523 op/s
      -- Other function --
      Total time : 140105893 ns
      Average runtime : 1.11195 ns/op
      Ops per second : 899319773 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.32478

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 111623407 ns
      Average runtime : 1.7718 ns/op
      Ops per second : 564397752 op/s
      -- Other function --
      Total time : 71248909 ns
      Average runtime : 1.13094 ns/op
      Ops per second : 884224065 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.56667

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 102878761 ns
      Average runtime : 2.51414 ns/op
      Ops per second : 397749735 op/s
      -- Other function --
      Total time : 43414515 ns
      Average runtime : 1.06096 ns/op
      Ops per second : 942541912 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.36969

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 32957094 ns
      Average runtime : 1.61081 ns/op
      Ops per second : 620807162 op/s
      -- Other function --
      Total time : 24196557 ns
      Average runtime : 1.18263 ns/op
      Ops per second : 845574847 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.36206

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 60141096 ns
      Average runtime : 2.11861 ns/op
      Ops per second : 472007360 op/s
      -- Other function --
      Total time : 30641272 ns
      Average runtime : 1.07941 ns/op
      Ops per second : 926431513 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.96275

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 313425966 ns
      Average runtime : 1.92782 ns/op
      Ops per second : 518720009 op/s
      -- Other function --
      Total time : 183756706 ns
      Average runtime : 1.13025 ns/op
      Ops per second : 884758567 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.70566

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 190876854 ns
      Average runtime : 1.39556 ns/op
      Ops per second : 716555816 op/s
      -- Other function --
      Total time : 192165276 ns
      Average runtime : 1.40498 ns/op
      Ops per second : 711751482 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.993295

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1359473204 ns
      Average runtime : 2.02577 ns/op
      Ops per second : 493638681 op/s
      -- Other function --
      Total time : 711552350 ns
      Average runtime : 1.0603 ns/op
      Ops per second : 943133080 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.91057

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 672909506 ns
      Average runtime : 2.00543 ns/op
      Ops per second : 498646960 op/s
      -- Other function --
      Total time : 354863147 ns
      Average runtime : 1.05757 ns/op
      Ops per second : 945559669 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.89625

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 153721489 ns
      Average runtime : 1.8978 ns/op
      Ops per second : 526926980 op/s
      -- Other function --
      Total time : 111749893 ns
      Average runtime : 1.37963 ns/op
      Ops per second : 724832908 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.37559

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 271204010 ns
      Average runtime : 2.15241 ns/op
      Ops per second : 464594900 op/s
      -- Other function --
      Total time : 140106572 ns
      Average runtime : 1.11196 ns/op
      Ops per second : 899315415 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.9357

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 113985994 ns
      Average runtime : 1.8093 ns/op
      Ops per second : 552699483 op/s
      -- Other function --
      Total time : 71251956 ns
      Average runtime : 1.13098 ns/op
      Ops per second : 884186253 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.59976

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 104685500 ns
      Average runtime : 2.5583 ns/op
      Ops per second : 390885079 op/s
      -- Other function --
      Total time : 43414435 ns
      Average runtime : 1.06096 ns/op
      Ops per second : 942543649 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.41131

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 43754769 ns
      Average runtime : 2.13855 ns/op
      Ops per second : 467606171 op/s
      -- Other function --
      Total time : 22720115 ns
      Average runtime : 1.11047 ns/op
      Ops per second : 900523610 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.92582

  • Google Tensor G3, Clang 17
    • ceilf
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 20797526 ns
      Average runtime : 0.732642 ns/op
      Ops per second : 1364923885 op/s
      -- Other function --
      Total time : 20580567 ns
      Average runtime : 0.724999 ns/op
      Ops per second : 1379312824 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.01054

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 118553304 ns
      Average runtime : 0.729198 ns/op
      Ops per second : 1371368949 op/s
      -- Other function --
      Total time : 118554769 ns
      Average runtime : 0.729207 ns/op
      Ops per second : 1371352003 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.999988

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 100752523 ns
      Average runtime : 0.736635 ns/op
      Ops per second : 1357523523 op/s
      -- Other function --
      Total time : 100559734 ns
      Average runtime : 0.735226 ns/op
      Ops per second : 1360126111 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.00192

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 460694946 ns
      Average runtime : 0.686489 ns/op
      Ops per second : 1456687480 op/s
      -- Other function --
      Total time : 460728435 ns
      Average runtime : 0.686539 ns/op
      Ops per second : 1456581597 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.999927

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 230323649 ns
      Average runtime : 0.686418 ns/op
      Ops per second : 1456838155 op/s
      -- Other function --
      Total time : 230881755 ns
      Average runtime : 0.688081 ns/op
      Ops per second : 1453316568 op/s
      -- Average runtime ratio --
      Mine / Other's : 0.997583

    • ceilf16
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 148474691 ns
      Average runtime : 1.83302 ns/op
      Ops per second : 545547523 op/s
      -- Other function --
      Total time : 62275025 ns
      Average runtime : 0.768827 ns/op
      Ops per second : 1300681934 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.38418

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 182548543 ns
      Average runtime : 1.4488 ns/op
      Ops per second : 690227365 op/s
      -- Other function --
      Total time : 93034546 ns
      Average runtime : 0.738369 ns/op
      Ops per second : 1354335624 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.96216

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 69155965 ns
      Average runtime : 1.09771 ns/op
      Ops per second : 910984323 op/s
      -- Other function --
      Total time : 49448771 ns
      Average runtime : 0.784901 ns/op
      Ops per second : 1274045820 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.39854

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 84904704 ns
      Average runtime : 2.0749 ns/op
      Ops per second : 481952095 op/s
      -- Other function --
      Total time : 28540568 ns
      Average runtime : 0.697472 ns/op
      Ops per second : 1433748620 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.97488

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 28325887 ns
      Average runtime : 1.38445 ns/op
      Ops per second : 722307477 op/s
      -- Other function --
      Total time : 14238647 ns
      Average runtime : 0.695926 ns/op
      Ops per second : 1436934281 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.98937

    • roundf
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 30589599 ns
      Average runtime : 1.07759 ns/op
      Ops per second : 927996473 op/s
      -- Other function --
      Total time : 20566609 ns
      Average runtime : 0.724507 ns/op
      Ops per second : 1380248926 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.48734

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 173745850 ns
      Average runtime : 1.06868 ns/op
      Ops per second : 935736421 op/s
      -- Other function --
      Total time : 118372558 ns
      Average runtime : 0.728087 ns/op
      Ops per second : 1373462927 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.46779

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 147161255 ns
      Average runtime : 1.07595 ns/op
      Ops per second : 929415286 op/s
      -- Other function --
      Total time : 100667643 ns
      Average runtime : 0.736015 ns/op
      Ops per second : 1358668147 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.46185

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 691060913 ns
      Average runtime : 1.02976 ns/op
      Ops per second : 971098997 op/s
      -- Other function --
      Total time : 460735067 ns
      Average runtime : 0.686549 ns/op
      Ops per second : 1456560631 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.49991

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 345502401 ns
      Average runtime : 1.02968 ns/op
      Ops per second : 971177852 op/s
      -- Other function --
      Total time : 230297688 ns
      Average runtime : 0.686341 ns/op
      Ops per second : 1457002382 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.50024

    • roundf16
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 176155843 ns
      Average runtime : 2.17476 ns/op
      Ops per second : 459820115 op/s
      -- Other function --
      Total time : 61848186 ns
      Average runtime : 0.763558 ns/op
      Ops per second : 1309658459 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.8482

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 222636678 ns
      Average runtime : 1.76696 ns/op
      Ops per second : 565944484 op/s
      -- Other function --
      Total time : 93846721 ns
      Average runtime : 0.744815 ns/op
      Ops per second : 1342614836 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.37234

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 73063680 ns
      Average runtime : 1.15974 ns/op
      Ops per second : 862261523 op/s
      -- Other function --
      Total time : 49564454 ns
      Average runtime : 0.786737 ns/op
      Ops per second : 1271072208 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.47411

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 94936808 ns
      Average runtime : 2.32006 ns/op
      Ops per second : 431023549 op/s
      -- Other function --
      Total time : 29073894 ns
      Average runtime : 0.710506 ns/op
      Ops per second : 1407448207 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.26536

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 35292236 ns
      Average runtime : 1.72494 ns/op
      Ops per second : 579730907 op/s
      -- Other function --
      Total time : 14242595 ns
      Average runtime : 0.696119 ns/op
      Ops per second : 1436535968 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.47794

    • roundevenf
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 50036743 ns
      Average runtime : 1.76266 ns/op
      Ops per second : 567323896 op/s
      -- Other function --
      Total time : 30556437 ns
      Average runtime : 1.07642 ns/op
      Ops per second : 929003600 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.63752

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 235388265 ns
      Average runtime : 1.44783 ns/op
      Ops per second : 690689996 op/s
      -- Other function --
      Total time : 174112752 ns
      Average runtime : 1.07093 ns/op
      Ops per second : 933764575 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.35193

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 196856730 ns
      Average runtime : 1.43929 ns/op
      Ops per second : 694789149 op/s
      -- Other function --
      Total time : 146919759 ns
      Average runtime : 1.07418 ns/op
      Ops per second : 930942991 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.33989

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 1563822226 ns
      Average runtime : 2.33028 ns/op
      Ops per second : 429133535 op/s
      -- Other function --
      Total time : 691996704 ns
      Average runtime : 1.03116 ns/op
      Ops per second : 969785775 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.25987

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 470601400 ns
      Average runtime : 1.4025 ns/op
      Ops per second : 713011648 op/s
      -- Other function --
      Total time : 346528483 ns
      Average runtime : 1.03274 ns/op
      Ops per second : 968302164 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.35805

    • roundevenf16
      Performance tests with inputs in normal integral range:
      -- My function --
      Total time : 148951904 ns
      Average runtime : 1.83891 ns/op
      Ops per second : 543799695 op/s
      -- Other function --
      Total time : 62186401 ns
      Average runtime : 0.767733 ns/op
      Ops per second : 1302535581 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.39525

      Performance tests with inputs in low integral range:
      -- My function --
      Total time : 185217489 ns
      Average runtime : 1.46998 ns/op
      Ops per second : 680281331 op/s
      -- Other function --
      Total time : 94043905 ns
      Average runtime : 0.74638 ns/op
      Ops per second : 1339799745 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.96948

      Performance tests with inputs in high integral range:
      -- My function --
      Total time : 92689738 ns
      Average runtime : 1.47127 ns/op
      Ops per second : 679686892 op/s
      -- Other function --
      Total time : 49437094 ns
      Average runtime : 0.784716 ns/op
      Ops per second : 1274346748 op/s
      -- Average runtime ratio --
      Mine / Other's : 1.8749

      Performance tests with inputs in normal fractional range:
      -- My function --
      Total time : 102827799 ns
      Average runtime : 2.5129 ns/op
      Ops per second : 397946862 op/s
      -- Other function --
      Total time : 29049682 ns
      Average runtime : 0.709914 ns/op
      Ops per second : 1408621271 op/s
      -- Average runtime ratio --
      Mine / Other's : 3.53972

      Performance tests with inputs in subnormal fractional range:
      -- My function --
      Total time : 30628988 ns
      Average runtime : 1.49702 ns/op
      Ops per second : 667994646 op/s
      -- Other function --
      Total time : 14428263 ns
      Average runtime : 0.705194 ns/op
      Ops per second : 1418050114 op/s
      -- Average runtime ratio --
      Mine / Other's : 2.12285

@overmighty overmighty force-pushed the libc-math-rounding-generic-opt branch from 2df2699 to 781700f Compare July 11, 2024 14:48
@overmighty overmighty marked this pull request as ready for review July 11, 2024 14:50
@llvmbot llvmbot added the libc label Jul 11, 2024
@llvmbot
Member

llvmbot commented Jul 11, 2024

@llvm/pr-subscribers-libc

Author: OverMighty (overmighty)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/98483.diff

3 Files Affected:

  • (modified) libc/src/__support/FPUtil/NearestIntegerOperations.h (+21-16)
  • (modified) libc/test/src/math/performance_testing/CMakeLists.txt (+19)
  • (added) libc/test/src/math/performance_testing/nearest_integer_funcs_perf.cpp (+168)
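
The change to `NearestIntegerOperations.h` swaps a mantissa-only truncation followed by a floating-point comparison for a shift on the whole bit pattern followed by an integer comparison. A minimal standalone sketch of that idea for `float`, assuming IEEE 754 binary32 and using a hypothetical `trunc_via_bits` helper with `std::memcpy` in place of libc's `FPBits` (an illustration only, not the code in this PR):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of LLVM libc: truncates toward zero by
// clearing fractional bits directly in the IEEE 754 binary32 bit pattern.
inline float trunc_via_bits(float x) {
  constexpr int FRACTION_LEN = 23; // explicit fraction bits in binary32
  constexpr int EXP_BIAS = 127;

  std::uint32_t x_u;
  std::memcpy(&x_u, &x, sizeof(x_u));

  int exponent = static_cast<int>((x_u >> FRACTION_LEN) & 0xFF) - EXP_BIAS;

  // |x| >= 2^23, infinity, or NaN: there are no fractional bits to clear.
  if (exponent >= FRACTION_LEN)
    return x;

  // |x| < 1 (including subnormals): the result is zero with the sign of x.
  if (exponent < 0) {
    std::uint32_t zero_u = x_u & 0x80000000u;
    float zero;
    std::memcpy(&zero, &zero_u, sizeof(zero));
    return zero;
  }

  // The low trim_size bits hold the fractional part. Shifting the whole
  // word right and back left clears them; the sign and exponent survive
  // because trim_size never exceeds FRACTION_LEN.
  std::uint32_t trim_size = static_cast<std::uint32_t>(FRACTION_LEN - exponent);
  std::uint32_t trunc_u = (x_u >> trim_size) << trim_size;

  // Integer comparison: if nothing was cleared, x was already an integer.
  if (trunc_u == x_u)
    return x;

  float result;
  std::memcpy(&result, &trunc_u, sizeof(result));
  return result;
}
```

Comparing `trunc_u` with `x_u` as integers lets the already-integral case return before any value is converted back to floating point, which mirrors the reordering visible in the diff below.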
diff --git a/libc/src/__support/FPUtil/NearestIntegerOperations.h b/libc/src/__support/FPUtil/NearestIntegerOperations.h
index cff32938229d0..a9a0a97eebb5c 100644
--- a/libc/src/__support/FPUtil/NearestIntegerOperations.h
+++ b/libc/src/__support/FPUtil/NearestIntegerOperations.h
@@ -75,15 +75,17 @@ LIBC_INLINE T ceil(T x) {
   }
 
   uint32_t trim_size = FPBits<T>::FRACTION_LEN - exponent;
-  StorageType trunc_mantissa =
-      static_cast<StorageType>((bits.get_mantissa() >> trim_size) << trim_size);
-  bits.set_mantissa(trunc_mantissa);
-  T trunc_value = bits.get_val();
+  StorageType x_u = bits.uintval();
+  StorageType trunc_u =
+      static_cast<StorageType>((x_u >> trim_size) << trim_size);
 
   // If x is already an integer, return it.
-  if (trunc_value == x)
+  if (trunc_u == x_u)
     return x;
 
+  bits.set_uintval(trunc_u);
+  T trunc_value = bits.get_val();
+
   // If x is negative, the ceil operation is equivalent to the trunc operation.
   if (is_neg)
     return trunc_value;
@@ -130,15 +132,17 @@ LIBC_INLINE T round(T x) {
   uint32_t trim_size = FPBits<T>::FRACTION_LEN - exponent;
   bool half_bit_set =
       bool(bits.get_mantissa() & (StorageType(1) << (trim_size - 1)));
-  StorageType trunc_mantissa =
-      static_cast<StorageType>((bits.get_mantissa() >> trim_size) << trim_size);
-  bits.set_mantissa(trunc_mantissa);
-  T trunc_value = bits.get_val();
+  StorageType x_u = bits.uintval();
+  StorageType trunc_u =
+      static_cast<StorageType>((x_u >> trim_size) << trim_size);
 
   // If x is already an integer, return it.
-  if (trunc_value == x)
+  if (trunc_u == x_u)
     return x;
 
+  bits.set_uintval(trunc_u);
+  T trunc_value = bits.get_val();
+
   if (!half_bit_set) {
     // Franctional part is less than 0.5 so round value is the
     // same as the trunc value.
@@ -188,16 +192,17 @@ round_using_specific_rounding_mode(T x, int rnd) {
   }
 
   uint32_t trim_size = FPBits<T>::FRACTION_LEN - exponent;
-  FPBits<T> new_bits = bits;
-  StorageType trunc_mantissa =
-      static_cast<StorageType>((bits.get_mantissa() >> trim_size) << trim_size);
-  new_bits.set_mantissa(trunc_mantissa);
-  T trunc_value = new_bits.get_val();
+  StorageType x_u = bits.uintval();
+  StorageType trunc_u =
+      static_cast<StorageType>((x_u >> trim_size) << trim_size);
 
   // If x is already an integer, return it.
-  if (trunc_value == x)
+  if (trunc_u == x_u)
     return x;
 
+  FPBits<T> new_bits(trunc_u);
+  T trunc_value = new_bits.get_val();
+
   StorageType trim_value =
       bits.get_mantissa() &
       static_cast<StorageType>(((StorageType(1) << trim_size) - 1));
diff --git a/libc/test/src/math/performance_testing/CMakeLists.txt b/libc/test/src/math/performance_testing/CMakeLists.txt
index 4ea78f9999e4d..bf88fbb85c5d7 100644
--- a/libc/test/src/math/performance_testing/CMakeLists.txt
+++ b/libc/test/src/math/performance_testing/CMakeLists.txt
@@ -366,3 +366,22 @@ add_perf_binary(
   COMPILE_OPTIONS
     -fno-builtin
 )
+
+add_perf_binary(
+  nearest_integer_funcs_perf
+  SRCS
+    nearest_integer_funcs_perf.cpp
+  DEPENDS
+    libc.src.math.ceilf
+    libc.src.math.ceilf16
+    libc.src.math.floorf
+    libc.src.math.floorf16
+    libc.src.math.roundevenf
+    libc.src.math.roundevenf16
+    libc.src.math.roundf
+    libc.src.math.roundf16
+    libc.src.math.truncf
+    libc.src.math.truncf16
+  COMPILE_OPTIONS
+    -fno-builtin
+)
diff --git a/libc/test/src/math/performance_testing/nearest_integer_funcs_perf.cpp b/libc/test/src/math/performance_testing/nearest_integer_funcs_perf.cpp
new file mode 100644
index 0000000000000..24176a377e9d4
--- /dev/null
+++ b/libc/test/src/math/performance_testing/nearest_integer_funcs_perf.cpp
@@ -0,0 +1,168 @@
+//===-- Performance test for nearest integer functions --------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/math/ceilf.h"
+#include "src/math/ceilf16.h"
+#include "src/math/floorf.h"
+#include "src/math/floorf16.h"
+#include "src/math/roundevenf.h"
+#include "src/math/roundevenf16.h"
+#include "src/math/roundf.h"
+#include "src/math/roundf16.h"
+#include "src/math/truncf.h"
+#include "src/math/truncf16.h"
+#include "test/src/math/performance_testing/Timer.h"
+
+#include <fstream>
+#include <math.h>
+
+namespace LIBC_NAMESPACE::testing {
+
+template <typename T> class NearestIntegerPerf {
+  using FPBits = fputil::FPBits<T>;
+  using StorageType = typename FPBits::StorageType;
+
+public:
+  typedef T Func(T);
+
+  static void run_perf_in_range(Func my_func, Func other_func,
+                                StorageType starting_bit,
+                                StorageType ending_bit, StorageType step,
+                                size_t rounds, std::ofstream &log) {
+    auto runner = [=](Func func) {
+      volatile T result;
+      for (size_t i = 0; i < rounds; i++) {
+        for (StorageType bits = starting_bit; bits <= ending_bit;
+             bits += step) {
+          T x = FPBits(bits).get_val();
+          result = func(x);
+        }
+      }
+    };
+
+    Timer timer;
+    timer.start();
+    runner(my_func);
+    timer.stop();
+
+    size_t number_of_runs = (ending_bit - starting_bit) / step + 1;
+    double my_average =
+        static_cast<double>(timer.nanoseconds()) / number_of_runs / rounds;
+    log << "-- My function --\n";
+    log << "     Total time      : " << timer.nanoseconds() << " ns \n";
+    log << "     Average runtime : " << my_average << " ns/op \n";
+    log << "     Ops per second  : "
+        << static_cast<uint64_t>(1'000'000'000.0 / my_average) << " op/s \n";
+
+    timer.start();
+    runner(other_func);
+    timer.stop();
+
+    double other_average =
+        static_cast<double>(timer.nanoseconds()) / number_of_runs / rounds;
+    log << "-- Other function --\n";
+    log << "     Total time      : " << timer.nanoseconds() << " ns \n";
+    log << "     Average runtime : " << other_average << " ns/op \n";
+    log << "     Ops per second  : "
+        << static_cast<uint64_t>(1'000'000'000.0 / other_average) << " op/s \n";
+
+    log << "-- Average runtime ratio --\n";
+    log << "     Mine / Other's  : " << my_average / other_average << " \n";
+  }
+
+  static void run_perf(Func my_func, Func other_func, size_t rounds,
+                       const char *log_file) {
+    std::ofstream log(log_file);
+    log << "Performance tests with inputs in normal integral range:\n";
+    run_perf_in_range(
+        my_func, other_func,
+        /*starting_bit=*/StorageType((FPBits::EXP_BIAS + 1) << FPBits::SIG_LEN),
+        /*ending_bit=*/
+        StorageType((FPBits::EXP_BIAS + FPBits::FRACTION_LEN - 1)
+                    << FPBits::SIG_LEN),
+        /*step=*/StorageType(1 << FPBits::SIG_LEN),
+        rounds * FPBits::EXP_BIAS * FPBits::EXP_BIAS * 2, log);
+    log << "\n Performance tests with inputs in low integral range:\n";
+    run_perf_in_range(
+        my_func, other_func,
+        /*starting_bit=*/StorageType(1 << FPBits::SIG_LEN),
+        /*ending_bit=*/StorageType((FPBits::EXP_BIAS - 1) << FPBits::SIG_LEN),
+        /*step=*/StorageType(1 << FPBits::SIG_LEN),
+        rounds * FPBits::EXP_BIAS * FPBits::EXP_BIAS * 2, log);
+    log << "\n Performance tests with inputs in high integral range:\n";
+    run_perf_in_range(
+        my_func, other_func,
+        /*starting_bit=*/
+        StorageType((FPBits::EXP_BIAS + FPBits::FRACTION_LEN)
+                    << FPBits::SIG_LEN),
+        /*ending_bit=*/
+        StorageType(FPBits::MAX_BIASED_EXPONENT << FPBits::SIG_LEN),
+        /*step=*/StorageType(1 << FPBits::SIG_LEN),
+        rounds * FPBits::EXP_BIAS * FPBits::EXP_BIAS * 2, log);
+    log << "\n Performance tests with inputs in normal fractional range:\n";
+    run_perf_in_range(
+        my_func, other_func,
+        /*starting_bit=*/
+        StorageType(((FPBits::EXP_BIAS + 1) << FPBits::SIG_LEN) + 1),
+        /*ending_bit=*/
+        StorageType(((FPBits::EXP_BIAS + 2) << FPBits::SIG_LEN) - 1),
+        /*step=*/StorageType(1), rounds * 2, log);
+    log << "\n Performance tests with inputs in subnormal fractional range:\n";
+    run_perf_in_range(my_func, other_func, /*starting_bit=*/StorageType(1),
+                      /*ending_bit=*/StorageType(FPBits::SIG_MASK),
+                      /*step=*/StorageType(1), rounds, log);
+  }
+};
+
+} // namespace LIBC_NAMESPACE::testing
+
+#define NEAREST_INTEGER_PERF(T, my_func, other_func, rounds, filename)         \
+  {                                                                            \
+    LIBC_NAMESPACE::testing::NearestIntegerPerf<T>::run_perf(                  \
+        &my_func, &other_func, rounds, filename);                              \
+    LIBC_NAMESPACE::testing::NearestIntegerPerf<T>::run_perf(                  \
+        &my_func, &other_func, rounds, filename);                              \
+  }
+
+static constexpr size_t FLOAT16_ROUNDS = 20'000;
+static constexpr size_t FLOAT_ROUNDS = 40;
+
+// LLVM libc might be the only libc implementation with support for float16 math
+// functions currently. We can't compare our float16 functions against the
+// system libc, so we compare them against this placeholder function.
+float16 placeholderf16(float16 x) { return x; }
+
+// The system libc might not provide the roundeven* C23 math functions either.
+float placeholderf(float x) { return x; }
+
+int main() {
+  NEAREST_INTEGER_PERF(float16, LIBC_NAMESPACE::ceilf16, ::placeholderf16,
+                       FLOAT16_ROUNDS, "ceilf16_perf.log")
+  NEAREST_INTEGER_PERF(float16, LIBC_NAMESPACE::floorf16, ::placeholderf16,
+                       FLOAT16_ROUNDS, "floorf16_perf.log")
+  NEAREST_INTEGER_PERF(float16, LIBC_NAMESPACE::roundevenf16, ::placeholderf16,
+                       FLOAT16_ROUNDS, "roundevenf16_perf.log")
+  NEAREST_INTEGER_PERF(float16, LIBC_NAMESPACE::roundf16, ::placeholderf16,
+                       FLOAT16_ROUNDS, "roundf16_perf.log")
+  NEAREST_INTEGER_PERF(float16, LIBC_NAMESPACE::truncf16, ::placeholderf16,
+                       FLOAT16_ROUNDS, "truncf16_perf.log")
+
+  NEAREST_INTEGER_PERF(float, LIBC_NAMESPACE::ceilf, ::ceilf, FLOAT_ROUNDS,
+                       "ceilf_perf.log")
+  NEAREST_INTEGER_PERF(float, LIBC_NAMESPACE::floorf, ::floorf, FLOAT_ROUNDS,
+                       "floorf_perf.log")
+  NEAREST_INTEGER_PERF(float, LIBC_NAMESPACE::roundevenf, ::placeholderf,
+                       FLOAT_ROUNDS, "roundevenf_perf.log")
+  NEAREST_INTEGER_PERF(float, LIBC_NAMESPACE::roundf, ::roundf, FLOAT_ROUNDS,
+                       "roundf_perf.log")
+  NEAREST_INTEGER_PERF(float, LIBC_NAMESPACE::truncf, ::truncf, FLOAT_ROUNDS,
+                       "truncf_perf.log")
+
+  return 0;
+}
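
A note on the core change in the diff above: it replaces "extract the mantissa, truncate it, write it back" with a shift applied to the whole bit pattern (`x_u >> trim_size << trim_size`), and it compares raw bits instead of floating-point values. This works because the sign and exponent fields sit above the bits being cleared. The following is a minimal standalone sketch of that idea for `float` only. It assumes IEEE 754 binary32; `trunc_via_bits` is a hypothetical name for illustration and is not the LLVM libc implementation, which goes through `FPBits<T>`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of LLVM libc): truncates a binary32 value
// toward zero by clearing the low fraction bits of its raw bit pattern.
inline float trunc_via_bits(float x) {
  constexpr int FRACTION_LEN = 23; // explicit fraction bits in binary32
  constexpr int EXP_BIAS = 127;    // exponent bias in binary32

  std::uint32_t x_u;
  std::memcpy(&x_u, &x, sizeof(x_u)); // well-defined way to read the bits

  // Unbiased exponent; the sign bit is shifted out of the 8-bit mask.
  int exponent = static_cast<int>((x_u >> FRACTION_LEN) & 0xFF) - EXP_BIAS;

  // |x| >= 2^23, infinity, or NaN: there are no fractional bits to clear.
  if (exponent >= FRACTION_LEN)
    return x;

  // |x| < 1 (including zeros and subnormals): result is a zero with x's sign.
  if (exponent < 0) {
    std::uint32_t sign_only = x_u & 0x80000000u;
    float zero;
    std::memcpy(&zero, &sign_only, sizeof(zero));
    return zero;
  }

  // Clear the low trim_size bits of the whole pattern. Sign and exponent
  // live in the high bits, so they pass through the shifts unchanged.
  int trim_size = FRACTION_LEN - exponent; // in [1, 23] here
  std::uint32_t trunc_u = (x_u >> trim_size) << trim_size;

  float result;
  std::memcpy(&result, &trunc_u, sizeof(result));
  return result;
}
```

With this shape, the "already an integer" check from the diff becomes the integer comparison `trunc_u == x_u`, so no floating-point value has to be rebuilt before the early return.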

@lntue lntue merged commit 621bcfc into llvm:main Jul 11, 2024
7 of 8 checks passed
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024