This repository was archived by the owner on Apr 28, 2025. It is now read-only.

Mark generic functions #[inline] #545

Merged: 1 commit merged on Apr 18, 2025

tgross35 (Contributor) commented Apr 18, 2025

Benchmarks for [1] seemed to indicate that repository organization somehow
had an effect on performance, even though the exact same rustc commands
were running (though some in a different order). After investigating
further, it appears that dependencies may affect inlining thresholds for
generic functions.

This is surprising: we more or less expect that public functions will be
standalone, but that everything they call will be inlined. To help ensure
this, mark all generic functions `#[inline]` if they should be merged into
the public function.

Zulip discussion at [2].
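
A minimal sketch of the pattern described above, assuming a hypothetical `Float` trait and a hypothetical `fdim_generic` helper (neither is taken from this PR's diff): one generic body marked `#[inline]`, wrapped by thin non-generic public functions that are meant to be the only standalone symbols. Note that `#[inline]` is a hint, not a guarantee; the compiler may still decline to inline.

```rust
use core::ops::Sub;

// Hypothetical stand-in for an internal float abstraction; not libm's real trait.
trait Float: Copy + PartialOrd + Sub<Output = Self> {
    const ZERO: Self;

    fn is_nan(self) -> bool {
        // NaN is the only value that compares unequal to itself.
        self != self
    }
}

impl Float for f32 {
    const ZERO: Self = 0.0;
}

impl Float for f64 {
    const ZERO: Self = 0.0;
}

/// Shared generic body. `#[inline]` hints that it should be merged into each
/// public caller rather than remaining a separate function.
#[inline]
fn fdim_generic<F: Float>(x: F, y: F) -> F {
    if x.is_nan() {
        x
    } else if y.is_nan() {
        y
    } else if x > y {
        x - y
    } else {
        F::ZERO
    }
}

/// Non-generic public entry points: the intended standalone symbols.
pub fn fdimf(x: f32, y: f32) -> f32 {
    fdim_generic(x, y)
}

pub fn fdim(x: f64, y: f64) -> f64 {
    fdim_generic(x, y)
}
```

Without the attribute, whether `fdim_generic` is inlined is left entirely to the compiler's cost model, which is the decision this PR observed appearing to shift with the dependency setup.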

ci: skip-extensive

tgross35 (Contributor, Author) commented Apr 18, 2025

Most notable changes with the softfloat baseline:

icount::icount_bench_acos_group::icount_bench_acos logspace:setup_acos()
  Baselines:                      softfloat|softfloat
  Instructions:                       11425|11427                (-0.01750%) [-1.00018x]
  L1 Hits:                            13709|13710                (-0.00729%) [-1.00007x]
  L2 Hits:                                5|5                    (No change)
  RAM Hits:                              34|34                   (No change)
  Total read+write:                   13748|13749                (-0.00727%) [-1.00007x]
  Estimated Cycles:                   14924|14925                (-0.00670%) [-1.00007x]
icount::icount_bench_acosf_group::icount_bench_acosf logspace:setup_acosf()
  Baselines:                      softfloat|softfloat
  Instructions:                       11982|11988                (-0.05005%) [-1.00050x]
  L1 Hits:                            14695|14698                (-0.02041%) [-1.00020x]
  L2 Hits:                                4|3                    (+33.3333%) [+1.33333x]
  RAM Hits:                              20|21                   (-4.76190%) [-1.05000x]
  Total read+write:                   14719|14722                (-0.02038%) [-1.00020x]
  Estimated Cycles:                   15415|15448                (-0.21362%) [-1.00214x]
icount::icount_bench_acosh_group::icount_bench_acosh logspace:setup_acosh()
  Baselines:                      softfloat|softfloat
  Instructions:                       41722|41748                (-0.06228%) [-1.00062x]
  L1 Hits:                            50370|50381                (-0.02183%) [-1.00022x]
  L2 Hits:                                4|4                    (No change)
  RAM Hits:                              35|37                   (-5.40541%) [-1.05714x]
  Total read+write:                   50409|50422                (-0.02578%) [-1.00026x]
  Estimated Cycles:                   51615|51696                (-0.15669%) [-1.00157x]
icount::icount_bench_acoshf_group::icount_bench_acoshf logspace:setup_acoshf()
  Baselines:                      softfloat|softfloat
  Instructions:                       37245|37339                (-0.25175%) [-1.00252x]
  L1 Hits:                            44267|44310                (-0.09704%) [-1.00097x]
  L2 Hits:                                5|6                    (-16.6667%) [-1.20000x]
  RAM Hits:                              30|33                   (-9.09091%) [-1.10000x]
  Total read+write:                   44302|44349                (-0.10598%) [-1.00106x]
  Estimated Cycles:                   45342|45495                (-0.33630%) [-1.00337x]
icount::icount_bench_asin_group::icount_bench_asin logspace:setup_asin()
  Baselines:                      softfloat|softfloat
  Instructions:                       12567|12569                (-0.01591%) [-1.00016x]
  L1 Hits:                            16236|16237                (-0.00616%) [-1.00006x]
  L2 Hits:                                4|5                    (-20.0000%) [-1.25000x]
  RAM Hits:                              32|31                   (+3.22581%) [+1.03226x]
  Total read+write:                   16272|16273                (-0.00615%) [-1.00006x]
  Estimated Cycles:                   17376|17347                (+0.16718%) [+1.00167x]
icount::icount_bench_asinf_group::icount_bench_asinf logspace:setup_asinf()
  Baselines:                      softfloat|softfloat
  Instructions:                       11627|11633                (-0.05158%) [-1.00052x]
  L1 Hits:                            14984|14990                (-0.04003%) [-1.00040x]
  L2 Hits:                                4|3                    (+33.3333%) [+1.33333x]
  RAM Hits:                              23|21                   (+9.52381%) [+1.09524x]
  Total read+write:                   15011|15014                (-0.01998%) [-1.00020x]
  Estimated Cycles:                   15809|15740                (+0.43837%) [+1.00438x]
icount::icount_bench_asinh_group::icount_bench_asinh logspace:setup_asinh()
  Baselines:                      softfloat|softfloat
  Instructions:                       31701|31725                (-0.07565%) [-1.00076x]
  L1 Hits:                            39427|39439                (-0.03043%) [-1.00030x]
  L2 Hits:                                4|4                    (No change)
  RAM Hits:                              42|42                   (No change)
  Total read+write:                   39473|39485                (-0.03039%) [-1.00030x]
  Estimated Cycles:                   40917|40929                (-0.02932%) [-1.00029x]
icount::icount_bench_asinhf_group::icount_bench_asinhf logspace:setup_asinhf()
  Baselines:                      softfloat|softfloat
  Instructions:                       27780|27872                (-0.33008%) [-1.00331x]
  L1 Hits:                            33563|33603                (-0.11904%) [-1.00119x]
  L2 Hits:                                4|8                    (-50.0000%) [-2.00000x]
  RAM Hits:                              34|36                   (-5.55556%) [-1.05882x]
  Total read+write:                   33601|33647                (-0.13671%) [-1.00137x]
  Estimated Cycles:                   34773|34903                (-0.37246%) [-1.00374x]
icount::icount_bench_fdimf128_group::icount_bench_fdimf128 logspace:setup_fdimf128()
  Baselines:                      softfloat|softfloat
  Instructions:                       93613|94648                (-1.09353%) [-1.01106x]
  L1 Hits:                           122570|123608               (-0.83975%) [-1.00847x]
  L2 Hits:                                6|3                    (+100.000%) [+2.00000x]
  RAM Hits:                              27|27                   (No change)
  Total read+write:                  122603|123638               (-0.83712%) [-1.00844x]
  Estimated Cycles:                  123545|124568               (-0.82124%) [-1.00828x]
icount::icount_bench_fmaf128_group::icount_bench_fmaf128 logspace:setup_fmaf128()
  Baselines:                      softfloat|softfloat
  Instructions:                      175119|169007               (+3.61642%) [+1.03616x]
  L1 Hits:                           229285|223656               (+2.51681%) [+1.02517x]
  L2 Hits:                               42|58                   (-27.5862%) [-1.38095x]
  RAM Hits:                              67|64                   (+4.68750%) [+1.04688x]
  Total read+write:                  229394|223778               (+2.50963%) [+1.02510x]
  Estimated Cycles:                  231840|226186               (+2.49971%) [+1.02500x]
icount::icount_bench_fmod_group::icount_bench_fmod logspace:setup_fmod()
  Baselines:                      softfloat|softfloat
  Instructions:                     1102967|1103496              (-0.04794%) [-1.00048x]
  L1 Hits:                          1105167|1105693              (-0.04757%) [-1.00048x]
  L2 Hits:                                2|4                    (-50.0000%) [-2.00000x]
  RAM Hits:                              10|11                   (-9.09091%) [-1.10000x]
  Total read+write:                 1105179|1105708              (-0.04784%) [-1.00048x]
  Estimated Cycles:                 1105527|1106098              (-0.05162%) [-1.00052x]
icount::icount_bench_fmodf128_group::icount_bench_fmodf128 logspace:setup_fmodf128()
  Baselines:                      softfloat|softfloat
  Instructions:                    31329198|31329727             (-0.00169%) [-1.00002x]
  L1 Hits:                         31363317|31363848             (-0.00169%) [-1.00002x]
  L2 Hits:                                8|5                    (+60.0000%) [+1.60000x]
  RAM Hits:                              38|39                   (-2.56410%) [-1.02632x]
  Total read+write:                31363363|31363892             (-0.00169%) [-1.00002x]
  Estimated Cycles:                31364687|31365238             (-0.00176%) [-1.00002x]
icount::icount_bench_fmodf16_group::icount_bench_fmodf16 logspace:setup_fmodf16()
  Baselines:                      softfloat|softfloat
  Instructions:                       84006|84535                (-0.62578%) [-1.00630x]
  L1 Hits:                           100647|101171               (-0.51793%) [-1.00521x]
  L2 Hits:                                1|6                    (-83.3333%) [-6.00000x]
  RAM Hits:                              16|16                   (No change)
  Total read+write:                  100664|101193               (-0.52276%) [-1.00526x]
  Estimated Cycles:                  101212|101761               (-0.53950%) [-1.00542x]
icount::icount_bench_fmodf_group::icount_bench_fmodf logspace:setup_fmodf()
  Baselines:                      softfloat|softfloat
  Instructions:                      186691|187220               (-0.28256%) [-1.00283x]
  L1 Hits:                           188893|189417               (-0.27664%) [-1.00277x]
  L2 Hits:                                1|5                    (-80.0000%) [-5.00000x]
  RAM Hits:                               9|10                   (-10.0000%) [-1.11111x]
  Total read+write:                  188903|189432               (-0.27926%) [-1.00280x]
  Estimated Cycles:                  189213|189792               (-0.30507%) [-1.00306x]
icount::icount_bench_hypot_group::icount_bench_hypot logspace:setup_hypot()
  Baselines:                      softfloat|softfloat
  Instructions:                       24401|24485                (-0.34307%) [-1.00344x]
  L1 Hits:                            27026|27066                (-0.14779%) [-1.00148x]
  L2 Hits:                                5|7                    (-28.5714%) [-1.40000x]
  RAM Hits:                              24|24                   (No change)
  Total read+write:                   27055|27097                (-0.15500%) [-1.00155x]
  Estimated Cycles:                   27891|27941                (-0.17895%) [-1.00179x]
icount::icount_bench_hypotf_group::icount_bench_hypotf logspace:setup_hypotf()
  Baselines:                      softfloat|softfloat
  Instructions:                       29860|30104                (-0.81052%) [-1.00817x]
  L1 Hits:                            33206|33323                (-0.35111%) [-1.00352x]
  L2 Hits:                                4|7                    (-42.8571%) [-1.75000x]
  RAM Hits:                              18|20                   (-10.0000%) [-1.11111x]
  Total read+write:                   33228|33350                (-0.36582%) [-1.00367x]
  Estimated Cycles:                   33856|34058                (-0.59311%) [-1.00597x]
icount::icount_bench_powf_group::icount_bench_powf logspace:setup_powf()
  Baselines:                      softfloat|softfloat
  Instructions:                       57073|58550                (-2.52263%) [-1.02588x]
  L1 Hits:                            64673|66146                (-2.22689%) [-1.02278x]
  L2 Hits:                                6|8                    (-25.0000%) [-1.33333x]
  RAM Hits:                              31|33                   (-6.06061%) [-1.06452x]
  Total read+write:                   64710|66187                (-2.23156%) [-1.02282x]
  Estimated Cycles:                   65788|67341                (-2.30617%) [-1.02361x]
icount::icount_bench_sqrt_group::icount_bench_sqrt logspace:setup_sqrt()
  Baselines:                      softfloat|softfloat
  Instructions:                       41641|43141                (-3.47697%) [-1.03602x]
  L1 Hits:                            44219|45715                (-3.27245%) [-1.03383x]
  L2 Hits:                                2|4                    (-50.0000%) [-2.00000x]
  RAM Hits:                              14|16                   (-12.5000%) [-1.14286x]
  Total read+write:                   44235|45735                (-3.27976%) [-1.03391x]
  Estimated Cycles:                   44719|46295                (-3.40426%) [-1.03524x]
icount::icount_bench_sqrtf128_group::icount_bench_sqrtf128 logspace:setup_sqrtf128()
  Baselines:                      softfloat|softfloat
  Instructions:                      240864|248361               (-3.01859%) [-1.03113x]
  L1 Hits:                           307816|318807               (-3.44754%) [-1.03571x]
  L2 Hits:                                2|5                    (-60.0000%) [-2.50000x]
  RAM Hits:                              38|39                   (-2.56410%) [-1.02632x]
  Total read+write:                  307856|318851               (-3.44832%) [-1.03571x]
  Estimated Cycles:                  309156|320197               (-3.44819%) [-1.03571x]
icount::icount_bench_sqrtf_group::icount_bench_sqrtf logspace:setup_sqrtf()
  Baselines:                      softfloat|softfloat
  Instructions:                       35165|36665                (-4.09110%) [-1.04266x]
  L1 Hits:                            37746|39241                (-3.80979%) [-1.03961x]
  L2 Hits:                                2|3                    (-33.3333%) [-1.50000x]
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                   37760|39260                (-3.82068%) [-1.03972x]
  Estimated Cycles:                   38176|39816                (-4.11895%) [-1.04296x]
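
For reading these numbers, assuming the iai-callgrind convention that each `a|b` pair is `new|old` (this PR on the left, the saved baseline on the right), the reported columns are consistent with:

$$
\Delta\% = \frac{\text{new} - \text{old}}{\text{old}} \times 100,
\qquad
\text{factor} =
\begin{cases}
+\dfrac{\text{new}}{\text{old}} & \text{if new} > \text{old} \\[6pt]
-\dfrac{\text{old}}{\text{new}} & \text{if new} < \text{old}
\end{cases}
$$

For example, for `sqrtf` instructions: (35165 − 36665) / 36665 × 100 ≈ −4.0911% and −36665 / 35165 ≈ −1.04266x. A negative sign therefore means fewer instructions or cycles with this change, and a positive sign (as for softfloat `fmaf128`) means more.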

Hardfloat

icount::icount_bench_fdimf128_group::icount_bench_fdimf128 logspace:setup_fdimf128()
  Baselines:                      hardfloat|hardfloat
  Instructions:                       93613|94648                (-1.09353%) [-1.01106x]
  L1 Hits:                           122570|123607               (-0.83895%) [-1.00846x]
  L2 Hits:                                5|4                    (+25.0000%) [+1.25000x]
  RAM Hits:                              28|27                   (+3.70370%) [+1.03704x]
  Total read+write:                  122603|123638               (-0.83712%) [-1.00844x]
  Estimated Cycles:                  123575|124572               (-0.80034%) [-1.00807x]
icount::icount_bench_fmaf128_group::icount_bench_fmaf128 logspace:setup_fmaf128()
  Baselines:                      hardfloat|hardfloat
  Instructions:                      175279|182151               (-3.77269%) [-1.03921x]
  L1 Hits:                           229445|239883               (-4.35129%) [-1.04549x]
  L2 Hits:                               45|58                   (-22.4138%) [-1.28889x]
  RAM Hits:                              64|69                   (-7.24638%) [-1.07812x]
  Total read+write:                  229554|240010               (-4.35649%) [-1.04555x]
  Estimated Cycles:                  231910|242588               (-4.40170%) [-1.04604x]
icount::icount_bench_fmod_group::icount_bench_fmod logspace:setup_fmod()
  Baselines:                      hardfloat|hardfloat
  Instructions:                     1102967|1103496              (-0.04794%) [-1.00048x]
  L1 Hits:                          1105166|1105694              (-0.04775%) [-1.00048x]
  L2 Hits:                                4|4                    (No change)
  RAM Hits:                               9|10                   (-10.0000%) [-1.11111x]
  Total read+write:                 1105179|1105708              (-0.04784%) [-1.00048x]
  Estimated Cycles:                 1105501|1106064              (-0.05090%) [-1.00051x]
icount::icount_bench_fmodf128_group::icount_bench_fmodf128 logspace:setup_fmodf128()
  Baselines:                      hardfloat|hardfloat
  Instructions:                    31329198|31329727             (-0.00169%) [-1.00002x]
  L1 Hits:                         31363317|31363847             (-0.00169%) [-1.00002x]
  L2 Hits:                                8|5                    (+60.0000%) [+1.60000x]
  RAM Hits:                              38|40                   (-5.00000%) [-1.05263x]
  Total read+write:                31363363|31363892             (-0.00169%) [-1.00002x]
  Estimated Cycles:                31364687|31365272             (-0.00187%) [-1.00002x]
icount::icount_bench_fmodf_group::icount_bench_fmodf logspace:setup_fmodf()
  Baselines:                      hardfloat|hardfloat
  Instructions:                      186691|187220               (-0.28256%) [-1.00283x]
  L1 Hits:                           188891|189421               (-0.27980%) [-1.00281x]
  L2 Hits:                                3|2                    (+50.0000%) [+1.50000x]
  RAM Hits:                               9|9                    (No change)
  Total read+write:                  188903|189432               (-0.27926%) [-1.00280x]
  Estimated Cycles:                  189221|189746               (-0.27669%) [-1.00277x]
icount::icount_bench_scalbnf16_group::icount_bench_scalbnf16 logspace:setup_scalbnf16()
  Baselines:                      hardfloat|hardfloat
  Instructions:                      129895|130930               (-0.79050%) [-1.00797x]
  L1 Hits:                           160793|161830               (-0.64080%) [-1.00645x]
  L2 Hits:                                4|2                    (+100.000%) [+2.00000x]
  RAM Hits:                              13|13                   (No change)
  Total read+write:                  160810|161845               (-0.63950%) [-1.00644x]
  Estimated Cycles:                  161268|162295               (-0.63280%) [-1.00637x]
icount::icount_bench_sqrtf128_group::icount_bench_sqrtf128 logspace:setup_sqrtf128()
  Baselines:                      hardfloat|hardfloat
  Instructions:                      241865|248863               (-2.81199%) [-1.02893x]
  L1 Hits:                           308817|319310               (-3.28615%) [-1.03398x]
  L2 Hits:                                5|3                    (+66.6667%) [+1.66667x]
  RAM Hits:                              39|44                   (-11.3636%) [-1.12821x]
  Total read+write:                  308861|319357               (-3.28660%) [-1.03398x]
  Estimated Cycles:                  310207|320865               (-3.32165%) [-1.03436x]

[1]: rust-lang#533
[2]: https://rust-lang.zulipchat.com/#narrow/channel/182449-t-compiler.2Fhelp/topic/Dependencies.20affecting.20codegen/with/513079387
tgross35 merged commit 9b25961 into rust-lang:master on Apr 18, 2025
35 checks passed
tgross35 deleted the inline-generic branch on April 18, 2025 20:29
LegNeato commented:

I'm still investigating to be sure, but I believe this broke rust-gpu:

Rust-GPU/rust-gpu#242

tgross35 (Contributor, Author) commented:

Sorry there is some breakage here. I doubt this change is the cause, though, because most of the generic symbols didn’t exist in the previous release; almost all of them were added, and marked inline, between the two releases.

I don’t fully understand the problem: is rust-gpu looking for specific symbols in libm? We don’t mark any of them no_mangle.
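
For context, a minimal sketch of what `no_mangle` changes (the function names are hypothetical, not libm's): only an item marked `no_mangle` is exported under its literal source name; other functions get a compiler-generated mangled symbol derived from the crate and item path, so an external tool should not expect to find them by name.

```rust
// Exported under exactly the symbol name `example_sqrtf`.
// `#[unsafe(no_mangle)]` is the spelling accepted on all editions since
// Rust 1.82; older toolchains spell it `#[no_mangle]`.
#[unsafe(no_mangle)]
pub extern "C" fn example_sqrtf(x: f32) -> f32 {
    x.sqrt()
}

// Any exported symbol for this function gets a mangled name, so tools cannot
// reliably look it up as `example_sqrtf_mangled`.
pub fn example_sqrtf_mangled(x: f32) -> f32 {
    x.sqrt()
}
```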

Also, you should create a new issue in rust-lang/compiler-builtins; this repo will be archived in the very near future since the two were combined.
