Handle intrinsics in a more efficient manner. #687

FractalFir · 2025-05-27T08:40:13Z

The current implementation of intrinsics is very unoptimized.

In Rust, a match on a string gets compiled down to what is effectively an if ladder (maybe we should consider opening an upstream issue about this). This is crazy inefficient, both in terms of the number of basic blocks (and thus compile times) and in the number of comparisons required to match a string (example: matching the 1000th string will require up to 1000 comparisons).
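To make the cost concrete, here is a minimal sketch of the two shapes being compared. The intrinsic and builtin names are placeholders invented for illustration, not entries from the real table, and the second function is only an approximation of how the first gets lowered, not actual compiler output.

```rust
// Placeholder names; the real table has thousands of arms.
fn lookup_match(name: &str) -> Option<&'static str> {
    match name {
        "llvm.example.a" => Some("__builtin_example_a"),
        "llvm.example.b" => Some("__builtin_example_b"),
        "llvm.example.c" => Some("__builtin_example_c"),
        _ => None,
    }
}

// Roughly equivalent if ladder: one string comparison (and at least one
// basic block) per arm, tried in order until one succeeds.
fn lookup_ladder(name: &str) -> Option<&'static str> {
    if name == "llvm.example.a" {
        Some("__builtin_example_a")
    } else if name == "llvm.example.b" {
        Some("__builtin_example_b")
    } else if name == "llvm.example.c" {
        Some("__builtin_example_c")
    } else {
        None
    }
}
```

With N arms, a name matching the last arm (or no arm at all) pays for N comparisons, which is where both the runtime cost and the basic-block count come from.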

The sheer number of comparisons in src::intrinsics::llvm::intrinsics triggers a GCC bug: while trying to recurse over the basic blocks, GCC overflows its stack.

This PR splits that string matching into a couple of functions, each dedicated to a specific architecture (e.g. ARM) or extension (e.g. AVX).
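A minimal sketch of the two-level dispatch, for readers skimming the PR. The function names `map_arch_intrinsic`, `x86` and `hexagon` appear in the profile output in this description; everything else here is an assumption made for illustration: the exact splitting logic, the `Option` return type, and the placeholder intrinsic names.

```rust
// Sketch only: pick the architecture first, then match on the remaining name.
fn map_arch_intrinsic(full_name: &str) -> Option<&'static str> {
    // Per-architecture helpers; each holds a much smaller match than the
    // single combined one. The arms below are placeholders.
    fn x86(name: &str) -> Option<&'static str> {
        match name {
            "example.add" => Some("__builtin_example_x86_add"),
            "example.sub" => Some("__builtin_example_x86_sub"),
            _ => None,
        }
    }
    fn hexagon(name: &str) -> Option<&'static str> {
        match name {
            "example.add" => Some("__builtin_example_hexagon_add"),
            _ => None,
        }
    }

    // "llvm.<arch>.<rest>" -> ("<arch>", "<rest>")
    let rest = full_name.strip_prefix("llvm.")?;
    let (arch, name) = rest.split_once('.')?;
    match arch {
        "x86" => x86(name),
        "hexagon" => hexagon(name),
        _ => None,
    }
}
```

The worst case drops from "number of all intrinsics" comparisons to "number of architectures" plus "number of intrinsics for that architecture", and each generated function stays small enough to avoid the deep recursion described above.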

This brings both runtime improvements (fewer comparisons needed) and pretty significant compile-time improvements.

In the master branch, the function in question is the heaviest one in terms of generated LLVM IR, and by a wide margin.

   93555 (13.3%, 13.3%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::intrinsic
   10354 ( 1.5%, 14.8%)     62 (0.4%,  0.5%)  alloc::vec::in_place_collect::from_iter_in_place

With the patch, the problematic functions are still complex, but are a bit more manageable.

   16112 (2.4%,  2.4%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::map_arch_intrinsic::x86
   12404 (1.9%,  4.3%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::map_arch_intrinsic::hexagon

On a debug build, this PR reduced build times by ~26% (10.26s to 7.59s).

Debug without the patch:

Finished `dev` profile [optimized + debuginfo] target(s) in 10.26s

Debug with the patch:

Finished `dev` profile [optimized + debuginfo] target(s) in 7.59s

In release, build times drop by more than 3x!

Release without the patch:

Finished `release` profile [optimized] target(s) in 31.02s

Release with the patch:

Finished `release` profile [optimized] target(s) in 8.33s

We still have some tradeoffs to consider. Given the laughably poor codegen for such a match (in both LLVM and GCC), we might just consider using a hash map. There is a very high chance it would have much better runtime and compile-time performance anyway.
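For reference, the hash map alternative could look roughly like the sketch below. This is not what the PR implements; the table contents are placeholders, and `INTRINSICS`, `intrinsic_table` and `lookup` are hypothetical names. It uses only `std` (`HashMap` and `OnceLock`) so the table is built once on first use.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Placeholder (LLVM name, GCC builtin) pairs; a generator script could emit
// this as plain data instead of a giant match.
const INTRINSICS: &[(&str, &str)] = &[
    ("llvm.example.a", "__builtin_example_a"),
    ("llvm.example.b", "__builtin_example_b"),
    ("llvm.example.c", "__builtin_example_c"),
];

// Build the map lazily, exactly once, and hand out a shared reference.
fn intrinsic_table() -> &'static HashMap<&'static str, &'static str> {
    static TABLE: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    TABLE.get_or_init(|| INTRINSICS.iter().copied().collect())
}

// Expected O(1) lookup, independent of where the name sits in the table.
fn lookup(name: &str) -> Option<&'static str> {
    intrinsic_table().get(name).copied()
}
```

The tradeoff is a one-time cost to build the map at first use, in exchange for a data-only table that generates almost no code per entry.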

GuillaumeGomez · 2025-05-27T09:13:02Z

Love the idea! Funnily enough, that's what I suggested to @antoyo when the issue about the too-big match was opened.

antoyo · 2025-05-27T12:49:53Z

tools/generate_intrinsics.py

@@ -168,25 +168,39 @@ def update_intrinsics(llvm_path, llvmint, llvmint2):
        os.path.dirname(os.path.abspath(__file__)),
        "../src/intrinsic/archs.rs",
    )
+    # A hashmap of all architectures. This allows us to first match on the architecture, and then on the intrinsics.


Could you please put the intrinsic generation in another commit?

Fix to 128 bit int unaligned loads

5a58ddf

GuillaumeGomez approved these changes May 27, 2025

FractalFir force-pushed the better_intrinsics branch from 32f8d9c to 4cb188f Compare May 27, 2025 09:30

Changed the code generating platform-specific intrinsics. The intrinsics are now split into separate, architecture-specific functions.

a579de2

FractalFir force-pushed the better_intrinsics branch from 4cb188f to a579de2 Compare May 27, 2025 10:03

antoyo reviewed May 27, 2025
