Handle intrinsics in a more efficient manner. #687

Open
wants to merge 2 commits into
base: master
from

FractalFir
Contributor

The current implementation of intrinsics is very unoptimized.

In Rust, a match on a string gets compiled down to what is effectively an if ladder (maybe we should consider opening an upstream issue about this). This is crazy inefficient, both in the number of basic blocks (and thus compile times) and in the number of comparisons required to match a string (for example, matching the 1000th string requires 1000 comparisons).
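To make the cost concrete, here is a minimal sketch that models the if ladder as a linear scan and counts string comparisons. It assumes the lowering behaves like a plain chain of equality checks, as described above; `ladder_lookup`, `make_keys`, and the `llvm.example.intrinsicN` names are hypothetical and exist only for this illustration.

```rust
/// Linear scan standing in for the if ladder a string `match` lowers to.
/// Returns the index of the hit (if any) and how many comparisons were made.
fn ladder_lookup(keys: &[String], needle: &str) -> (Option<usize>, usize) {
    let mut comparisons = 0;
    for (index, key) in keys.iter().enumerate() {
        comparisons += 1;
        if key.as_str() == needle {
            return (Some(index), comparisons);
        }
    }
    (None, comparisons)
}

/// Builds `n` made-up intrinsic names: llvm.example.intrinsic1 ..= intrinsicN.
fn make_keys(n: usize) -> Vec<String> {
    (1..=n).map(|i| format!("llvm.example.intrinsic{i}")).collect()
}
```

Under this model, looking up the last of 1000 names costs 1000 comparisons, and a miss costs just as many.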

The sheer number of comparisons in src::intrinsic::llvm::intrinsic triggers a GCC bug. While trying to recurse on the basic blocks, GCC overflows its stack.

This PR splits that string matching into several functions, each dedicated to a specific architecture (e.g. ARM) or extension (e.g. AVX).
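A rough sketch of the two-level dispatch idea: match on the architecture first, then on the rest of the name inside a smaller per-architecture function. The function names (`map_x86`, `map_arm`, `map_intrinsic_by_arch`), the `llvm.<arch>.<rest>` split, and the intrinsic and builtin names are all assumptions made for illustration, not the actual code in this PR.

```rust
// Hypothetical per-architecture tables; the names are placeholders.
fn map_x86(name: &str) -> Option<&'static str> {
    match name {
        "example.add" => Some("__builtin_example_x86_add"),
        "example.sub" => Some("__builtin_example_x86_sub"),
        _ => None,
    }
}

fn map_arm(name: &str) -> Option<&'static str> {
    match name {
        "example.add" => Some("__builtin_example_arm_add"),
        _ => None,
    }
}

/// Splits "llvm.<arch>.<rest>" and dispatches on the architecture first,
/// so each inner match only sees the names of one architecture.
fn map_intrinsic_by_arch(full_name: &str) -> Option<&'static str> {
    let rest = full_name.strip_prefix("llvm.")?;
    let (arch, name) = rest.split_once('.')?;
    match arch {
        "x86" => map_x86(name),
        "arm" => map_arm(name),
        _ => None,
    }
}
```

Each inner function stays small, so both the number of comparisons per lookup and the size of any single function body shrink.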

This brings both runtime improvements (fewer comparisons needed) and pretty significant compile-time improvements.

In the master branch, the function in question is the heaviest one in terms of generated LLVM IR, and by a wide margin.

   93555 (13.3%, 13.3%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::intrinsic
   10354 ( 1.5%, 14.8%)     62 (0.4%,  0.5%)  alloc::vec::in_place_collect::from_iter_in_place

With the patch, the problematic functions are still complex, but are a bit more manageable.

   16112 (2.4%,  2.4%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::map_arch_intrinsic::x86
   12404 (1.9%,  4.3%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::map_arch_intrinsic::hexagon

On a debug build, this PR reduced build times by ~26%.

Debug without the patch:

Finished `dev` profile [optimized + debuginfo] target(s) in 10.26s

Debug with the patch:

Finished `dev` profile [optimized + debuginfo] target(s) in 7.59s

In release, build times drop by more than 3x!

Release without patch:

Finished `release` profile [optimized] target(s) in 31.02s

Release with patch:

Finished `release` profile [optimized] target(s) in 8.33s

We still have some tradeoffs to consider. Given the laughably poor codegen for such a match (in both LLVM and GCC), we might just consider using a hash map. There is a very high chance it would have much better runtime and compile-time performance anyway.
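For comparison, a hash map variant could look roughly like the sketch below. It is only an illustration of the alternative under discussion, not something this PR implements: `intrinsic_table`, `lookup_intrinsic`, and the entries are hypothetical, and the table is built lazily with `std::sync::OnceLock` from the standard library.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Lazily built lookup table; the entries are made-up placeholders.
fn intrinsic_table() -> &'static HashMap<&'static str, &'static str> {
    static TABLE: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    TABLE.get_or_init(|| {
        HashMap::from([
            ("llvm.x86.example.add", "__builtin_example_x86_add"),
            ("llvm.x86.example.sub", "__builtin_example_x86_sub"),
            ("llvm.arm.example.add", "__builtin_example_arm_add"),
        ])
    })
}

/// One hash plus (usually) one string comparison per lookup,
/// regardless of how many intrinsics the table holds.
fn lookup_intrinsic(name: &str) -> Option<&'static str> {
    intrinsic_table().get(name).copied()
}
```

The tradeoff is a one-time cost to build the table at first use, in exchange for lookups that no longer scale with the number of entries and a function body that is data rather than a huge chain of branches.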

@GuillaumeGomez
Member

Love the idea! Funnily enough, that's what I suggested to @antoyo when the issue about the too-big match was opened.

@FractalFir FractalFir force-pushed the better_intrinsics branch from 32f8d9c to 4cb188f Compare May 27, 2025 09:30
…ics are now split into separate, architecture-specific functions.
@FractalFir FractalFir force-pushed the better_intrinsics branch from 4cb188f to a579de2 Compare May 27, 2025 10:03
@@ -168,25 +168,39 @@ def update_intrinsics(llvm_path, llvmint, llvmint2):
os.path.dirname(os.path.abspath(__file__)),
"../src/intrinsic/archs.rs",
)
# A hashmap of all architectures. This allows us to first match on the architecture, and then on the intrinsics.
Contributor

Could you please put the intrinsic generation in another commit?

3 participants