Handle intrinsics in a more efficient manner. #687
The current implementation of intrinsics is very unoptimized.
In Rust, a match on a string gets compiled down to what is effectively an if ladder (maybe we should consider opening an upstream issue about this). This is crazy inefficient, both in terms of the number of basic blocks (and thus compile times) and in the number of comparisons required to match a string (example: matching the 1000th string will require 1000 comparisons).
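To make the "if ladder" point concrete, here is a rough sketch of the two forms. The function names and the strings are made up for illustration, and the second function only shows the shape of the comparison chain described above, not the exact code the compiler emits.

```rust
// Hypothetical example: a match on string literals...
fn via_match(name: &str) -> u32 {
    match name {
        "alpha" => 1,
        "beta" => 2,
        "gamma" => 3,
        _ => 0,
    }
}

// ...behaves like a chain of sequential string comparisons.
// Reaching the last arm means every earlier comparison has already failed.
fn via_if_ladder(name: &str) -> u32 {
    if name == "alpha" {
        1
    } else if name == "beta" {
        2
    } else if name == "gamma" {
        3
    } else {
        0
    }
}
```

With thousands of arms, both the number of comparisons and the number of basic blocks grow linearly with the arm count.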
The sheer number of comparisons in `src::intrinsics::llvm::intrinsics` triggers a GCC bug. While trying to recurse on the basic block, GCC overflows its stack.
This PR splits that string matching into a couple of functions, each dedicated to a specific architecture (e.g. ARM) or extension (e.g. AVX).
This brings both runtime improvements (fewer comparisons needed) and pretty significant compile-time improvements.
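A minimal sketch of the idea, not the actual code from this PR: the function names, the intrinsic names, and the builtin names below are all placeholders. The dispatcher looks at the architecture segment first, so each per-architecture function only has to match a small subset of the names.

```rust
// Hypothetical per-architecture tables; the entries are placeholders,
// not real intrinsic-to-builtin mappings.
fn arm(name: &str) -> Option<&'static str> {
    match name {
        "example.a" => Some("__builtin_example_arm_a"),
        "example.b" => Some("__builtin_example_arm_b"),
        _ => None,
    }
}

fn x86(name: &str) -> Option<&'static str> {
    match name {
        "example.a" => Some("__builtin_example_x86_a"),
        "example.b" => Some("__builtin_example_x86_b"),
        _ => None,
    }
}

// Dispatch on the architecture segment, then match only within that group.
fn lookup_intrinsic(name: &str) -> Option<&'static str> {
    let rest = name.strip_prefix("llvm.")?;
    let (arch, tail) = rest.split_once('.')?;
    match arch {
        "arm" => arm(tail),
        "x86" => x86(tail),
        _ => None,
    }
}
```

Each function stays small, so no single function ends up with thousands of basic blocks, and a lookup only walks the arms of one group.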
On the master branch, the function in question is the heaviest one in terms of generated LLVM IR, and by a wide margin.
With the patch, the problematic functions are still complex, but are a bit more manageable.
On a debug build, this PR reduced build times by ~30%.
Debug without the patch:
Debug with the patch:
In release, the difference is a more than 3x reduction in build times!
Release without patch:
Release with patch:
We still have some tradeoffs to consider. Taking into account the laughably poor codegen for such a match (in both LLVM and GCC), we might just consider using a hash map. There is a very high chance it would have much better runtime and compile-time performance anyway.
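For reference, a hash map variant could look roughly like the sketch below. This is an assumption about one possible shape, not something implemented in this PR: the function names and table entries are placeholders, and it uses a lazily initialized `std::collections::HashMap` behind a `std::sync::OnceLock`.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical table, built once on first use. The entries are placeholders,
// not real intrinsic-to-builtin mappings.
fn intrinsic_table() -> &'static HashMap<&'static str, &'static str> {
    static TABLE: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    TABLE.get_or_init(|| {
        HashMap::from([
            ("llvm.arm.example.a", "__builtin_example_arm_a"),
            ("llvm.arm.example.b", "__builtin_example_arm_b"),
            ("llvm.x86.example.a", "__builtin_example_x86_a"),
        ])
    })
}

// A lookup hashes the name once instead of walking a chain of comparisons.
fn lookup_intrinsic_hashed(name: &str) -> Option<&'static str> {
    intrinsic_table().get(name).copied()
}
```

The tradeoff is a one-time initialization cost and a table that lives in memory for the lifetime of the process, in exchange for lookups that do not depend on the position of the entry.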