sha2: performance issue on RISC-V #328
Comments
Maybe it's similar to rust-lang/rust#88930? Can you check if changing |
No, changing |
The only way I could find to force the desired behavior is by making the array index a runtime operation (i.e. changing the macro into an Also I wonder what the current software implementation is optimizing for? I'd guess it was written for auto vectorization by the backend? |
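The difference between a compile-time and a runtime array index can be sketched roughly as below. This is a toy illustration, not the crate's actual code: `toy_round`, `rounds_unrolled!`, `mix_unrolled` and `mix_looped` are hypothetical names, the table holds only the first eight SHA-256 round constants, and the round function is a stand-in rather than real SHA-256. Whether the loop form actually changes the generated code depends on the compiler version and target, since the optimizer may still unroll it.

```rust
// First eight SHA-256 round constants; the real table has 64 entries.
static K32: [u32; 8] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
    0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
];

// Toy stand-in for one round: mixes a round constant into the state.
// (Assumption: this is NOT the SHA-256 round function.)
#[inline(always)]
fn toy_round(state: u32, k: u32) -> u32 {
    state.rotate_left(5).wrapping_add(k) ^ (state >> 3)
}

// Hypothetical macro: every table index is a literal, so each access
// is a compile-time constant index.
macro_rules! rounds_unrolled {
    ($state:ident, $($i:literal),*) => {
        $( $state = toy_round($state, K32[$i]); )*
    };
}

// Macro-expanded variant with constant indices.
pub fn mix_unrolled(mut state: u32) -> u32 {
    rounds_unrolled!(state, 0, 1, 2, 3, 4, 5, 6, 7);
    state
}

// Loop variant: the index is a loop variable, i.e. a runtime value
// as far as the source code is concerned.
pub fn mix_looped(mut state: u32) -> u32 {
    for i in 0..K32.len() {
        state = toy_round(state, K32[i]);
    }
    state
}
```

Both functions compute the same value for any input, so they can be swapped freely while comparing the disassembly of each.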
I think it's worth creating a Rust issue, ideally with a godbolt link, so that it is easy to check whether the issue has been fixed in a new compiler release. Later, when LLVM finishes migrating its issue tracker to GitHub, it could be worth duplicating it there as well.
The current software backend was not optimized for anything in particular, but it was heavily influenced by the SHA-NI backend.
Okay, thank you. This is very likely not fixed in a newer compiler release, since I'm on a fairly recent nightly build. But I'll also try to reproduce and minimize it in Godbolt.
I found out that However, |
@piegamesde |
I found a somewhat hacky way to work around this by using |
I'm compiling the crate on a `riscv32im-unknown-none-elf` target. On a release build with `opt-level = 3` and `lto = fat`, the `sha2::sha256::soft::compress` function looks like this (`riscv32-none-elf-objdump -Cd`):
The notable thing about this is how the keys (from `K32`) are handled. At the moment, the loop starts by copying all of them onto the stack for some reason. This has several disadvantages:
take only two instructions. Therefore, at least two or three loads are needed to break even (instruction-wise). However, as far as I can tell, every stack offset is only read once.
whereas only once would be needed.
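The break-even argument above can be made concrete by counting instructions under an assumed cost model. The sketch below is only an illustration: it assumes that building an arbitrary 32-bit constant on RV32I costs two instructions (`lui` + `addi`), that a stack store (`sw`) and a stack load (`lw`) cost one instruction each, and it counts instructions rather than cycles. The function names are hypothetical.

```rust
// Assumed per-operation instruction counts on RV32I (not cycle counts).
const MATERIALIZE: u32 = 2; // lui + addi for an arbitrary 32-bit immediate
const STORE: u32 = 1; // sw to a stack slot
const LOAD: u32 = 1; // lw from a stack slot

/// Instructions spent when the constant is staged on the stack once
/// and then read back `reads` times.
pub fn staged_cost(reads: u32) -> u32 {
    MATERIALIZE + STORE + reads * LOAD
}

/// Instructions spent when the constant is rebuilt as an immediate
/// at each of its `reads` uses.
pub fn inline_cost(reads: u32) -> u32 {
    reads * MATERIALIZE
}

/// Smallest number of reads at which staging on the stack is no more
/// expensive than rebuilding the constant each time.
pub fn break_even() -> u32 {
    (1..).find(|&n| staged_cost(n) <= inline_cost(n)).unwrap()
}
```

Under these assumptions the break-even point is three reads; if the store is not counted, it drops to two. With a single read per stack slot, staging costs four instructions against two for the inline form.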
I'd suggest doing one of the following:
`asm` implementation targeting RISC-V
Note that these problems are rather specific to 32-bit RISC-V and SHA-256. I have not checked other configurations, but I'm rather certain that the different data sizes will lead to different tradeoffs between the options.