-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing early clobber #766
Comments
My understanding of the GCC ASM extensions is very limited but I think you're right. Would you be willing to open a PR? |
Actually, I was not using the code as is, so I don't have the setup to test the issue in the current code itself. I actually found this bug because I was re-using code from the library in a small home project, where I have field elements not using nails (packed in 32 bytes, like the scalars). I do this in my project because I mainly want to use the GPU to do some computations and it help using packed binary representation (I use secp256k1 code mainly as a 'helper', the bulk of the computation will be GPU). So, I had to modify the field code to look like the scalar code, but using modulo P instead of modulo N. Most code mainly just involved using P instead of N as a constant, but I had to re-write the asm for the 512-bit reduction (much simpler as you only have one 64-bit single-precision multiplication). However, at some point, I hit the issue with the clobber list as it was defined in scalar code (segfault). Looking at asm output, I saw that the compiler did put the 'extract to m2' to %rsi, and since %rsi was used as indexed memory source for the next instruction, this caused segfault. Just defining m0~m2 output list as early-clobber (since the %rsi input is not 'consumed' before those outputs are set) fixed the issue. |
Well, neither do I (or anyone else here). We haven't seen bugs due to this but I guess then we were just lucky that compilers got it right arbitrarily.
Can I ask what your project is about? I'm just curious, it's not related to this issue. |
In the existing code, the compiler is allowed to allocate the RSI register for outputs m0, m1, or m2, which are written to before the input in RSI is read from. Fix this by marking them as early clobber. Reported by ehoffman2 in bitcoin-core#766
In the existing code, the compiler is allowed to allocate the RSI register for outputs m0, m1, or m2, which are written to before the input in RSI is read from. Fix this by marking them as early clobber. Reported by ehoffman2 in bitcoin-core#766
In the existing code, the compiler is allowed to allocate the RSI register for outputs m0, m1, or m2, which are written to before the input in RSI is read from. Fix this by marking them as early clobber. Reported by ehoffman2 in bitcoin-core#766
In the existing code, the compiler is allowed to allocate the RSI register for outputs m0, m1, or m2, which are written to before the input in RSI is read from. Fix this by marking them as early clobber. Reported by ehoffman2 in bitcoin-core/secp256k1#766
In the existing code, the compiler is allowed to allocate the RSI register for outputs m0, m1, or m2, which are written to before the input in RSI is read from. Fix this by marking them as early clobber. Reported by ehoffman2 in bitcoin-core#766
In the existing code, the compiler is allowed to allocate the RSI register for outputs m0, m1, or m2, which are written to before the input in RSI is read from. Fix this by marking them as early clobber. Reported by ehoffman2 in bitcoin-core#766
In the existing code, the compiler is allowed to allocate the RSI register for outputs m0, m1, or m2, which are written to before the input in RSI is read from. Fix this by marking them as early clobber. Reported by ehoffman2 in bitcoin-core#766
In scalar_4x64_impl.h, function secp256k1_scalar_reduce_512. The first asm block is missing early clobber for m0~m2.
The input %rsi is used after writing to m0, m1 and m2. Any of those write can thus re-use %rsi (which will most likely result in segfault).
Also, field_5x52_asm_impl.h is missing similar early clobbers for tmp1~tmp3
The text was updated successfully, but these errors were encountered: