New code-gen options for retpolines and straight line speculation #51665
Comments
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 for the GCC cross-request.
It looks like GCC has added support for -mindirect-branch-cs-prefix: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=2196a681d7810ad8b227bf983f38ba716620545e
This is being used when available in the Linux kernel: https://lore.kernel.org/lkml/20211118185421.GK174703@worktop.programming.kicks-ass.net/
Is there documentation somewhere describing this mitigation? In particular:
For AMD, it is discussed here https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf, mitigation G-5 on the final page: Place an LFENCE after an indirect branch instruction (RET, JMP reg or mem,
For Intel, notes are included in SDM Vol2 for the CALL and JMP instructions: Certain situations may lead to the next sequential instruction after a
As you can see, LFENCE is the official recommendation. It is about the only option for halting speculation that is safe to actually execute and doesn't otherwise impact program state.
CALL has architectural execution following it. However, the code following a CALL instruction is typically preservation of the return value and a pile of dead registers wanting reloading, and is typically not a pointer dereference involving a callee-clobbered register. Therefore, CALLs are unlikely to have subsequent instructions which are vulnerable to speculative type confusion, and are therefore uninteresting to protect.
JMP and RET are different. They are followed by arbitrary unrelated basic blocks, which could contain anything.
We could use LFENCE everywhere. However, as we don't architecturally execute the instruction, we don't care about architectural side effects. Basically any instruction which causes a decode exception, or is microcoded, halts speculation. INT3 is safe to use, and is 1/3 of the length of LFENCE, so has less of an impact on code size.
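To make the code-size point concrete, here is a minimal Python sketch that compares the two barrier choices after a RET. The byte values are the standard x86-64 encodings (lfence = 0f ae e8, int3 = cc, near ret = c3); the names ENCODINGS and padded_ret are made up for this illustration and do not come from any compiler or kernel source.

```python
# Standard x86-64 encodings; the table and helper names are illustrative only.
ENCODINGS = {
    "ret": bytes([0xC3]),                # near return
    "lfence": bytes([0x0F, 0xAE, 0xE8]), # load fence, 3 bytes
    "int3": bytes([0xCC]),               # breakpoint, 1 byte
}


def padded_ret(barrier: str) -> bytes:
    """Return the bytes of `ret` followed by the chosen speculation barrier."""
    return ENCODINGS["ret"] + ENCODINGS[barrier]


if __name__ == "__main__":
    for barrier in ("lfence", "int3"):
        seq = padded_ret(barrier)
        print(f"ret; {barrier}: {len(seq)} bytes ({seq.hex(' ')})")
```

Under these encodings, every protected RET costs one extra byte with INT3 and three extra bytes with LFENCE, which is the size difference referred to above.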
I'm a bit skeptical of heuristics like this; it's making very specific assumptions about how the compiler generates code, which might not hold for different codebases and/or optimizations.
It looks like the current version of the Intel manual actually explicitly mentions INT3, so I guess that's fine.
My gut feeling is that anyone wanting protection in the CALL case would probably be using Speculative Load Hardening instead.
Extended Description
Hello
[FYI, this is being cross-requested of GCC too]
Linux and other kernel level software makes use of -mindirect-branch=thunk-extern to be able to alter the handling of indirect branches at boot. It turns out to be advantageous to inline the thunks when retpoline is not in use. https://lore.kernel.org/lkml/20211026120132.613201817@infradead.org/ is some infrastructure to make this work.
In some cases, we want to be able to inline an
lfence; jmp *%reg
thunk. This is fine for the low 8 registers, but not fine for %r{8..15}, where the REX prefix pushes the replacement size to 6 bytes.
It would be very useful to have a code-gen option to write out
call %cs:__x86_indirect_thunk_r{8..15}
where the redundant %cs prefix will increase the instruction length to 6, allowing the non-retpoline form to be inlined.
Relatedly, x86 straight line speculation has been discussed before, but without any action taken. It would be helpful to have a code-gen option which would emit int3 following any ret instruction, and any indirect jump, as neither of these two cases has following architectural execution.
The reason these two are related is that if both options are in use, we want an extra byte of replacement space to be able to inline
lfence; jmp *%reg; int3
Third, Clang has been observed to emit conditional tail calls as
Jcc __x86_indirect_thunk_*
This is a 6 byte source size, but needs up to 9 bytes of space for inlining, including an int3 for straight line speculation reasons (see https://lore.kernel.org/lkml/20211026120310.359986601@infradead.org/ for full details). It might be enough to simply prohibit an optimisation like this when trying to pad retpolines for inlineability.
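The byte counts in this request can be checked with a small Python sketch. It uses the standard x86-64 instruction lengths (call rel32 = 5, CS prefix = 1, lfence = 3, int3 = 1, jmp *%reg = 2 plus a REX byte for r8..r15, Jcc rel32 = 6, Jcc rel8 = 2). The function names are hypothetical, and the shape assumed for the conditional tail call replacement (an inverted short Jcc that skips over lfence; jmp *%reg; int3) is an assumption made for illustration, not something stated in this issue.

```python
# Instruction lengths in bytes for standard x86-64 encodings.
LFENCE = 3      # 0f ae e8
INT3 = 1        # cc
CS_PREFIX = 1   # 2e
CALL_REL32 = 5  # e8 + rel32
JCC_REL32 = 6   # 0f 8x + rel32
JCC_REL8 = 2    # 7x + rel8


def jmp_reg_len(reg: int) -> int:
    """Length of `jmp *%reg` (ff /4); r8..r15 need an extra REX prefix byte."""
    return 2 + (1 if reg >= 8 else 0)


def inline_len(reg: int, sls: bool = False) -> int:
    """Length of the inlined `lfence; jmp *%reg`, optionally followed by int3."""
    return LFENCE + jmp_reg_len(reg) + (INT3 if sls else 0)


def call_site_len(reg: int, cs_prefix: bool = False) -> int:
    """Length of the thunk call; the %cs prefix is only requested for r8..r15."""
    return CALL_REL32 + (CS_PREFIX if cs_prefix and reg >= 8 else 0)


def cond_tail_call_inline_len(reg: int) -> int:
    """Assumed replacement: inverted short Jcc over `lfence; jmp *%reg; int3`."""
    return JCC_REL8 + inline_len(reg, sls=True)


if __name__ == "__main__":
    for reg in (0, 8):
        space = call_site_len(reg, cs_prefix=True)
        print(f"reg {reg}: {space} bytes available, "
              f"{inline_len(reg)} needed, {inline_len(reg, sls=True)} with int3")
    print(f"Jcc thunk: {JCC_REL32} bytes available, "
          f"up to {cond_tail_call_inline_len(15)} needed")
```

With the %cs prefix the r8..r15 call sites grow from 5 to 6 bytes, which matches the 6 byte inlined thunk; adding int3 pushes each inlined form one byte past its call site, and the conditional tail call case comes out at 8 or 9 bytes against a 6 byte source.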