New code-gen options for retpolines and straight line speculation #51665

Open
andyhhp mannequin opened this issue Oct 26, 2021 · 6 comments
Labels
bugzilla (Issues migrated from bugzilla), c, clang:to-be-triaged (Should not be used for new issues)

Comments

@andyhhp
Mannequin

andyhhp mannequin commented Oct 26, 2021

Bugzilla Link 52323
Version unspecified
OS Linux
Blocks #4440
CC @andyhhp,@chandlerc,@DougGregor,@efriedma-quic,@jyknight,@m-gupta,@nickdesaulniers,@pageexec,@phoebewang,@zygoloid,@rnk

Extended Description

Hello

[FYI, this is being cross-requested of GCC too]

Linux and other kernel-level software make use of -mindirect-branch=thunk-extern to be able to alter the handling of indirect branches at boot. It turns out to be advantageous to inline the thunks when retpoline is not in use. https://lore.kernel.org/lkml/20211026120132.613201817@infradead.org/ is some infrastructure to make this work.

In some cases, we want to be able to inline an lfence; jmp *%reg thunk. This is fine for the low 8 registers, but not for %r{8..15}, where the REX prefix pushes the replacement size to 6 bytes, one more than the 5-byte call it has to replace.

It would be very useful to have a code-gen option to write out call %cs:__x86_indirect_thunk_r{8..15} where the redundant %cs prefix will increase the instruction length to 6, allowing the non-retpoline form to be inlined.
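The size argument above can be checked with a small sketch. The helper names are hypothetical and not part of any compiler or kernel interface. Only the byte encodings are standard x86-64, and the rel32 displacement is a zero placeholder.

```python
# Hypothetical helpers; only the byte encodings are standard x86-64.
LFENCE = bytes([0x0F, 0xAE, 0xE8])      # lfence
CS_PREFIX = bytes([0x2E])               # redundant %cs segment override


def indirect_branch(reg: int, is_call: bool) -> bytes:
    """Encode `call *%reg` or `jmp *%reg` for register number 0..15."""
    rex = bytes([0x41]) if reg >= 8 else b""      # REX.B selects %r8..%r15
    modrm = (0xD0 if is_call else 0xE0) | (reg & 7)
    return rex + bytes([0xFF, modrm])


def thunk_call(cs_prefix: bool) -> bytes:
    """Encode `call rel32` to a thunk; the displacement is a placeholder."""
    prefix = CS_PREFIX if cs_prefix else b""
    return prefix + bytes([0xE8, 0, 0, 0, 0])


def fits_inline(reg: int, cs_prefix: bool) -> bool:
    """True if `lfence; call *%reg` fits in the space of the thunk call."""
    replacement = LFENCE + indirect_branch(reg, is_call=True)
    return len(replacement) <= len(thunk_call(cs_prefix))


for reg in (0, 8):
    for cs in (False, True):
        print(reg, cs, fits_inline(reg, cs))
```

Register 0 (%rax) fits in the plain 5-byte call, while register 8 (%r8) only fits once the %cs prefix has grown the call site to 6 bytes.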

Relatedly, x86 straight line speculation has been discussed before, but without any action taken. It would be helpful to have a code gen option which would emit int3 following any ret instruction, and any indirect jump, as neither of these two cases have following architectural execution.

The reason these two are related is that if both options are in use, we want an extra byte of replacement space to be able to inline lfence; jmp *%reg; int3.

Third, Clang has been observed to emit conditional tail calls as Jcc __x86_indirect_thunk_*. This is a 6 byte source size, but needs up to 9 bytes of space for inlining, including an int3 for straight line speculation reasons (see https://lore.kernel.org/lkml/20211026120310.359986601@infradead.org/ for full details). It might be enough to simply prohibit an optimisation like this when trying to pad retpolines for inlineability.
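A rough sketch of the byte counts behind the last two points. It assumes the inlined form of a conditional tail call is an inverted short Jcc that skips over lfence; jmp *%reg; int3. That layout is an assumption made for illustration, not something stated in this thread, and the function names are hypothetical.

```python
# Instruction lengths in bytes (standard x86-64 encodings).
LFENCE_LEN = 3       # 0f ae e8
INT3_LEN = 1         # cc
JCC_REL8_LEN = 2     # 7x rel8
JCC_REL32_LEN = 6    # 0f 8x rel32, the form Clang emits to the thunk


def jmp_reg_len(reg: int) -> int:
    """Length of `jmp *%reg`; %r8..%r15 need a REX prefix."""
    return 3 if reg >= 8 else 2


def inline_jmp_len(reg: int, sls: bool) -> int:
    """Length of `lfence; jmp *%reg`, plus a trailing int3 when sls is set."""
    return LFENCE_LEN + jmp_reg_len(reg) + (INT3_LEN if sls else 0)


def inline_jcc_len(reg: int, sls: bool) -> int:
    """Assumed layout: inverted short Jcc skipping the inlined sequence."""
    return JCC_REL8_LEN + inline_jmp_len(reg, sls)


print(inline_jmp_len(0, sls=True))   # low register, with int3
print(inline_jmp_len(8, sls=True))   # %r8..%r15, with int3
print(inline_jcc_len(8, sls=True))   # conditional tail call, worst case
```

Under these assumptions the worst case is 9 bytes against the 6-byte Jcc rel32, and even the smallest case (low register, no int3) needs 7.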

@andyhhp
Mannequin Author

andyhhp mannequin commented Oct 26, 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 for GCC cross-request.

@nickdesaulniers
Member

It looks like GCC has added support for -mindirect-branch-cs-prefix:

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=2196a681d7810ad8b227bf983f38ba716620545e

This is being used when available in the Linux kernel:

https://lore.kernel.org/lkml/20211118185421.GK174703@worktop.programming.kicks-ass.net/

@efriedma-quic
Collaborator

Relatedly, x86 straight line speculation has been discussed before, but
without any action taken. It would be helpful to have a code gen option
which would emit int3 following any ret instruction, and any indirect
jump, as neither of these two cases have following architectural execution.

Is there documentation somewhere describing this mitigation? In particular:

  1. What unconditional branches can lead to straight-line speculation?
  2. What instructions can be used to stop speculation? (Is int3 actually effective? Are there other instructions that would also work?)

@andyhhp
Mannequin Author

andyhhp mannequin commented Nov 20, 2021

Relatedly, x86 straight line speculation has been discussed before, but
without any action taken. It would be helpful to have a code gen option
which would emit int3 following any ret instruction, and any indirect
jump, as neither of these two cases have following architectural execution.

Is there documentation somewhere describing this mitigation? In particular:

  1. What unconditional branches can lead to straight-line speculation?

For AMD, it is discussed here https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf, mitigation G-5 on the final page:

Place an LFENCE after an indirect branch instruction (RET, JMP reg or mem,
CALL reg or mem) to help prevent possible sequential speculation.

For Intel, notes are included in SDM Vol2 for the CALL and JMP instructions:

Certain situations may lead to the next sequential instruction after a
near indirect CALL being speculatively executed. If software needs to
prevent this (e.g., in order to prevent a speculative execution side
channel), then an LFENCE instruction opcode can be placed after the near
indirect CALL in order to block speculative execution.

  2. What instructions can be used to stop speculation? (Is int3 actually
    effective? Are there other instructions that would also work?)

As you can see, LFENCE is the official recommendation. It is about the only option for halting speculation which is safe to actually execute, and doesn't otherwise impact program state.

CALL has architectural execution following it. However, the code following a CALL instruction is typically preservation of the return value and a pile of dead registers wanting reloading, and is typically not a pointer dereference involving a callee-clobbered register. Therefore, CALLs are unlikely to have subsequent instructions which are vulnerable to speculative type confusion, and are therefore uninteresting to protect.

JMP and RET are different. They are followed by arbitrary unrelated basic blocks, which could contain anything.

We could use LFENCE everywhere. However, as we don't architecturally execute the instruction, we don't care about architectural side effects. Basically any instruction which causes a decode exception, or is microcoded, halts speculation. INT3 is safe to use, and is 1/3 of the length of LFENCE, so has less of an impact on code size.
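To make the code-size comparison concrete, here is a minimal sketch that appends a barrier after each ret and each indirect jmp in a pre-decoded instruction list. The list-of-tuples representation and the function names are hypothetical, chosen only for illustration. The encodings are standard x86-64.

```python
# Each instruction is a (mnemonic, encoding) pair; this format is hypothetical.
INT3 = ("int3", bytes([0xCC]))
LFENCE = ("lfence", bytes([0x0F, 0xAE, 0xE8]))


def needs_barrier(mnemonic: str) -> bool:
    """ret and indirect jmp (AT&T syntax `jmp *...`) have no fall-through."""
    return mnemonic == "ret" or mnemonic.startswith("jmp *")


def harden(insns, barrier):
    """Return a copy of insns with the barrier after each ret/indirect jmp."""
    out = []
    for insn in insns:
        out.append(insn)
        if needs_barrier(insn[0]):
            out.append(barrier)
    return out


def code_size(insns) -> int:
    """Total encoded size in bytes."""
    return sum(len(encoding) for _, encoding in insns)


example = [
    ("mov %rdi,%rax", bytes([0x48, 0x89, 0xF8])),
    ("ret", bytes([0xC3])),
    ("jmp *%rax", bytes([0xFF, 0xE0])),
]

print(code_size(example))
print(code_size(harden(example, INT3)))
print(code_size(harden(example, LFENCE)))
```

For this toy listing, each protected site costs one extra byte with int3 and three with lfence, which is the 1/3 ratio mentioned above.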

@efriedma-quic
Collaborator

CALL has architectural execution following it. However, the code following
a CALL instruction is typically preservation of the return value and a pile
of dead registers wanting reloading, and is typically not a pointer
dereference involving a callee-clobbered register.

I'm a bit skeptical of heuristics like this; it's making very specific assumptions about how the compiler generates code, which might not hold for different codebases and/or optimizations.

We could use LFENCE everywhere. However, as we don't architecturally
execute the instruction, we don't care about architectural side effects.
Basically any instruction which causes a decode exception, or is microcoded,
halts speculation. INT3 is safe to use, and is 1/3 of the length of LFENCE,
so has less of an impact on code size.

It looks like the current version of the Intel manual actually explicitly mentions INT3, so I guess that's fine.

@andyhhp
Mannequin Author

andyhhp mannequin commented Nov 22, 2021

It looks like the current version of the Intel manual actually explicitly
mentions INT3, so I guess that's fine.

Ah great - I'd missed that update coming through. I'll pester the other guys to document too.

CALL has architectural execution following it. However, the code following
a CALL instruction is typically preservation of the return value and a pile
of dead registers wanting reloading, and is typically not a pointer
dereference involving a callee-clobbered register.

I'm a bit skeptical of heuristics like this; it's making very specific
assumptions about how the compiler generates code, which might not hold for
different codebases and/or optimizations.

Nevertheless, protecting JMP/RET with an INT3 is easy and cheap, while protecting CALL with LFENCE is very much not, and the risk profiles of the code are very different.

My gut feeling is that anyone wanting protection in the CALL case would probably be using Speculative Load Hardening instead.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@Endilll Endilll added the clang:to-be-triaged Should not be used for new issues label Jul 18, 2024