-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x86 LVI hardening #58
x86 LVI hardening #58
Conversation
This is fixing a known, public, security vulnerability so I'd appreciate a quick review. |
@cuviper Does RedHat backport this? I think this needs to bake upstream some more, this has landed upstream only a couple of days ago (after sitting in review for a few months). I'm also still waiting for some compile-time regression mitigations for this change (llvm@de92dc2 is a partial mitigation). |
Can you elaborate on the timeline you have in mind? The changes just landed upstream but the review process was very thorough. The patches were posted publicly in March but they have been developed and tested privately months prior. We've been using some version of these patches on a fork of Rust for a couple of months without issues. Since these passes are opt-in, most users will not be affected by landing this. On the other hand, users who want to enable these passes need these patches yesterday to patch known, public, security vulnerabilities. The patch you linked affects a file that doesn't exist in LLVM 9. |
No. Our official statement on CVE-2020-0551 is that "no additional changes are deemed necessary or practical to address this flaw at the software layer." But that's specific to our products, and I do see how a project like Fortanix would need extra care. |
Please note that we've moved to LLVM 10 since rust-lang/rust#67759, so we'll need this PR rebased onto |
Nightly has recently updated to LLVM 10, so this would need to go into the https://github.com/rust-lang/llvm-project/tree/rustc/10.0-2020-05-05 branch now.
That's about what I expected. Can you check with Tom Stellard whether there's any plans to backport this upstream for the LLVM 10.0.1 release? (Independently of RHs own policy on this issue.) Apart from that, I guess it would be fine to merge this. This has been in LLVM master for a while now without any issues I'm aware of, and most people will never run this code. The only impact for most people would be a slight compile-time regression for debug builds. |
RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86. Based on a previous patch by Krzysztof Parzyszek Differential Revision: https://reviews.llvm.org/D75932
…de to "Indirect Thunks" There are applications for indirect call/branch thunks other than retpoline for Spectre v2, e.g., https://software.intel.com/security-software-guidance/software-guidance/load-value-injection Therefore it makes sense to refactor X86RetpolineThunks as a more general capability. Differential Revision: https://reviews.llvm.org/D76810
… than Retpoline Introduce a ThunkInserter CRTP base class from which new thunk types can inherit, e.g., thunks to mitigate https://software.intel.com/security-software-guidance/software-guidance/load-value-injection. Differential Revision: https://reviews.llvm.org/D76811
…ion (LVI) This pass replaces each indirect call/jump with a direct call to a thunk that looks like: lfence jmpq *%r11 This ensures that if the value in register %r11 was loaded from memory, then the value in %r11 is (architecturally) correct prior to the jump. Also adds a new target feature to X86: +lvi-cfi ("cfi" meaning control-flow integrity) The feature can be added via clang CLI using -mlvi-cfi. This is an alternate implementation to https://reviews.llvm.org/D75934 That merges the thunk insertion functionality with the existing X86 retpoline code. Differential Revision: https://reviews.llvm.org/D76812
Adding a pass that replaces every ret instruction with the sequence: pop <scratch-reg> lfence jmp *<scratch-reg> where <scratch-reg> is some available scratch register, according to the calling convention of the function being mitigated. Differential Revision: https://reviews.llvm.org/D75935
… (LVI) Gadgets Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph. More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK). Also adds a new target feature to X86: +lvi-load-hardening The feature can be added via the clang CLI using -mlvi-hardening. Differential Revision: https://reviews.llvm.org/D75936
… (LVI) After finding all such gadgets in a given function, the pass minimally inserts LFENCE instructions in such a manner that the following property is satisfied: for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at least one LFENCE instruction. The algorithm that implements this minimal insertion is influenced by an academic paper that minimally inserts memory fences for high-performance concurrent programs: http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf The algorithm implemented in this pass is as follows: 1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components: -SOURCE instructions (also includes function arguments) -SINK instructions -Basic block entry points -Basic block terminators -LFENCE instructions 2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6. 3. Use a heuristic or plugin to approximate minimal LFENCE insertion. 4. Insert one LFENCE along each CFG edge that was cut in step 3. 5. Go to step 2. 6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified. By default, the heuristic used in Step 3 is a greedy heuristic that avoids inserting LFENCEs into loops unless absolutely necessary. There is also a CLI option to load a plugin that can provide even better optimization, inserting fewer fences, while still mitigating all of the LVI gadgets. The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin, and a description of the pass's behavior with the plugin can be found here: https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection. Differential Revision: https://reviews.llvm.org/D75937
…jection (LVI) Added code to X86AsmParser::emitInstruction() to add an LFENCE after each instruction that may load, and emit a warning if it encounters an instruction that may be vulnerable, but cannot be automatically mitigated. Differential Revision: https://reviews.llvm.org/D76158
Updated. Nice bonus is that LLVM 10 is on C++14 so the backport was easier. |
@@ -138,9 +138,11 @@ | |||
; CHECK-NEXT: Machine Loop Invariant Code Motion | |||
; CHECK-NEXT: Bundle Machine CFG Edges | |||
; CHECK-NEXT: X86 FP Stackifier | |||
; CHECK-NEXT: MachineDominator Tree Construction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved from a couple of lines down, so I don't think something like de92dc2 is needed.
So can I get a review then? |
@nikic If you file a bug requesting this backport at bugs.llvm.org and mark it as a blocker of release-10.0.1, I will add it to my review queue. |
@nikic I should add this doesn't guarantee I'll merge it. The patch needs to be reviewed to make sure it meets the release criteria and get the necessary approvals. But step 1 for this process is filing the bug. |
…callback The `TypeSystemMap::m_mutex` guards against concurrent modifications of members of `TypeSystemMap`. In particular, `m_map`. `TypeSystemMap::ForEach` iterates through the entire `m_map` calling a user-specified callback for each entry. This is all done while `m_mutex` is locked. However, there's nothing that guarantees that the callback itself won't call back into `TypeSystemMap` APIs on the same thread. This lead to double-locking `m_mutex`, which is undefined behaviour. We've seen this cause a deadlock in the swift plugin with following backtrace: ``` int main() { std::unique_ptr<int> up = std::make_unique<int>(5); volatile int val = *up; return val; } clang++ -std=c++2a -g -O1 main.cpp ./bin/lldb -o “br se -p return” -o run -o “v *up” -o “expr *up” -b ``` ``` frame rust-lang#4: std::lock_guard<std::mutex>::lock_guard frame rust-lang#5: lldb_private::TypeSystemMap::GetTypeSystemForLanguage <<<< Lock rust-lang#2 frame rust-lang#6: lldb_private::TypeSystemMap::GetTypeSystemForLanguage frame rust-lang#7: lldb_private::Target::GetScratchTypeSystemForLanguage ... frame rust-lang#26: lldb_private::SwiftASTContext::LoadLibraryUsingPaths frame rust-lang#27: lldb_private::SwiftASTContext::LoadModule frame rust-lang#30: swift::ModuleDecl::collectLinkLibraries frame rust-lang#31: lldb_private::SwiftASTContext::LoadModule frame rust-lang#34: lldb_private::SwiftASTContext::GetCompileUnitImportsImpl frame rust-lang#35: lldb_private::SwiftASTContext::PerformCompileUnitImports frame rust-lang#36: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetSwiftASTContext frame rust-lang#37: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetPersistentExpressionState frame rust-lang#38: lldb_private::Target::GetPersistentSymbol frame rust-lang#41: lldb_private::TypeSystemMap::ForEach <<<< Lock #1 frame rust-lang#42: lldb_private::Target::GetPersistentSymbol frame rust-lang#43: lldb_private::IRExecutionUnit::FindInUserDefinedSymbols frame rust-lang#44: lldb_private::IRExecutionUnit::FindSymbol frame rust-lang#45: lldb_private::IRExecutionUnit::MemoryManager::GetSymbolAddressAndPresence frame rust-lang#46: lldb_private::IRExecutionUnit::MemoryManager::findSymbol frame rust-lang#47: non-virtual thunk to lldb_private::IRExecutionUnit::MemoryManager::findSymbol frame rust-lang#48: llvm::LinkingSymbolResolver::findSymbol frame rust-lang#49: llvm::LegacyJITSymbolResolver::lookup frame rust-lang#50: llvm::RuntimeDyldImpl::resolveExternalSymbols frame rust-lang#51: llvm::RuntimeDyldImpl::resolveRelocations frame rust-lang#52: llvm::MCJIT::finalizeLoadedModules frame rust-lang#53: llvm::MCJIT::finalizeObject frame rust-lang#54: lldb_private::IRExecutionUnit::ReportAllocations frame rust-lang#55: lldb_private::IRExecutionUnit::GetRunnableInfo frame rust-lang#56: lldb_private::ClangExpressionParser::PrepareForExecution frame rust-lang#57: lldb_private::ClangUserExpression::TryParse frame rust-lang#58: lldb_private::ClangUserExpression::Parse ``` Our solution is to simply iterate over a local copy of `m_map`. **Testing** * Confirmed on manual reproducer (would reproduce 100% of the time before the patch) Differential Revision: https://reviews.llvm.org/D149949
) Currently, process of replacing bitwise operations consisting of `LSR`/`LSL` with `And` is performed by `DAGCombiner`. However, in certain cases, the `AND` generated by this process can be removed. Consider following case: ``` lsr x8, x8, rust-lang#56 and x8, x8, #0xfc ldr w0, [x2, x8] ret ``` In this case, we can remove the `AND` by changing the target of `LDR` to `[X2, X8, LSL rust-lang#2]` and right-shifting amount change to 56 to 58. after changed: ``` lsr x8, x8, rust-lang#58 ldr w0, [x2, x8, lsl rust-lang#2] ret ``` This patch checks to see if the `SHIFTING` + `AND` operation on load target can be optimized and optimizes it if it can.
These patches are needed to mitigate the Load Value Injection (LVI) vulnerability in certain Intel processors.
The patches add several off-by-default passes to the x86 backend. The passes may be enabled as follows:
x86 Target features:
lvi-cfi
,lvi-load-hardening
LLVM argument:
-x86-experimental-lvi-inline-asm-hardening
This is cherry-picking commits 080dd10 71e8021 b1d5810 5b519cf f95a67d e97a3e5 8ce078c 08b8b72 from upstream.
cc @raoulstrackx