From e042c9be754af59fb5903293cb25d0b4cf154c8e Mon Sep 17 00:00:00 2001
From: Jan Kotas <jkotas@microsoft.com>
Date: Mon, 29 Mar 2021 18:17:53 -0700
Subject: [PATCH] .NET runtime security mitigations (#189)

Co-authored-by: Dan Moseley <danmose@microsoft.com>
Co-authored-by: Yaakov <yaakov-h@users.noreply.github.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>
---
 accepted/2021/runtime-security-mitigations.md | 213 ++++++++++++++++++
 1 file changed, 213 insertions(+)
 create mode 100644 accepted/2021/runtime-security-mitigations.md

diff --git a/accepted/2021/runtime-security-mitigations.md b/accepted/2021/runtime-security-mitigations.md
new file mode 100644
index 000000000..0531bcf05
--- /dev/null
+++ b/accepted/2021/runtime-security-mitigations.md
@@ -0,0 +1,213 @@
+# .NET Runtime Security Mitigations
+
+[Jan Kotas](https://github.com/jkotas)
+
+_From Wikipedia: Mitigation is the reduction of something harmful or the reduction of its harmful effects._
+
+Security is our top concern. The .NET platform combines many layers of security measures that include design and code security reviews, threat modeling, fuzzing of sensitive components, analyzers to detect potential security issues, documentation of security best practices, shipping of regular security updates throughout the support life cycle, bug bounty programs and mandatory security trainings for engineers. This redundancy of methods is known as [defense-in-depth](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)).
+
+This document is focused on mitigations for memory security vulnerabilities. These mitigations are one of the redundant layers for defense-in-depth. They do not make the system secure on their own, nor does the lack of them make the system insecure. They make systems more secure by preventing attackers from turning vulnerabilities into exploits before the vulnerabilities are patched.
+
+.NET has long been considered more secure than C++ because its type safety, bounds checking, and garbage collection prevent common programming errors that lead to memory corruption. However, since .NET was first designed, attacks, defenses, and .NET itself have all evolved significantly. This document enumerates state-of-the-art security mitigation techniques adopted by the industry, describes the level of their implementation in .NET runtime, and suggests improvements.
+
+Our goal is to make .NET recognized as one of the most secure modern language runtimes available. This document paints part of a picture on how to achieve that.
+
+**Table of Contents**
+
+- [Anatomy of an exploit](#anatomy-of-an-exploit)
+- [Executable Space Protection](#executable-space-protection)
+  - [W^X](#wx)
+  - [Protect Intermediate Language (IL) Code](#protect-intermediate-language-il-code)
+  - [Address Space Layout Randomization (ASLR)](#address-space-layout-randomization-aslr)
+  - [Disallow Runtime Code Generation](#disallow-runtime-code-generation)
+  - [Out-of-process Jit-In-Time Compiler (JIT)](#out-of-process-jit-in-time-compiler-jit)
+- [Control-flow Integrity](#control-flow-integrity)
+  - [Intel Control-flow Enforcement Technology (CET)](#intel-control-flow-enforcement-technology-cet)
+    - [Shadow Stack](#shadow-stack)
+    - [Indirect Branch Tracking (IBT)](#indirect-branch-tracking-ibt)
+  - [Windows Control Flow Guard (CFG)](#windows-control-flow-guard-cfg)
+  - [ARM Pointer Authentication (PA)](#arm-pointer-authentication-pa)
+  - [Stack Buffer Overflow Protection](#stack-buffer-overflow-protection)
+- [Language, Libraries and Runtime Features](#language-libraries-and-runtime-features)
+  - [Unsafe API and C# language constructs](#unsafe-api-and-c-language-constructs)
+  - [Object and Array Pooling](#object-and-array-pooling)
+  - [Secrets in Presence of GC Heap Compaction](#secrets-in-presence-of-gc-heap-compaction)
+  - [Dangerous Deserialization](#dangerous-deserialization)
+- [Looking forward](#looking-forward)
+  - [Plan for .NET 6](#plan-for-net-6)
+- [Appendix](#appendix)
+  - [Timing Side Channel Attacks](#timing-side-channel-attacks)
+  - [Confidential Computing](#confidential-computing)
+
+# Anatomy of an exploit
+
+A memory security vulnerability starts with a memory safety violation. That includes bugs such as writing outside of an array's bounds or using a pointer after freeing the associated object. When an attacker can influence the pointer or data involved, they have at least some control of the program's memory (this is referred to as a write-what-where primitive). Once an attacker has a write-what-where primitive, they may be able to use that to change the program's execution to do something malicious.
+
+The security mitigations prevent or reduce the probability of each step in this sequence.
+
+The mitigations for memory safety violations must typically be applied to all code running in a process to be fully effective. Exploits often combine memory safety violations and code from multiple modules in the process. For example, a memory safety violation in a C/C++ module and code generated by the .NET runtime JIT can be combined to produce an exploit. Thus, missing mitigations in .NET modules can allow exploiting vulnerabilities in C/C++ modules that would be hard or impossible to exploit in isolation, and vice versa.
+
+# Executable Space Protection
+
+The simplest path for an attack is to write new malicious code in memory and then get control flow to jump to it. Executable space protection enforces that memory has independent write and execution permissions. Attacks can no longer create new code because executable memory is not writable and writable memory is not executable.
+
+Executable space protection on Windows is called Data Execution Prevention (DEP).
+
+## W^X
+
+[W^X](https://en.wikipedia.org/wiki/W%5EX) is one of the most fundamental mitigations. It blocks the simplest attack path by disallowing memory pages to be writeable and executable at the same time. .NET runtime has been missing this mitigation so far and the lack of this mitigation has (correctly) resulted in us not considering more advanced mitigations. The large number of pages that are both writeable and executable in a typical .NET process is a ripe target for attacks that simply inject new code.
+
+Apple has made W^X mandatory for future versions of the macOS desktop operating system as part of the Apple Silicon transition. It motivated us to schedule implementation of this mitigation for .NET 6, on all supported operating systems. Our principle is to treat all supported operating systems equally with respect to security, where possible.
+
+## Protect Intermediate Language (IL) Code
+
+IL code is the equivalent of executable code. The IL code might be overwritten with malicious IL. An attacker could then wait for the IL to be compiled or [re-compiled via tiering](https://github.com/dotnet/runtime/blob/main/docs/design/features/tiered-compilation.md) and have malicious code injected that way.
+
+.NET runtime should treat the IL code the same way as directly executable code, i.e. make IL read-only or otherwise protected wherever possible.
+
+## Address Space Layout Randomization (ASLR)
+
+[Address space layout randomization](https://en.wikipedia.org/wiki/Address_space_layout_randomization) arranges all code and data structures in memory randomly, which makes it harder for the attacker to reliably transfer control to a particular exploited function or modify a particular data structure. .NET runtime has been compatible with ASLR since .NET Framework 3.5 SP1 (released in 2008).
+
+We should be mindful of this mitigation and avoid designs (performance optimizations in particular) that invalidate it.
+
+## Disallow Runtime Code Generation
+
+Disallowing runtime code generation altogether is the ultimate form of executable space protection. It is often combined with verification of signatures of all code being executed.
+
+Runtime code generation has been unconditionally disallowed on most game consoles and Apple devices, to protect business models and consumer experiences. We expect that the set of environments that unconditionally disallow runtime code generation is going to grow over time. It will include cloud infrastructure and the most critical cloud services where the restriction will be enforced across the system via Hypervisor-Protected Code Integrity (HVCI) and related technologies.
+
+Also, operating systems often offer opt-in or opt-out mechanisms to disallow runtime code generation, for example [Arbitrary Code Guard](https://docs.microsoft.com/en-us/windows/security/threat-protection/microsoft-defender-atp/customize-exploit-protection) (ACG) on Windows and the [Allow Execution of JIT-compiled Code Entitlement](https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_allow-jit) on macOS.
+
+Mono and .NET Native for UWP have been shipping .NET runtimes that abide by the runtime code generation restriction. The [Native AOT Form Factor](https://github.com/dotnet/runtimelab/tree/feature/NativeAOT) experiment explores this space further.
+
+IL code interpreters circumvent the environment restriction on runtime code generation and allow .NET libraries that depend on runtime code generation to work, with lower performance. From a security point of view, IL code interpreters are not desirable in environments with disallowed code generation since IL code is the equivalent of executable code as described above. This concern applies more generally to any higher-level general-purpose interpreters and compilers.
+
+## Out-of-process Jit-In-Time Compiler (JIT)
+
+Moving code generation into a separate process has been implemented as a security mitigation in some cases, most recently in the Chakra JavaScript engine. With this mitigation, the regular process does not have the privilege to generate new code and it must delegate the code generation to a dedicated process instead. The dedicated process with special privileges to introduce dynamically generated code is audited and heavily scrutinized. The advantage of this solution is that no code in the original process ever needs to write executable memory, so entire attack vectors (like compromised code altering memory mappings) can be shut down entirely. [Google Project Zero analyzed the Chakra implementation](https://github.com/googleprojectzero/p0tools/blob/master/JITServer/JIT-Server-whitepaper.pdf) and pointed out some of its weaknesses.
+
+We do not see a strong need for introducing an out-of-process JIT for .NET runtime in the foreseeable future. The primary reason is that .NET does not support sandboxed execution of untrusted code anymore and there are no plans to ever reintroduce it due to its complexity, cost and impact on the evolution of the platform. The out-of-process JIT has advantages for execution of untrusted (verifiable) code. The security benefits of an out-of-process JIT for trusted code with full access to platform capabilities (security critical APIs) are less interesting since strong soundness validation is not feasible in this scenario. The secondary reasons include the availability of more secure options that disallow runtime code generation altogether, the implementation cost, and the necessity of a second process to run alongside the primary process.
+
+# Control-flow Integrity
+
+Once attackers are no longer able to inject arbitrary code into the process thanks to executable space protections, they start looking for creative ways to leverage existing code in the process to mount the attack. [Return-oriented Programming](https://en.wikipedia.org/wiki/Return-oriented_programming) (ROP) and [Jump-oriented Programming](https://www.csc2.ncsu.edu/faculty/xjiang4/pubs/ASIACCS11.pdf) (JOP) are generalized exploit techniques that take advantage of situations where the program uses indirection for control transfer. The affected indirection is the return address for ROP and the function pointer for JOP. Control-flow integrity adds validation of these indirections to eliminate or reduce the chance of them being used to mount successful attacks.
+
+The control-flow integrity validations are an emerging area. It isn't currently feasible to achieve mitigations with the matching security characteristics across operating systems and hardware vendors as the capabilities are not standardized enough.
+
+## Intel Control-flow Enforcement Technology (CET)
+
+[CET](https://software.intel.com/content/www/us/en/develop/articles/technical-look-control-flow-enforcement-technology.html) is a set of hardware features that enable efficient implementation of integrity checks for control flow indirections. CET first appeared in Tiger Lake processors launched in September 2020.
+
+### Shadow Stack
+
+A second stack of return addresses that is mostly invisible to the program is maintained on the side in addition to the ordinary stack. The return instruction uses the second stack to confirm that the return address on the ordinary stack has not been tampered with. This prevents attacks from leveraging stack buffer overruns to overwrite return addresses and hijack the control flow.
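+
+The check performed on return can be illustrated with a toy model. The following C# sketch is not how CET or .NET runtime implements shadow stacks: the hardware maintains the second stack itself, whereas here both stacks are ordinary collections, "addresses" are plain integers, and the `Call` and `Return` helpers are hypothetical names introduced only for illustration.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Toy model: both stacks are ordinary collections and addresses are integers.
+var ordinaryStack = new Stack<long>();   // writable by the program (and by an attacker)
+var shadowStack = new Stack<long>();     // only written by Call and Return below
+
+// Hypothetical stand-in for a call instruction: records the return address twice.
+void Call(long returnAddress)
+{
+    ordinaryStack.Push(returnAddress);
+    shadowStack.Push(returnAddress);
+}
+
+// Hypothetical stand-in for a return instruction: compares both copies.
+long Return()
+{
+    long fromOrdinary = ordinaryStack.Pop();
+    long fromShadow = shadowStack.Pop();
+    if (fromOrdinary != fromShadow)
+        throw new InvalidOperationException("Return address mismatch: control flow violation.");
+    return fromOrdinary;
+}
+
+// Normal call and return: both copies agree.
+Call(0x1000);
+Console.WriteLine(Return() == 0x1000);
+
+// Simulated stack buffer overrun: only the ordinary stack is overwritten.
+Call(0x2000);
+ordinaryStack.Pop();
+ordinaryStack.Push(0xBAD);
+try { Return(); }
+catch (InvalidOperationException e) { Console.WriteLine(e.Message); }
+```
+
+The second `Return` fails because the attacker-controlled copy no longer matches the protected copy.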
+
+Low-level techniques used by .NET runtime such as return address hijacking for GC thread suspension have to be updated for compatibility with shadow stack.
+
+We have been working closely with the Windows team on ensuring that the shadow stack support in the operating system can work well together with .NET runtime. New Windows OS APIs designed as part of this effort are going to be introduced to make it possible. These new APIs should help other similar language runtimes to enable this mitigation as well. Our plan is to complete support for the shadow stack mitigation on Windows in .NET 6.
+
+Linux support for shadow stack is [still being worked on](https://lore.kernel.org/linux-mm/20210217222730.15819-1-yu-cheng.yu@intel.com/T/#t) at the time of writing. Once the shadow stack support lands in mainstream Linux kernels, we will make sure that .NET runtime is compatible with it. We expect that supporting shadow stack on Linux will be easier due to the less restrictive approach taken to implement this mitigation in the Linux kernel. Unlike on Windows, shadow stack mismatches can be handled by user code on Linux, which is less secure but also easier to work with.
+
+For reference, Clang includes a [software implementation of shadow stack for Arm64](https://clang.llvm.org/docs/ShadowCallStack.html) that is used on Android.
+
+### Indirect Branch Tracking (IBT)
+
+All indirect branch targets must start with a new ENDBRANCH instruction. The processor ensures that indirect call and jump instructions are only able to transfer control to an ENDBRANCH instruction. The ENDBRANCH instruction is a no-op on older processors, which allows the same binaries to be used on older hardware that does not support this mitigation.
+
+IBT reduces but does not eliminate the chance that a tampered function pointer remains undetected. Replacing the function pointer with a different valid target is still possible and can be used to defeat the mitigation.
+
+We will work together with Red Hat and other stakeholders to make .NET runtime compatible with IBT on Linux. [dotnet/runtime#40100](https://github.com/dotnet/runtime/issues/40100) has an early conversation about this effort.
+
+There are no plans to take advantage of IBT in Windows since it is equivalent to the software-based CFG mitigation described in the next section, which has been shipping in Windows for some time. There is no point in .NET runtime supporting IBT when the underlying OS does not support it. Our principle is to adopt the native security mitigation of the OS that .NET runtime is running on, where possible.
+
+Armv8.5-A includes an equivalent mitigation in the form of the [Branch Target Identification](https://developer.arm.com/docs/ddi0602/f/base-instructions-alphabetic-order/bti-branch-target-identification) (BTI) instruction.
+
+## Windows Control Flow Guard (CFG)
+
+CFG is a software equivalent of IBT. The operating system maintains a bitmap of all valid indirect branch targets. Code inserted before indirect call and jump instructions consults this bitmap to validate the target address.
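+
+The bitmap lookup can be sketched as a toy model. The C# below is an illustration only, not the Windows implementation: the 4 KB "address space", the one-bit-per-16-byte granularity, and the `MarkValidTarget` and `GuardedIndirectCall` helpers are all assumptions made for the example.
+
+```csharp
+using System;
+using System.Collections;
+
+// Toy model: one bit per 16-byte slot of a tiny 4 KB "address space".
+const int SlotSize = 16;
+var validTargets = new BitArray(4096 / SlotSize);
+
+// Hypothetical helper: records that a function starts at the given address.
+void MarkValidTarget(int address) => validTargets[address / SlotSize] = true;
+
+// Hypothetical helper: the check that would be inserted before an indirect call.
+void GuardedIndirectCall(int target, Action callee)
+{
+    if (target % SlotSize != 0 || !validTargets[target / SlotSize])
+        throw new InvalidOperationException($"Invalid indirect call target 0x{target:X}.");
+    callee();
+}
+
+MarkValidTarget(0x100);
+
+// A registered target passes the check and is invoked.
+GuardedIndirectCall(0x100, () => Console.WriteLine("called valid target"));
+
+// An unregistered target is rejected before any code at that address runs.
+try { GuardedIndirectCall(0x200, () => Console.WriteLine("never runs")); }
+catch (InvalidOperationException e) { Console.WriteLine(e.Message); }
+```
+
+As with IBT, a check of this kind only confirms that the target is some valid entry point, not that it is the intended one.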
+
+The .NET runtime C/C++ implementation is compiled [with /guard:cf](https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-control-flow-guard). However, the .NET runtime generated code (JITed code or hand-generated assembly stubs) does not cooperate with CFG today. The indirect calls made by .NET runtime generated code do not include CFG checks and all locations in code generated at runtime are marked as valid CFG targets. This reduces the protection that the other code loaded in the process (e.g. OS libraries written in C/C++) gets from CFG.
+
+As a first priority, the .NET runtime should start marking the valid indirect call targets properly using the `SetProcessValidCallTargets` API. Doing so will stop reducing the protection that the other code loaded in the process gets from CFG.
+
+Introducing CFG checks for all indirect calls made by .NET runtime warrants further investigation and feasibility analysis. The CFG checks add measurable overhead to indirect calls. The overhead of a straightforward implementation may prove to be prohibitive since indirect calls are much more frequent in .NET code. A viable alternative to CFG checks is storing function pointers in read-only memory where they cannot be tampered with. We may need to update the key runtime control structures to fit this model to reduce the frequency and overhead of CFG checks.
+
+For reference, Clang includes [multiple software control flow integrity options geared towards C++](http://clang.llvm.org/docs/ControlFlowIntegrity.html).
+
+## ARM Pointer Authentication (PA)
+
+A significant number of address bits on 64-bit systems are unused. [Pointer authentication](https://www.qualcomm.com/media/documents/files/whitepaper-pointer-authentication-on-armv8-3.pdf) uses the unused address bits to store a pointer authentication code (PAC), computed as a hash of the pointer value, a secret key and a pointer type specific context. The PAC is validated before each pointer use. The scheme is supported by new instructions to compute and validate the PAC.
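+
+The sign-then-validate scheme can be sketched in software. The C# below is a toy model under stated assumptions, not the Arm instructions or any OS ABI: it assumes a 48-bit address space with a 16-bit PAC in the upper bits, uses HMAC-SHA256 as a stand-in for the hardware algorithm, and the `ComputePac`, `Sign` and `Authenticate` helpers are hypothetical names.
+
+```csharp
+using System;
+using System.Security.Cryptography;
+
+// Assumption: 48 address bits are in use, leaving the top 16 bits for the PAC.
+const ulong AddressMask = (1UL << 48) - 1;
+
+// Hypothetical per-process secret key.
+byte[] key = new byte[32];
+RandomNumberGenerator.Fill(key);
+
+// PAC = truncated keyed hash over (address, context). HMAC-SHA256 is a stand-in.
+ulong ComputePac(ulong address, ulong context)
+{
+    byte[] message = new byte[16];
+    BitConverter.TryWriteBytes(message.AsSpan(0, 8), address);
+    BitConverter.TryWriteBytes(message.AsSpan(8, 8), context);
+    using var hmac = new HMACSHA256(key);
+    return BitConverter.ToUInt16(hmac.ComputeHash(message), 0);
+}
+
+// Stores the PAC in the unused upper bits of the pointer.
+ulong Sign(ulong pointer, ulong context) =>
+    (pointer & AddressMask) | (ComputePac(pointer & AddressMask, context) << 48);
+
+// Recomputes the PAC and returns the plain address only if it matches.
+ulong Authenticate(ulong signedPointer, ulong context)
+{
+    ulong address = signedPointer & AddressMask;
+    if (Sign(address, context) != signedPointer)
+        throw new InvalidOperationException("Pointer authentication failed.");
+    return address;
+}
+
+ulong signedPointer = Sign(0x7FFF_1234_5678UL, context: 42);
+Console.WriteLine(Authenticate(signedPointer, context: 42) == 0x7FFF_1234_5678UL);
+
+// Tampering with the address bits invalidates the PAC, except for a chance
+// collision, which a 16-bit code allows with probability of about 1 in 65536.
+try { Authenticate(signedPointer ^ 0x10UL, context: 42); }
+catch (InvalidOperationException e) { Console.WriteLine(e.Message); }
+```
+
+Because the context is part of the hash, a pointer signed for one purpose does not validate when presented with a different context.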
+
+PA can be used either on code pointers for control flow integrity checks or on data pointers for general data sanity checks.
+
+PA in combination with BTI provides stronger protection than IBT for indirect calls and branches. However, PA is weaker than shadow stack for protecting return addresses since it does not prevent pointer replacement exploits.
+
+Apple published a [preview implementation](https://developer.apple.com/documentation/security/preparing_your_app_to_work_with_pointer_authentication) of the ARM64E ABI hardened using pointer authentication. We expect that other major operating systems will follow with similar technology for ARM platforms. Updating .NET runtimes to be compatible with and take advantage of pointer authentication is a major undertaking that affects a number of subsystems, including the JIT, GC, interop and debugger. Our preliminary plan is to start tackling it for .NET 7.
+
+## Stack Buffer Overflow Protection
+
+[Stack Buffer Overflow](https://en.wikipedia.org/wiki/Stack_buffer_overflow) protection is among the older security mitigations. It has been known as "GS cookie" on Windows. It is designed to catch writes into stack memory outside the intended bounds.
+
+The first .NET runtime that included stack buffer overflow protection for managed code was .NET Framework 2.0, and the implementation has not changed much since then. The mitigations in C/C++ compilers have seen improvements in the meantime (for example, [Enhanced GS](https://msrc-blog.microsoft.com/2009/03/20/enhanced-gs-in-visual-studio-2010/) or [-fstack-protector-strong](https://lwn.net/Articles/584225/)). New .NET patterns such as unsafe arithmetic on ref parameters introduced new risks that are not covered currently.
+
+We will review and update stack buffer overflow protection in .NET runtime to match the state of the art.
+
+# Language, Libraries and Runtime Features
+
+Library and language design have always played a key role in mitigating security bugs by preventing or discouraging them from being introduced in the first place.
+
+## Unsafe API and C# language constructs
+
+A number of newer APIs such as `Memory<T>` and friends come with security risks that are not clearly communicated.
+
+We need to adopt a more holistic approach to unsafe features and APIs that we expose for public use. Developers need feedback so that they understand that [they are using constructs that are unsafe](https://github.com/dotnet/runtime/issues/31354) and are [informed of the specific risks that come with using those APIs](https://github.com/dotnet/runtime/issues/41418). Over time, we will bias more toward constructs and paradigms that are safe by default (possibly inspired by Rust).
+
+We have often been uncomfortable publicly exposing potentially dangerous APIs we use internally, e.g. [`ValueStringBuilder`](https://github.com/dotnet/runtime/issues/25587). We will continue to examine the balance of security against utility. The security risks of similar APIs can be mitigated by introducing new C# language features, by detecting [unsafe usage patterns via analyzers](https://github.com/dotnet/runtime/issues?q=label%3Acode-analyzer+label%3Asecurity) or by concealing the APIs in inherently dangerous namespaces such as `System.Runtime.CompilerServices` and `System.Runtime.InteropServices`.
+
+## Object and Array Pooling
+
+`ArrayPool<T>` and other similar custom object pools are the equivalent of the classic C `malloc`. Manual lifetime management traditionally suffers from use-after-free and double-free security vulnerabilities. Many years were spent on implementing diagnostics and mitigations of these vulnerabilities for `malloc`.
+
+We need to implement [similar diagnostic and mitigation capabilities for `ArrayPool<T>`](https://github.com/dotnet/runtime/issues/7532) and friends. It is [not unusual](https://github.com/dotnet/corefx/pull/37270) to find security bugs caused by incorrect `ArrayPool<T>` use even in dotnet/runtime libraries.
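+
+The pooling analogue of use-after-free can be shown with a minimal C# sketch. It uses only the public `ArrayPool<byte>.Shared` API. Whether the second `Rent` call returns the same array instance is an implementation detail of the pool, so the output is not guaranteed and the example is illustrative rather than a reliable reproduction.
+
+```csharp
+using System;
+using System.Buffers;
+
+byte[] first = ArrayPool<byte>.Shared.Rent(16);   // the array may be longer than 16
+first[0] = 0x42;
+ArrayPool<byte>.Shared.Return(first);             // ownership ends here
+
+// Another caller rents a buffer; the pool is free to hand out the same array.
+byte[] second = ArrayPool<byte>.Shared.Rent(16);
+second[0] = 0x01;
+
+// BUG (use-after-return): writing through the stale reference may corrupt 'second'.
+first[0] = 0xFF;
+
+Console.WriteLine($"Same array reused: {ReferenceEquals(first, second)}");
+Console.WriteLine($"second[0] = 0x{second[0]:X2}");
+
+ArrayPool<byte>.Shared.Return(second);
+```
+
+Nothing in the type system or the runtime flags the stale write, which is why diagnostics comparable to those built for `malloc` are needed.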
+
+## Secrets in Presence of GC Heap Compaction
+
+GC heap compaction can create a second copy of secrets stored in managed memory. This makes it difficult for an application to reliably clear in-memory secrets stored in managed memory after they are no longer used.
+
+We will introduce a GC feature that clears unused memory after GC heap compaction. Having the GC clear memory after compaction improves the security of applications for which heap dumps and in-memory probes are within their scope of threat. [dotnet/runtime#10480](https://github.com/dotnet/runtime/issues/10480) has more details.
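+
+The proposed GC feature is not what the following sketch shows. It illustrates, under stated assumptions, one way an application can limit stray copies today: allocate the secret in an array that the GC does not relocate (`GC.AllocateArray` with `pinned: true`, available since .NET 5) and clear it explicitly with `CryptographicOperations.ZeroMemory`. The random bytes stand in for a real key.
+
+```csharp
+using System;
+using System.Security.Cryptography;
+
+// A pinned array is not moved by heap compaction, so compaction cannot
+// leave a second copy of its contents behind.
+byte[] secret = GC.AllocateArray<byte>(32, pinned: true);
+try
+{
+    RandomNumberGenerator.Fill(secret);   // stand-in for loading a real key
+    // ... use the secret ...
+}
+finally
+{
+    // ZeroMemory is intended not to be optimized away by the compiler.
+    CryptographicOperations.ZeroMemory(secret);
+}
+
+Console.WriteLine(Array.TrueForAll(secret, b => b == 0));   // prints "True"
+```
+
+This only covers the buffer itself. Copies made by other code, such as conversions to `string`, are outside its reach.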
+
+## Dangerous Deserialization
+
+Object deserialization vulnerabilities continue to be a major issue for many type safe languages. These vulnerabilities allow the attacker to introduce unexpected object graphs into the remote process.
+
+We plan to gradually [deprecate `BinaryFormatter`](https://github.com/dotnet/designs/blob/main/accepted/2020/better-obsoletion/binaryformatter-obsoletion.md), which has been the source of the worst .NET serialization vulnerabilities. We will work on raising awareness about other similar dangerous serialization patterns within the .NET ecosystem.
+
+# Looking forward
+
+This roadmap involves quite a bit of work that will be spread over multiple releases. Going forward, .NET should adopt mitigations as they become available. We will look for opportunities to engage security experts and academics to help us identify new and possibly unique mitigations.
+
+## Plan for .NET 6
+
+- W^X: [https://github.com/dotnet/runtime/issues/50391](https://github.com/dotnet/runtime/issues/50391)
+- CET (Windows): [https://github.com/dotnet/runtime/issues/40100](https://github.com/dotnet/runtime/issues/40100)
+- Dangerous Deserialization: [https://github.com/dotnet/designs/blob/main/accepted/2020/better-obsoletion/binaryformatter-obsoletion.md](https://github.com/dotnet/designs/blob/main/accepted/2020/better-obsoletion/binaryformatter-obsoletion.md)
+- Disallow Runtime Code Generation
+  - Xamarin.iOS: [https://github.com/dotnet/xamarin/issues/2](https://github.com/dotnet/xamarin/issues/2)
+  - Native AOT Experiment: [https://github.com/dotnet/runtimelab/issues/248](https://github.com/dotnet/runtimelab/issues/248)
+
+# Appendix
+
+The topics discussed in this appendix are not the focus of this document. They are frequently asked questions, and their brief discussion is included here for reference only.
+
+## Timing Side Channel Attacks
+
+The recent Spectre disclosures spawned significant debate on the level of mitigation that compilers should support against side channels. There continues to be active discussion on whether compilers should enable these mitigations by default, including calls to make the runtime resilient against these by default. Now that this space is better understood, we do not think that there is a very appealing and tractable path for robust side channel protection within the same address space. We have no plans to implement mitigations against side channel attacks in .NET runtime.
+
+## Confidential Computing
+
+[Confidential computing](https://confidentialcomputing.io/) is a hardware encryption technology designed to protect [data in-use](https://en.wikipedia.org/wiki/Data_in_use) from unauthorized users, up to and including the system administrators and operators themselves. It comes in two main forms: full memory encryption and enclaves.
+
+Full memory encryption protects all memory transparently. AMD Secure Memory Encryption (SME) and Intel Total Memory Encryption (TME) are existing or planned implementations of this technology. We can expect it to become the baseline of all premium cloud offerings within the next decade. While full memory encryption does not require any fundamental changes of the software stack at .NET runtime level, it may require updates of designs and best practices for diagnostics and telemetry to avoid unintentional egress of confidential data.
+
+Enclaves protect a portion of memory. They are implemented by Intel Software Guard Extensions (SGX). To fit into the enclave programming model, the software stack typically requires refactoring of sensitive computation into an isolated component that runs inside the enclave and [uses specialized APIs for communication](https://openenclave.io/sdk/). The current enclave capabilities are too limited to fully support a general purpose .NET runtime. .NET runtime for enclaves would be a niche [form factor](https://github.com/dotnet/designs/blob/main/accepted/2020/form-factors.md) with a number of limitations.