Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull spectre/meltdown updates from Thomas Gleixner:
 "The next round of updates related to melted spectrum:

   - The initial set of spectre V1 mitigations:

        - Array index speculation blocker and its usage for syscall,
          fdtable and the nl80211 driver (see the sketch below).

       - Speculation barrier and its usage in user access functions

   - Make indirect calls in KVM speculation safe

   - Blacklisting of known to be broken microcodes so IBPB/IBRS are not
     touched.

   - The initial IBPB support and its usage in context switch

   - The exposure of the new speculation MSRs to KVM guests.

   - A fix for a regression in x86/32 related to the cpu entry area

   - Proper whitelisting for known to be safe CPUs from the mitigations.

   - objtool fixes to deal properly with retpolines and alternatives

   - Exclude __init functions from retpolines which speeds up the boot
     process.

   - Removal of the syscall64 fast path and related cleanups and
     simplifications

   - Removal of the unpatched paravirt mode which is yet another source
     of unprotected indirect calls.

   - A new and undisputed version of the module mismatch warning

   - A couple of cleanup and correctness fixes all over the place

  Yet another step towards full mitigation. There are a few things still
  missing like the RSB underflow mitigation for Skylake and other small
  details, but that's being worked on.

  That said, I'm taking a belated christmas vacation for a week and hope
  that everything is magically solved when I'm back on Feb 12th"
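
As an illustration of the array index speculation blocker described in the list above: the pattern this series introduces is to clamp an already bounds-checked index with array_index_nospec() before it is used to dereference a table. The sketch below is hypothetical (lookup_entry() and its table are invented for illustration and are not the kernel's actual fdtable or nl80211 code); the real call sites are in the commits listed next, and the primitive is documented in the new Documentation/speculation.txt shown further down this page.

    #include <linux/nospec.h>

    /*
     * Hypothetical bounded table lookup showing the Spectre-v1 pattern:
     * check the bound, then clamp the index to [0, nr_entries) so that a
     * mispredicted branch cannot speculatively read past the table.
     */
    static void *lookup_entry(void **table, unsigned int nr_entries,
                              unsigned int index)
    {
            if (index >= nr_entries)
                    return NULL;

            index = array_index_nospec(index, nr_entries);
            return table[index];
    }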

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM/x86: Add IBPB support
  KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
  x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
  x86/pti: Mark constant arrays as __initconst
  x86/spectre: Simplify spectre_v2 command line parsing
  x86/retpoline: Avoid retpolines for built-in __init functions
  x86/kvm: Update spectre-v1 mitigation
  KVM: VMX: make MSR bitmaps per-VCPU
  x86/paravirt: Remove 'noreplace-paravirt' cmdline option
  x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
  x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
  x86/spectre: Report get_user mitigation for spectre_v1
  nl80211: Sanitize array index in parse_txq_params
  vfs, fdtable: Prevent bounds-check bypass via speculative execution
  x86/syscall: Sanitize syscall table de-references under speculation
  x86/get_user: Use pointer masking to limit speculation
  ...
torvalds committed Feb 4, 2018
2 parents 0a646e9 + b2ac58f commit 3527799
Showing 38 changed files with 977 additions and 554 deletions.
2 changes: 0 additions & 2 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -2758,8 +2758,6 @@
norandmaps Don't use address space randomization. Equivalent to
echo 0 > /proc/sys/kernel/randomize_va_space

noreplace-paravirt [X86,IA-64,PV_OPS] Don't patch paravirt_ops

noreplace-smp [X86-32,SMP] Don't replace SMP instructions
with UP alternatives

90 changes: 90 additions & 0 deletions Documentation/speculation.txt
@@ -0,0 +1,90 @@
This document explains potential effects of speculation, and how undesirable
effects can be mitigated portably using common APIs.

===========
Speculation
===========

To improve performance and minimize average latencies, many contemporary CPUs
employ speculative execution techniques such as branch prediction, performing
work which may be discarded at a later stage.

Typically speculative execution cannot be observed from architectural state,
such as the contents of registers. However, in some cases it is possible to
observe its impact on microarchitectural state, such as the presence or
absence of data in caches. Such state may form side-channels which can be
observed to extract secret information.

For example, in the presence of branch prediction, it is possible for bounds
checks to be ignored by code which is speculatively executed. Consider the
following code:

int load_array(int *array, unsigned int index)
{
if (index >= MAX_ARRAY_ELEMS)
return 0;
else
return array[index];
}

On arm64, this may be compiled to an assembly sequence such as:

CMP <index>, #MAX_ARRAY_ELEMS
B.LT less
MOV <returnval>, #0
RET
less:
LDR <returnval>, [<array>, <index>]
RET

It is possible that a CPU mis-predicts the conditional branch, and
speculatively loads array[index], even if index >= MAX_ARRAY_ELEMS. This
value will subsequently be discarded, but the speculated load may affect
microarchitectural state which can be subsequently measured.

More complex sequences involving multiple dependent memory accesses may
result in sensitive information being leaked. Consider the following
code, building on the prior example:

int load_dependent_arrays(int *arr1, int *arr2, int index)
{
int val1, val2;

val1 = load_array(arr1, index);
val2 = load_array(arr2, val1);

return val2;
}

Under speculation, the first call to load_array() may return the value
of an out-of-bounds address, while the second call will influence
microarchitectural state dependent on this value. This may provide an
arbitrary read primitive.

====================================
Mitigating speculation side-channels
====================================

The kernel provides a generic API to ensure that bounds checks are
respected even under speculation. Architectures which are affected by
speculation-based side-channels are expected to implement these
primitives.

The array_index_nospec() helper in <linux/nospec.h> can be used to
prevent information from being leaked via side-channels.

A call to array_index_nospec(index, size) returns a sanitized index
value that is bounded to [0, size) even under cpu speculation
conditions.

This can be used to protect the earlier load_array() example:

int load_array(int *array, unsigned int index)
{
if (index >= MAX_ARRAY_ELEMS)
return 0;
else {
index = array_index_nospec(index, MAX_ARRAY_ELEMS);
return array[index];
}
}
9 changes: 6 additions & 3 deletions arch/x86/entry/common.c
@@ -21,6 +21,7 @@
#include <linux/export.h>
#include <linux/context_tracking.h>
#include <linux/user-return-notifier.h>
#include <linux/nospec.h>
#include <linux/uprobes.h>
#include <linux/livepatch.h>
#include <linux/syscalls.h>
@@ -206,7 +207,7 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
* special case only applies after poking regs and before the
* very next return to user mode.
*/
current->thread.status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
#endif

user_enter_irqoff();
@@ -282,7 +283,8 @@ __visible void do_syscall_64(struct pt_regs *regs)
* regs->orig_ax, which changes the behavior of some syscalls.
*/
if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
regs->ax = sys_call_table[nr & __SYSCALL_MASK](
nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
regs->ax = sys_call_table[nr](
regs->di, regs->si, regs->dx,
regs->r10, regs->r8, regs->r9);
}
@@ -304,7 +306,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
unsigned int nr = (unsigned int)regs->orig_ax;

#ifdef CONFIG_IA32_EMULATION
current->thread.status |= TS_COMPAT;
ti->status |= TS_COMPAT;
#endif

if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
@@ -318,6 +320,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
}

if (likely(nr < IA32_NR_syscalls)) {
nr = array_index_nospec(nr, IA32_NR_syscalls);
/*
* It's possible that a 32-bit syscall implementation
* takes a 64-bit parameter but nonetheless assumes that
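The do_syscall_64() hunk above clamps the syscall number before the sys_call_table dereference. As a simplified sketch of what that clamp amounts to, assuming the generic <linux/nospec.h> helper (array_index_nospec() builds a mask with array_index_mask_nospec() and ANDs it into the index; this is illustrative, not the literal code the kernel compiles):

    unsigned long nr = regs->orig_ax;       /* syscall number, as above */

    if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
            unsigned long mask;

            nr &= __SYSCALL_MASK;
            /* mask is ~0UL when nr < NR_syscalls and 0 otherwise */
            mask = array_index_mask_nospec(nr, NR_syscalls);
            /* a misspeculated out-of-range nr is forced to 0 ... */
            nr &= mask;
            /* ... so this dereference never reads past sys_call_table[] */
            regs->ax = sys_call_table[nr](regs->di, regs->si, regs->dx,
                                          regs->r10, regs->r8, regs->r9);
    }

The same reasoning applies to the IA32_NR_syscalls clamp added to do_syscall_32_irqs_on() in the same file.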
127 changes: 7 additions & 120 deletions arch/x86/entry/entry_64.S
@@ -236,91 +236,20 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
pushq %r9 /* pt_regs->r9 */
pushq %r10 /* pt_regs->r10 */
pushq %r11 /* pt_regs->r11 */
sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */
UNWIND_HINT_REGS extra=0

TRACE_IRQS_OFF

/*
* If we need to do entry work or if we guess we'll need to do
* exit work, go straight to the slow path.
*/
movq PER_CPU_VAR(current_task), %r11
testl $_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
jnz entry_SYSCALL64_slow_path

entry_SYSCALL_64_fastpath:
/*
* Easy case: enable interrupts and issue the syscall. If the syscall
* needs pt_regs, we'll call a stub that disables interrupts again
* and jumps to the slow path.
*/
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
#if __SYSCALL_MASK == ~0
cmpq $__NR_syscall_max, %rax
#else
andl $__SYSCALL_MASK, %eax
cmpl $__NR_syscall_max, %eax
#endif
ja 1f /* return -ENOSYS (already in pt_regs->ax) */
movq %r10, %rcx

/*
* This call instruction is handled specially in stub_ptregs_64.
* It might end up jumping to the slow path. If it jumps, RAX
* and all argument registers are clobbered.
*/
#ifdef CONFIG_RETPOLINE
movq sys_call_table(, %rax, 8), %rax
call __x86_indirect_thunk_rax
#else
call *sys_call_table(, %rax, 8)
#endif
.Lentry_SYSCALL_64_after_fastpath_call:

movq %rax, RAX(%rsp)
1:
pushq %rbx /* pt_regs->rbx */
pushq %rbp /* pt_regs->rbp */
pushq %r12 /* pt_regs->r12 */
pushq %r13 /* pt_regs->r13 */
pushq %r14 /* pt_regs->r14 */
pushq %r15 /* pt_regs->r15 */
UNWIND_HINT_REGS

/*
* If we get here, then we know that pt_regs is clean for SYSRET64.
* If we see that no exit work is required (which we are required
* to check with IRQs off), then we can go straight to SYSRET64.
*/
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
movq PER_CPU_VAR(current_task), %r11
testl $_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
jnz 1f

LOCKDEP_SYS_EXIT
TRACE_IRQS_ON /* user mode is traced as IRQs on */
movq RIP(%rsp), %rcx
movq EFLAGS(%rsp), %r11
addq $6*8, %rsp /* skip extra regs -- they were preserved */
UNWIND_HINT_EMPTY
jmp .Lpop_c_regs_except_rcx_r11_and_sysret

1:
/*
* The fast path looked good when we started, but something changed
* along the way and we need to switch to the slow path. Calling
* raise(3) will trigger this, for example. IRQs are off.
*/
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_ANY)
SAVE_EXTRA_REGS
movq %rsp, %rdi
call syscall_return_slowpath /* returns with IRQs disabled */
jmp return_from_SYSCALL_64

entry_SYSCALL64_slow_path:
/* IRQs are off. */
SAVE_EXTRA_REGS
movq %rsp, %rdi
call do_syscall_64 /* returns with IRQs disabled */

return_from_SYSCALL_64:
TRACE_IRQS_IRETQ /* we're about to change IF */

/*
@@ -393,7 +322,6 @@ syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
UNWIND_HINT_EMPTY
POP_EXTRA_REGS
.Lpop_c_regs_except_rcx_r11_and_sysret:
popq %rsi /* skip r11 */
popq %r10
popq %r9
@@ -424,47 +352,6 @@ syscall_return_via_sysret:
USERGS_SYSRET64
END(entry_SYSCALL_64)

ENTRY(stub_ptregs_64)
/*
* Syscalls marked as needing ptregs land here.
* If we are on the fast path, we need to save the extra regs,
* which we achieve by trying again on the slow path. If we are on
* the slow path, the extra regs are already saved.
*
* RAX stores a pointer to the C function implementing the syscall.
* IRQs are on.
*/
cmpq $.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
jne 1f

/*
* Called from fast path -- disable IRQs again, pop return address
* and jump to slow path
*/
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
popq %rax
UNWIND_HINT_REGS extra=0
jmp entry_SYSCALL64_slow_path

1:
JMP_NOSPEC %rax /* Called from C */
END(stub_ptregs_64)

.macro ptregs_stub func
ENTRY(ptregs_\func)
UNWIND_HINT_FUNC
leaq \func(%rip), %rax
jmp stub_ptregs_64
END(ptregs_\func)
.endm

/* Instantiate ptregs_stub for each ptregs-using syscall */
#define __SYSCALL_64_QUAL_(sym)
#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
#include <asm/syscalls_64.h>

/*
* %rdi: prev task
* %rsi: next task
7 changes: 2 additions & 5 deletions arch/x86/entry/syscall_64.c
@@ -7,14 +7,11 @@
#include <asm/asm-offsets.h>
#include <asm/syscall.h>

#define __SYSCALL_64_QUAL_(sym) sym
#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym

#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
#include <asm/syscalls_64.h>
#undef __SYSCALL_64

#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
#define __SYSCALL_64(nr, sym, qual) [nr] = sym,

extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);

28 changes: 28 additions & 0 deletions arch/x86/include/asm/barrier.h
@@ -24,6 +24,34 @@
#define wmb() asm volatile("sfence" ::: "memory")
#endif

/**
* array_index_mask_nospec() - generate a mask that is ~0UL when the
* bounds check succeeds and 0 otherwise
* @index: array element index
* @size: number of elements in array
*
* Returns:
* 0 - (index < size)
*/
static inline unsigned long array_index_mask_nospec(unsigned long index,
unsigned long size)
{
unsigned long mask;

asm ("cmp %1,%2; sbb %0,%0;"
:"=r" (mask)
:"r"(size),"r" (index)
:"cc");
return mask;
}

/* Override the default implementation from linux/nospec.h. */
#define array_index_mask_nospec array_index_mask_nospec

/* Prevent speculative execution past this barrier. */
#define barrier_nospec() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
"lfence", X86_FEATURE_LFENCE_RDTSC)

#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb() rmb()
#else
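For context, the mask produced by the CMP/SBB sequence above is meant to be consumed by the generic array_index_nospec() helper, which ANDs it into the index. A minimal sketch of that composition (mirroring, not copying, the <linux/nospec.h> macro; clamp_index_nospec() is an invented name):

    /*
     * mask is ~0UL when index < size and 0 otherwise, computed without a
     * conditional branch the CPU could mispredict. ANDing it into the
     * index therefore clamps any out-of-bounds value to 0 even on
     * speculative paths.
     */
    static inline unsigned long clamp_index_nospec(unsigned long index,
                                                   unsigned long size)
    {
            unsigned long mask = array_index_mask_nospec(index, size);

            return index & mask;
    }

barrier_nospec(), defined just above, covers the complementary case where there is no index to clamp and speculation simply has to be stopped, e.g. in the user access paths mentioned in the merge message.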
6 changes: 4 additions & 2 deletions arch/x86/include/asm/fixmap.h
@@ -137,8 +137,10 @@ enum fixed_addresses {

extern void reserve_top_address(unsigned long reserve);

#define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
#define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
#define FIXADDR_TOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT)
#define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE)

extern int fixmaps_set;

3 changes: 1 addition & 2 deletions arch/x86/include/asm/msr.h
@@ -214,8 +214,7 @@ static __always_inline unsigned long long rdtsc_ordered(void)
* that some other imaginary CPU is updating continuously with a
* time stamp.
*/
alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
"lfence", X86_FEATURE_LFENCE_RDTSC);
barrier_nospec();
return rdtsc();
}

2 changes: 1 addition & 1 deletion arch/x86/include/asm/nospec-branch.h
@@ -150,7 +150,7 @@ extern char __indirect_thunk_end[];
* On VMEXIT we must ensure that no RSB predictions learned in the guest
* can be followed in the host, by overwriting the RSB completely. Both
* retpoline and IBRS mitigations for Spectre v2 need this; only on future
* CPUs with IBRS_ATT *might* it be avoided.
* CPUs with IBRS_ALL *might* it be avoided.
*/
static inline void vmexit_fill_RSB(void)
{
5 changes: 3 additions & 2 deletions arch/x86/include/asm/pgtable_32_types.h
@@ -44,8 +44,9 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
*/
#define CPU_ENTRY_AREA_PAGES (NR_CPUS * 40)

#define CPU_ENTRY_AREA_BASE \
((FIXADDR_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK)
#define CPU_ENTRY_AREA_BASE \
((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) \
& PMD_MASK)

#define PKMAP_BASE \
((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)