-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x86: enable CONFIG_USERSPACE on x86-64 #21710
x86: enable CONFIG_USERSPACE on x86-64 #21710
Conversation
All checks are passing now. Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
description of how retpolines work can be found here[1]. | ||
|
||
[1] https://support.google.com/faqs/answer/7625886 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
owww this is not working at all :/ totally in favor of removing it for now. The only caveat is that our fix requires cpu feature while thiis sw fix should work for all cpus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pr seems good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't find anything that looks wrong. One moderately serious nitpick and a few notes on test config.
@@ -3,3 +3,4 @@ CONFIG_ZTEST=y | |||
CONFIG_ZTEST_STACKSIZE=2048 | |||
CONFIG_MAX_THREAD_BYTES=4 | |||
CONFIG_USERSPACE=y | |||
CONFIG_SMP=n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but I tend to prever CONFIG_MP_NUM_CPUS=1 to CONFIG_SMP=n for code coverage reasons. The former has the effect you want, without the other configuration changes that disabling SMP causes.
ztest_1cpu_unit_test(test_multiple_futex_wait_wake), | ||
ztest_1cpu_unit_test(test_futex_wait_forever), | ||
ztest_1cpu_unit_test(test_futex_wait_timeout), | ||
ztest_1cpu_unit_test(test_futex_wait_nowait)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the 1cpu feature halts one CPU by grabbing a lock on it and spinning, which is fine from a correctness perspective but a performance hassle when run under sanitycheck (it doubles the load, and makes timer-related spurious failures more likely on other tests running on the host). In situations like this where basically everything is MP-unsafe it's probably better to use CONFIG_MP_NUM_CPUS=1 instead (or in addition to -- 1cpu obviously is a noop if there's only one cpu)
arch/x86/core/intel64/locore.S
Outdated
movl $0x600, %eax | ||
xorl %edx, %edx | ||
movl $X86_FMASK_MSR, %ecx | ||
wrmsr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semi-hard style issue here. I think this is the wrong place. go64 is the lowest level bootstrap code required to get us to z_cstart(), these MSRs won't be inspected by the hardware until after the first user thread is spawned much, much later. Why can't this be done in the C side of initialization with much simpler code, e.g.:
z_x86_msr_write(X86_LSTAR_MSR, z_x86_syscall_entry_stub);
z_x86_msr_write(X86_STAR_MSR, X86_STAR_UPPER << 32);
z_x86_msr_write(X86_FMASK_MSR, 0x600);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andyross sure, it's better to do stuff in C if we can. I'll adjust this.
70da983
to
6b56e9b
Compare
updated: |
This doesn't work properly on x86 unless the dynamic thread struct allocated gets lucky and is aligned to 16 bytes. Disabling for now until zephyrproject-rtos#17893 is fixed. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The regular expressions used by this test to determine success or failure get confounded if the log subsystem drops the wrong messages due to buffers being full. Just use minimal logging which synchronously logs everything. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Exceptions on x86_64 are incorrectly implemented, and if a preemptible thread faults, and in its overridden k_sys_fatal_error_handler() does something which invokes a scheduling point (such as here where we give semaphores), the thread will be swapped out on the per-CPU exception stack and probably explode when it is switched back in. For now, change the faulting thread priority to co-op so this doesn't happen. Workaround for zephyrproject-rtos#21462 Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
These were previously assumed to always be fatal. We can't have the faulting thread's XMM registers clobbered, so put the SIMD/FPU state onto the stack as well. This is fairly large (512 bytes) and the execption stack is already uncomfortably small, so increase to 2K. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Define MSR register addresses for various MSRs related to SYSCALL/SYSRET. We also add MSRs for FS/GS base addresses (for GS, both kernel and user mode) to support SWAPGS. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Easier to establish correspondence with the technical manuals. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We don't need to set up GDT data descriptors for setting %gs. Instead, we use the x86 MSRs to set GS_BASE and KERNEL_GS_BASE. We don't currently allow user mode to set %gs on its own, but later on if we do, we have everything set up to issue 'swapgs' instructions on syscall or IRQ. Unused entries in the GDT have been removed. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
These are arranged in the particular order required by the syscall/sysret instructions. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The esf has a different set of members on 64-bit. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Slightly different layout since the top-lebel PML4 is page-sized and must be page-aligned. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Add two new non-static APIs for dumping out the page table entries for a specified memory address, and move to the main MMU code. Has debugging uses when trying to figure out why memory domains are not set up correctly. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
z_x86_thread_page_tables_get() now works for both user and supervisor threads, returning the kernel page tables in the latter case. This API has been up-leveled to a common header. The per-thread privilege elevation stack initial stack pointer, and the per-thread page table locations are no longer computed from other values, and instead are stored in thread->arch. A problem where the wrong page tables were dumped out on certain kinds of page faults has been fixed. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
These are now common code, all are related to user mode threads. The rat's nest of ifdefs in ia32's arch_new_thread has been greatly simplified, there is now just one hook if user mode is turned on. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Includes linker script fragments for the kernel object tables and automatic memory partitions. The data section is moved to the end per the requirements of include/linker/kobject.h. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Nothing too fancy here, we try as much as possible to use the same register layout as the C calling convention. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
These are now part of the common Kconfig and we build spec_ctrl.c for all. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Hidden config to select dependencies. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We should not be casting stuff related to memory sizes to u32_t. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This is causing problems, as if we create a thread in a system call we will *not* be using the kernel page tables if CONFIG_KPTI=n. Just don't fiddle with this page's permissions; we don't need it as a guard area anyway since we have a stack guard placed immediately before it, and this page is unused if user mode isn't active. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We use a fixed value of 32 as the way interrupts/exceptions are setup in x86_64's locore.S do not lend themselves to Kconfig configuration of the vector to use. HW-based kernel oops is now permanently on, there's no reason to make it optional that I can see. Default vectors for IPI and irq offload adjusted to not collide. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This code: 1) Doesn't work 2) Hasn't ever been enabled by default 3) We mitigate Spectre V2 via Extended IBRS anyway Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
- In early boot, enable the syscall instruction and set up necessary MSRs - Add a hook to update page tables on context switch - Properly initialize thread based on whether it will start in user or supervisor mode - Add landing function for system calls to execute the desired handler - Implement arch_user_string_nlen() - Implement logic for dropping a thread down to user mode - Reserve per-CPU storage space for user and privilege elevation stack pointers, necessary for handling syscalls when no free registers are available - Proper handling of gs register considerations when transitioning privilege levels Kernel page table isolation (KPTI) is not yet implemented. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
See CVE-2019-1125. We mitigate this by adding an 'lfence' upon interrupt/exception entry after the decision has been made whether it's necessary to invoke 'swapgs' or not. Only applies to x86_64, 32-bit doesn't use swapgs. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
KPTI is still work-in-progress on x86_64. Don't allow user mode to be enabled unless the SOC/board configuration indicates that the CPU in use is invulnerable to meltdown attacks. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The "key" parameter is legacy, remove it. Add a typedef for the expected function pointer type. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
No reason we need to stay in assembly domain once we have GS and a stack set up. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
35d21e4
to
ba145a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great here
This patch series implements CONFIG_USERSPACE on x86-64.
Kernel Page Table Isolation (KPTI) is still a work-in-progress, so user mode cannot be enabled unless the target board sets CONFIG_X86_NO_MELTDOWN, indicating it is immune to these attacks. This is set for qemu_x86_64 to allow testing.
All tests are reliably passing with SMP disabled. With SMP enabled, the sporadic data corruption issue described in #21317 may be observed in CI runs, although this issue has been present before this patch series.
Major changes:
Please see individual commit messages for more details.