[5.10] Track Steam performance patches #10

kakra · 2021-01-01T20:16:13Z

Export patch series: https://github.com/kakra/linux/pull/10.patch

…g system shutdown [ Upstream commit 45a2702 ] During Coldboot stress tests, system encountered the following panic. Panic logs depicts rt5682_i2c_shutdown() happened first and then later jack detect handler workqueue function triggered. This situation causes panic as rt5682_i2c_shutdown() resets codec. Fix this panic by cancelling all jack detection delayed work. Panic log: [ 20.936124] sof_pci_shutdown [ 20.940248] snd_sof_device_shutdown [ 20.945023] snd_sof_shutdown [ 21.126849] rt5682_i2c_shutdown [ 21.286053] rt5682_jack_detect_handler [ 21.291235] BUG: kernel NULL pointer dereference, address: 000000000000037c [ 21.299302] #PF: supervisor read access in kernel mode [ 21.305254] #PF: error_code(0x0000) - not-present page [ 21.311218] PGD 0 P4D 0 [ 21.314155] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 21.319206] CPU: 2 PID: 123 Comm: kworker/2:3 Tainted: G U 5.4.68 #10 [ 21.333687] ACPI: Preparing to enter system sleep state S5 [ 21.337669] Workqueue: events_power_efficient rt5682_jack_detect_handler [snd_soc_rt5682] [ 21.337671] RIP: 0010:rt5682_jack_detect_handler+0x6c/0x279 [snd_soc_rt5682] Fixes: a50067d ('ASoC: rt5682: split i2c driver into separate module') Signed-off-by: Jairaj Arava <jairaj.arava@intel.com> Signed-off-by: Sathyanarayana Nujella <sathyanarayana.nujella@intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@intel.com> Reviewed-by: Shuming Fan <shumingf@realtek.com> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20210205171428.2344210-1-ranjani.sridharan@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit c5c97ca ] The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 #1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 #2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 #3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 #4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 #5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 #6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 #7 0x56153264e796 in run_builtin perf.c:312:11 #8 0x56153264cf03 in handle_internal_command perf.c:364:8 #9 0x56153264e47d in run_argv perf.c:408:2 #10 0x56153264c9a9 in main perf.c:538:3 #11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) #12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210214091638.519643-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

Use [defer+madvise] as default khugepaged defrag strategy: For some reason, the default strategy to respond to THP fault fallbacks is still just madvise, meaning stall if the program wants transparent hugepages, but don't trigger a background reclaim / compaction if THP begins to fail allocations. This creates a snowball affect where we still use the THP code paths, but we almost always fail once a system has been active and busy for a while. The option "defer" was created for interactive systems where THP can still improve performance. If we have to fallback to a regular page due to an allocation failure or anything else, we will trigger a background reclaim and compaction so future THP attempts succeed and previous attempts eventually have their smaller pages combined without stalling running applications. We still want madvise to stall applications that explicitely want THP, so defer+madvise _does_ make a ton of sense. Make it the default for interactive systems, especially if the kernel maintainer left transparent hugepages on "always". Reasoning and details in the original patch: https://lwn.net/Articles/711248/ Signed-off-by: Kai Krakow <kai@kaishome.de>

This reverts commit d7c5ed7. Signed-off-by: Kai Krakow <kai@kaishome.de>

This reverts commit 9180bd4. Signed-off-by: Kai Krakow <kai@kaishome.de>

split the futex key setup from the queue locking and key reading. This is usefull to support the setup of multiple keys at the same time, like what is done in futex_requeue() and what will be done for the FUTEX_WAIT_MULTIPLE command. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

This is a new futex operation to allow a thread to wait on several futexes at the same time, and wake up on any of them. In a sense, it implements one of the features that was supported by pooling on the old FUTEX_FD interface. My use case for this feature lies in Wine, where we want to implement a similar function available in Windows, mainly for event handling. The wine folks have an implementation of the userspace side using eventfd, but it suffers from bad performance, as shown in the measurements below. Technically, the old FUTEX_WAIT implementation can be easily reimplemented using do_futex_wait_multiple, with a count one, and I have a patch demonstrating how it works. I'm not proposing it, since futex is such a tricky code, that I'd be more confortable to have FUTEX_WAIT_MULTIPLE running upstream for a couple development cycles, before considering modifying FUTEX_WAIT. This was tested using three mechanisms: 1) By reimplementing FUTEX_WAIT in terms of FUTEX_WAIT_MULTIPLE and running tools/testing/selftests/futex and a full linux distro on top of this kernel. 2) By an example code that exercises the FUTEX_WAIT_MULTIPLE path on a multi thread, event handling setup. 3) By running the Wine fsync implementation and executing multi-threaded applications, in particular modern games on top of the implementation. Signed-off-by: Zebediah Figura <z.figura12@gmail.com> Signed-off-by: Steven Noonan <steven@valvesoftware.com> Signed-off-by: Pierre-Loup A. Griffais <pgriffais@valvesoftware.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Create a new set of futex syscalls known as futex2. This new interface is aimed to implement a more maintainable code, while removing obsolete features and expanding it with new functionalities. Implements wait and wake semantics for futexes, along with the base infrastructure for future operations. The whole wait path is designed to be used by N waiters, thus making easier to implement vectorized wait. * Syscalls implemented by this patch: - futex_wait(void *uaddr, unsigned int val, unsigned int flags, struct timespec *timo) The user thread is put to sleep, waiting for a futex_wake() at uaddr, if the value at *uaddr is the same as val (otherwise, the syscall returns immediately with -EAGAIN). timo is an optional timeout value for the operation. Return 0 on success, error code otherwise. - futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags) Wake `nr_wake` threads waiting at uaddr. Return the number of woken threads on success, error code otherwise. ** The `flag` argument The flag is used to specify the size of the futex word (FUTEX_[8, 16, 32, 64]). It's mandatory to define one, since there's no default size. By default, the timeout uses a monotonic clock, but can be used as a realtime one by using the FUTEX_REALTIME_CLOCK flag. By default, futexes are of the private type, that means that this user address will be accessed by threads that shares the same memory region. This allows for some internal optimizations, so they are faster. However, if the address needs to be shared with different processes (like using `mmap()` or `shm()`), they need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that. By default, the operation has no NUMA-awareness, meaning that the user can't choose the memory node where the kernel side futex data will be stored. The user can choose the node where it wants to operate by setting the FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or 32, 64): struct futexX_numa { __uX value; __sX hint; }; This structure should be passed at the `void *uaddr` of futex functions. The address of the structure will be used to be waited/waken on, and the `value` will be compared to `val` as usual. The `hint` member is used to defined which node the futex will use. When waiting, the futex will be registered on a kernel-side table stored on that node; when waking, the futex will be searched for on that given table. That means that there's no redundancy between tables, and the wrong `hint` value will led to undesired behavior. Userspace is responsible for dealing with node migrations issues that may occur. `hint` can range from [0, MAX_NUMA_NODES], for specifying a node, or -1, to use the same node the current process is using. When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a global table on some node, defined at compilation time. ** The `timo` argument As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout options known to be buggy. Given that, `timo` should be a 64bit timeout at all platforms, using an absolute timeout value. Signed-off-by: André Almeida <andrealmeid@collabora.com> --- [RFC Add futex2 syscall 0/0] Hi, This patch series introduces the futex2 syscalls. * What happened to the current futex()? For some years now, developers have been trying to add new features to futex, but maintainers have been reluctant to accept then, given the multiplexed interface full of legacy features and tricky to do big changes. Some problems that people tried to address with patchsets are: NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2]. NUMA, for instance, just doesn't fit the current API in a reasonable way. Considering that, it's not possible to merge new features into the current futex. ** The NUMA problem At the current implementation, all futex kernel side infrastructure is stored on a single node. Given that, all futex() calls issued by processors that aren't located on that node will have a memory access penalty when doing it. ** The 32bit sized futex problem Embedded systems or anything with memory constrains would benefit of using smaller sizes for the futex userspace integer. Also, a mutex implementation can be done using just three values, so 8 bits is enough for various scenarios. ** The wait on multiple problem The use case lies in the Wine implementation of the Windows NT interface WaitMultipleObjects. This Windows API function allows a thread to sleep waiting on the first of a set of event sources (mutexes, timers, signal, console input, etc) to signal. Considering this is a primitive synchronization operation for Windows applications, being able to quickly signal events on the producer side, and quickly go to sleep on the consumer side is essential for good performance of those running over Wine. [0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/ [1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/ [2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.com/ * The solution As proposed by Peter Zijlstra and Florian Weimer[3], a new interface is required to solve this, which must be designed with those features in mind. futex2() is that interface. As opposed to the current multiplexed interface, the new one should have one syscall per operation. This will allow the maintainability of the API if it gets extended, and will help users with type checking of arguments. In particular, the new interface is extended to support the ability to wait on any of a list of futexes at a time, which could be seen as a vectored extension of the FUTEX_WAIT semantics. [3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-ass.net/ * The interface The new interface can be seen in details in the following patches, but this is a high level summary of what the interface can do: - Supports wake/wait semantics, as in futex() - Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with individual flags for each address - Supports waiting for a vector of futexes, using a new syscall named futex_waitv() - Supports variable sized futexes (8bits, 16bits, 32bits and 64bits) - Supports NUMA-awareness operations, where the user can specify on which memory node would like to operate * Implementation The internal implementation follows a similar design to the original futex. Given that we want to replicate the same external behavior of current futex, this should be somewhat expected. For some functions, like the init and the code to get a shared key, I literally copied code and comments from kernel/futex.c. I decided to do so instead of exposing the original function as a public function since in that way we can freely modify our implementation if required, without any impact on old futex. Also, the comments precisely describes the details and corner cases of the implementation. Each patch contains a brief description of implementation, but patch 6 "docs: locking: futex2: Add documentation" adds a more complete document about it. * The patchset This patchset can be also found at my git tree: https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev - Patch 1: Implements wait/wake, and the basics foundations of futex2 - Patches 2-4: Implement the remaining features (shared, waitv, requeue). - Patch 5: Adds the x86_x32 ABI handling. I kept it in a separated patch since I'm not sure if x86_x32 is still a thing, or if it should return -ENOSYS. - Patch 6: Add a documentation file which details the interface and the internal implementation. - Patches 7-13: Selftests for all operations along with perf support for futex2. - Patch 14: While working on porting glibc for futex2, I found out that there's a futex_wake() call at the user thread exit path, if that thread was created with clone(..., CLONE_CHILD_SETTID, ...). In order to make pthreads work with futex2, it was required to add this patch. Note that this is more a proof-of-concept of what we will need to do in future, rather than part of the interface and shouldn't be merged as it is. * Testing: This patchset provides selftests for each operation and their flags. Along with that, the following work was done: ** Stability To stress the interface in "real world scenarios": - glibc[4]: nptl's low level locking was modified to use futex2 API (except for robust and PI things). All relevant nptl/ tests passed. - Wine[5]: Proton/Wine was modified in order to use futex2() for the emulation of Windows NT sync mechanisms based on futex, called "fsync". Triple-A games with huge CPU's loads and tons of parallel jobs worked as expected when compared with the previous FUTEX_WAIT_MULTIPLE implementation at futex(). Some games issue 42k futex2() calls per second. - Full GNU/Linux distro: I installed the modified glibc in my host machine, so all pthread's programs would use futex2(). After tweaking systemd[6] to allow futex2() calls at seccomp, everything worked as expected (web browsers do some syscall sandboxing and need some configuration as well). - perf: The perf benchmarks tests can also be used to stress the interface, and they can be found in this patchset. ** Performance - For comparing futex() and futex2() performance, I used the artificial benchmarks implemented at perf (wake, wake-parallel, hash and requeue). The setup was 200 runs for each test and using 8, 80, 800, 8000 for the number of threads, Note that for this test, I'm not using patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained at "The patchset" section. - For the first three ones, I measured an average of 4% gain in performance. This is not a big step, but it shows that the new interface is at least comparable in performance with the current one. - For requeue, I measured an average of 21% decrease in performance compared to the original futex implementation. This is expected given the new design with individual flags. The performance trade-offs are explained at patch 4 ("futex2: Implement requeue operation"). [4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2 [5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13 [6] https://gitlab.collabora.com/tonyk/systemd * FAQ ** "Where's the code for NUMA and FUTEX_8/16/64?" The current code is already complex enough to take some time for review, so I believe it's better to split that work out to a future iteration of this patchset. Besides that, this RFC is the core part of the infrastructure, and the following features will not pose big design changes to it, the work will be more about wiring up the flags and modifying some functions. ** "Where's the PI/robust stuff?" As said by Peter Zijlstra at [3], all those new features are related to the "simple" futex interface, that doesn't use PI or robust. Do we want to have this complexity at futex2() and if so, should it be part of this patchset or can it be future work? Thanks, André * Changelog Changes from v2: - API now supports 64bit futexes, in addition to 8, 16 and 32. - Refactored futex2_wait and futex2_waitv selftests v2: https://lore.kernel.org/lkml/20210304004219.134051-1-andrealmeid@collabora.com/ Changes from v1: - Unified futex_set_timer_and_wait and __futex_wait code - Dropped _carefull from linked list function calls - Fixed typos on docs patch - uAPI flags are now added as features are introduced, instead of all flags in patch 1 - Removed struct futex_single_waiter in favor of an anon struct v1: https://lore.kernel.org/lkml/20210215152404.250281-1-andrealmeid@collabora.com/

Add support for shared futexes for cross-process resources. This design relies on the same approach done in old futex to create an unique id for file-backed shared memory, by using a counter at struct inode. There are two types of futexes: private and shared ones. The private are futexes meant to be used by threads that shares the same memory space, are easier to be uniquely identified an thus can have some performance optimization. The elements for identifying one are: the start address of the page where the address is, the address offset within the page and the current->mm pointer. Now, for uniquely identifying shared futex: - If the page containing the user address is an anonymous page, we can just use the same data used for private futexes (the start address of the page, the address offset within the page and the current->mm pointer) that will be enough for uniquely identifying such futex. We also set one bit at the key to differentiate if a private futex is used on the same address (mixing shared and private calls are not allowed). - If the page is file-backed, current->mm maybe isn't the same one for every user of this futex, so we need to use other data: the page->index, an UUID for the struct inode and the offset within the page. Note that members of futex_key doesn't have any particular meaning after they are part of the struct - they are just bytes to identify a futex. Given that, we don't need to use a particular name or type that matches the original data, we only need to care about the bitsize of each component and make both private and shared data fit in the same memory space. Signed-off-by: André Almeida <andrealmeid@collabora.com>

Add support to wait on multiple futexes. This is the interface implemented by this syscall: futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes, unsigned int flags, struct timespec *timo) struct futex_waitv { __u64 val; void *uaddr; unsigned int flags; }; Given an array of struct futex_waitv, wait on each uaddr. The thread wakes if a futex_wake() is performed at any uaddr. The syscall returns immediately if any waiter has *uaddr != val. *timo is an optional timeout value for the operation. The flags argument of the syscall should be used solely for specifying the timeout as realtime, if needed. Flags for shared futexes, sizes, etc. should be used on the individual flags of each waiter. Returns the array index of one of the awakened futexes. There’s no given information of how many were awakened, or any particular attribute of it (if it’s the first awakened, if it is of the smaller index...). Signed-off-by: André Almeida <andrealmeid@collabora.com>

Implement requeue interface similarly to FUTEX_CMP_REQUEUE operation. This is the syscall implemented by this patch: futex_requeue(struct futex_requeue *uaddr1, struct futex_requeue *uaddr2, unsigned int nr_wake, unsigned int nr_requeue, u64 cmpval, unsigned int flags) struct futex_requeue { void *uaddr; unsigned int flags; }; If (uaddr1->uaddr == cmpval), wake at uaddr1->uaddr a nr_wake number of waiters and then, remove a number of nr_requeue waiters at uaddr1->uaddr and add them to uaddr2->uaddr list. Each uaddr has its own set of flags, that must be defined at struct futex_requeue (such as size, shared, NUMA). The flags argument of the syscall is there just for the sake of extensibility, and right now it needs to be zero. Return the number of the woken futexes + the number of requeued ones on success, error code otherwise. Signed-off-by: André Almeida <andrealmeid@collabora.com> --- The original FUTEX_CMP_REQUEUE interfaces is such as follows: futex(*uaddr1, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, *uaddr2, cmpval); Given that when this interface was created they was only one type of futex (as opposed to futex2, where there is shared, sizes, and NUMA), there was no way to specify individual flags for uaddr1 and 2. When FUTEX_PRIVATE was implemented, a new opcode was created as well (FUTEX_CMP_REQUEUE_PRIVATE), but they apply both futexes, so they should be of the same type regarding private/shared. This imposes a limitation on the use cases of the operation, and to overcome that at futex2, `struct futex_requeue` was created, so one can set individual flags for each futex. This flexibility is a trade-off with performance, given that now we need to perform two extra copy_from_user(). One alternative would be to use the upper half of flags bits to the first one, and the bottom half for the second futex, but this would also impose limitations, given that we would limit by half the flags possibilities. If equal futexes are common enough, the following extension could be added to overcome the current performance: - A flag FUTEX_REQUEUE_EQUAL is added to futex2() flags; - If futex_requeue() see this flag, that means that both futexes uses the same set of attributes. - Then, the function parses the flags as of futex_wait/wake(). - *uaddr1 and *uaddr2 are used as void* (instead of struct futex_requeue) just like wait/wake(). In that way, we could avoid the copy_from_user().

New syscalls should use the same entry point for x86_64 and x86_x32 paths. Add a wrapper for x32 calls to use parse functions that assumes 32bit pointers. Signed-off-by: André Almeida <andrealmeid@collabora.com>

Add a new documentation file specifying both userspace API and internal implementation details of futex2 syscalls. Signed-off-by: André Almeida <andrealmeid@collabora.com>

Add a simple file to test wake/wait mechanism using futex2 interface. Test three scenarios: using a common local int variable as private futex, a shm futex as shared futex and a file-backed shared memory as a shared futex. This should test all branches of futex_get_key(). Create helper files so more tests can evaluate futex2. While 32bit ABIs from glibc aren't yet able to use 64 bit sized time variables, add a temporary workaround that implements the required types and calls the appropriated syscalls, since futex2 doesn't supports 32 bit sized time. Signed-off-by: André Almeida <andrealmeid@collabora.com>

Adapt existing futex wait timeout file to test the same mechanism for futex2. futex2 accepts only absolute 64bit timers, but supports both monotonic and realtime clocks. Signed-off-by: André Almeida <andrealmeid@collabora.com>

Adapt existing futex wait wouldblock file to test the same mechanism for futex2. Signed-off-by: André Almeida <andrealmeid@collabora.com>

Create a new file to test the waitv mechanism. Test both private and shared futexes. Wake the last futex in the array, and check if the return value from futex_waitv() is the right index. Signed-off-by: André Almeida <andrealmeid@collabora.com>

Add testing for futex_requeue(). The first test just requeue from one waiter to another one, and wake it. The second performs both wake and requeue, and we check return values to see if the operation woke/requeued the expected number of waiters. Signed-off-by: André Almeida <andrealmeid@collabora.com>

Add support at the existing futex benchmarking code base to enable futex2 calls. `perf bench` tests can be used not only as a way to measure the performance of implementation, but also as stress testing for the kernel infrastructure. Signed-off-by: André Almeida <andrealmeid@collabora.com>

To make pthreads works as expected if they are using futex2, wake clear_child_tid with futex2 as well. This is make applications that uses waitpid() (and clone(CLONE_CHILD_SETTID)) wake while waiting for the child to terminate. Given that apps should not mix futex() and futex2(), any correct app will trigger a harmless noop wakeup on the interface that it isn't using. Signed-off-by: André Almeida <andrealmeid@collabora.com> --- This commit is here for the intend to show what we need to do in order to get a full NPTL working on top of futex2. It should be merged after we talk to glibc folks on the details around the futex_wait() side. For instance, we could use this as an opportunity to use private futexes or 8bit sized futexes, but both sides need to use the exactly same flags.

In the course of futex2 development, it will be rebased on top of different kernel releases, and the syscall number can change in this process. Expose futex2 syscall number via sysfs so tools that are experimenting with futex2 (like Proton/Wine) can test it and set the syscall number at runtime, rather than setting it at compilation time. Signed-off-by: André Almeida <andrealmeid@collabora.com>

Signed-off-by: Kai Krakow <kai@kaishome.de>

…C_IOC_WAIT_ANY. Signed-off-by: Kai Krakow <kai@kaishome.de>

…C_IOC_WAIT_ALL. Signed-off-by: Kai Krakow <kai@kaishome.de>

Signed-off-by: Kai Krakow <kai@kaishome.de>

…TONWAIT and for WINESYNC_IOC_GET_SEM. Signed-off-by: Kai Krakow <kai@kaishome.de>

Signed-off-by: Kai Krakow <kai@kaishome.de>

…tances commit cad83c9 upstream. As syzbot reported, there is an use-after-free issue during f2fs recovery: Use-after-free write at 0xffff88823bc16040 (in kfence-#10): kmem_cache_destroy+0x1f/0x120 mm/slab_common.c:486 f2fs_recover_fsync_data+0x75b0/0x8380 fs/f2fs/recovery.c:869 f2fs_fill_super+0x9393/0xa420 fs/f2fs/super.c:3945 mount_bdev+0x26c/0x3a0 fs/super.c:1367 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1497 do_new_mount fs/namespace.c:2905 [inline] path_mount+0x196f/0x2be0 fs/namespace.c:3235 do_mount fs/namespace.c:3248 [inline] __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433 do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae The root cause is multi f2fs filesystem instances can race on accessing global fsync_entry_slab pointer, result in use-after-free issue of slab cache, fixes to init/destroy this slab cache only once during module init/destroy procedure to avoid this issue. Reported-by: syzbot+9d90dad32dd9727ed084@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kakra · 2021-11-13T17:15:46Z

Closed in favor of #16

[ Upstream commit 4503cc7 ] Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: 77a7811 ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Dave Cain <dcain@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit ad25f5c ] There's a locking issue with the per-netns list of calls in rxrpc. The pieces of code that add and remove a call from the list use write_lock() and the calls procfile uses read_lock() to access it. However, the timer callback function may trigger a removal by trying to queue a call for processing and finding that it's already queued - at which point it has a spare refcount that it has to do something with. Unfortunately, if it puts the call and this reduces the refcount to 0, the call will be removed from the list. Unfortunately, since the _bh variants of the locking functions aren't used, this can deadlock. ================================ WARNING: inconsistent lock state 5.18.0-rc3-build4+ #10 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b {SOFTIRQ-ON-W} state was registered at: ... Possible unsafe locking scenario: CPU0 ---- lock(&rxnet->call_lock); <Interrupt> lock(&rxnet->call_lock); *** DEADLOCK *** 1 lock held by ksoftirqd/2/25: #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d Changes ======= ver #2) - Changed to using list_next_rcu() rather than rcu_dereference() directly. Fixes: 17926a7 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 391e982 ] It is trivial to craft a module to trigger OOB access in this line: if (info->secstrings[strhdr->sh_size - 1] != '\0') { BUG: unable to handle page fault for address: ffffc90000aa0fff PGD 100000067 P4D 100000067 PUD 100066067 PMD 10436f067 PTE 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 1215 Comm: insmod Not tainted 5.18.0-rc5-00007-g9bf578647087-dirty #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:load_module+0x19b/0x2391 Fixes: ec2a295 ("module: harden ELF info handling") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> [rebased patch onto modules-next] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit a699781 ] A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. eg: [exception RIP: qed_get_current_link+17] #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede] #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c torvalds#13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb crash> struct net_device.state ffff9a9d21336000 state = 5, state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100). The device is not present, note lack of __LINK_STATE_PRESENT (0b10). This is the same sort of panic as observed in commit 4224cfd ("net-sysfs: add check for netdevice being present to speed_show"). There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check. Move this check into ethtool to protect all callers. Fixes: d519e17 ("net: export device speed and duplex via sysfs") Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

kakra force-pushed the rebase-5.10/steam-patches branch from 0f3758f to 6681c6a Compare January 31, 2021 04:15

kakra force-pushed the master branch from 05f6d2a to b74b991 Compare March 5, 2021 23:49

kakra mentioned this pull request Mar 10, 2021

Horizon Zero Dawn (1151640) ValveSoftware/Proton#4125

Open

2 tasks

kakra changed the base branch from master to base-5.10 March 10, 2021 18:11

kakra force-pushed the rebase-5.10/steam-patches branch from 6681c6a to 8a9286c Compare May 9, 2021 10:46

heftig and others added 23 commits July 3, 2021 08:11

Revert "futex: Remove needless goto's"

b09e668

This reverts commit d7c5ed7. Signed-off-by: Kai Krakow <kai@kaishome.de>

Revert "futex: Remove put_futex_key()"

ed7a061

This reverts commit 9180bd4. Signed-off-by: Kai Krakow <kai@kaishome.de>

futex: Change WAIT_MULTIPLE opcode to 31

640dc2e

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

futex2: Add compatibility entry point for x86_x32 ABI

0a3d1d9

New syscalls should use the same entry point for x86_64 and x86_x32 paths. Add a wrapper for x32 calls to use parse functions that assumes 32bit pointers. Signed-off-by: André Almeida <andrealmeid@collabora.com>

docs: locking: futex2: Add documentation

b3a2926

Add a new documentation file specifying both userspace API and internal implementation details of futex2 syscalls. Signed-off-by: André Almeida <andrealmeid@collabora.com>

selftests: futex2: Add timeout test

bfe3d0d

Adapt existing futex wait timeout file to test the same mechanism for futex2. futex2 accepts only absolute 64bit timers, but supports both monotonic and realtime clocks. Signed-off-by: André Almeida <andrealmeid@collabora.com>

selftests: futex2: Add wouldblock test

0d7a890

Adapt existing futex wait wouldblock file to test the same mechanism for futex2. Signed-off-by: André Almeida <andrealmeid@collabora.com>

selftests: futex2: Add waitv test

23f6a53

Create a new file to test the waitv mechanism. Test both private and shared futexes. Wake the last futex in the array, and check if the return value from futex_waitv() is the right index. Signed-off-by: André Almeida <andrealmeid@collabora.com>

winesync: Introduce the winesync driver and character device.

cc823dc

Signed-off-by: Kai Krakow <kai@kaishome.de>

winesync: Reserve a minor device number and ioctl range.

bed06a9

Signed-off-by: Kai Krakow <kai@kaishome.de>

winesync: Introduce WINESYNC_IOC_CREATE_SEM and WINESYNC_IOC_DELETE.

a2b1b69

Signed-off-by: Kai Krakow <kai@kaishome.de>

zfigura added 20 commits July 3, 2021 15:10

winesync: Introduce WINESYNC_IOC_PUT_MUTEX.

13726fd

Signed-off-by: Kai Krakow <kai@kaishome.de>

winesync: Introduce WINESYNC_IOC_KILL_OWNER.

bff9174

Signed-off-by: Kai Krakow <kai@kaishome.de>

winesync: Introduce WINESYNC_IOC_READ_SEM.

0f71ea5

Signed-off-by: Kai Krakow <kai@kaishome.de>

winesync: Introduce WINESYNC_IOC_READ_MUTEX.

cd26c14

Signed-off-by: Kai Krakow <kai@kaishome.de>

doc: Add documentation for the winesync uAPI.

9404ca9

Signed-off-by: Kai Krakow <kai@kaishome.de>

selftests: winesync: Add some tests for semaphore state.

10283c1

Signed-off-by: Kai Krakow <kai@kaishome.de>

selftests: winesync: Add some tests for mutex state.

e521147

Signed-off-by: Kai Krakow <kai@kaishome.de>

selftests: winesync: Add some tests for WINESYNC_IOC_WAIT_ANY.

4d2f875

Signed-off-by: Kai Krakow <kai@kaishome.de>

selftests: winesync: Add some tests for WINESYNC_IOC_WAIT_ALL.

652a828

Signed-off-by: Kai Krakow <kai@kaishome.de>

selftests: winesync: Add some tests for invalid object handling.

0a4a6f6

Signed-off-by: Kai Krakow <kai@kaishome.de>

selftests: winesync: Add some tests for wakeup signaling with WINESYN…

4752786

…C_IOC_WAIT_ANY. Signed-off-by: Kai Krakow <kai@kaishome.de>

selftests: winesync: Add some tests for wakeup signaling with WINESYN…

fecaecc

…C_IOC_WAIT_ALL. Signed-off-by: Kai Krakow <kai@kaishome.de>

maintainers: Add an entry for winesync.

078a1e8

Signed-off-by: Kai Krakow <kai@kaishome.de>

winesync: Introduce the WINESYNC_SEM_GETONWAIT flag.

787f6b8

Signed-off-by: Kai Krakow <kai@kaishome.de>

doc: Document the WINESYNC_SEM_GETONWAIT flag.

010e019

Signed-off-by: Kai Krakow <kai@kaishome.de>

winesync: Introduce WINESYNC_IOC_GET_SEM.

76e590a

Signed-off-by: Kai Krakow <kai@kaishome.de>

winesync: Introduce WINESYNC_IOC_PULSE_SEM.

1535e93

Signed-off-by: Kai Krakow <kai@kaishome.de>

doc: Document WINESYNC_IOC_GET_SEM and WINESYNC_IOC_PULSE_SEM.

04cb576

Signed-off-by: Kai Krakow <kai@kaishome.de>

selftests: winesync: Add tests for semaphores without WINESYNC_SEM_GE…

e871a00

…TONWAIT and for WINESYNC_IOC_GET_SEM. Signed-off-by: Kai Krakow <kai@kaishome.de>

selftests: winesync: Add tests for WINESYNC_IOC_PULSE_SEM.

d344c33

Signed-off-by: Kai Krakow <kai@kaishome.de>

kakra force-pushed the rebase-5.10/steam-patches branch from 8a9286c to d344c33 Compare July 3, 2021 13:10

mm: Support soft dirty flag reset for VA range.

53ee04a

kakra mentioned this pull request Oct 2, 2021

Red Dead Redemption 2 (1174180) ValveSoftware/Proton#3291

Open

2 tasks

kakra closed this Nov 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[5.10] Track Steam performance patches #10

[5.10] Track Steam performance patches #10

kakra commented Jan 1, 2021 •

edited

Loading

kakra commented Nov 13, 2021

[5.10] Track Steam performance patches #10

[5.10] Track Steam performance patches #10

Conversation

kakra commented Jan 1, 2021 • edited Loading

kakra commented Nov 13, 2021

kakra commented Jan 1, 2021 •

edited

Loading