This repository was archived by the owner on Jun 18, 2024. It is now read-only.
Pull in linux#master + bpf/for-next #144
Merged
Breno Leitao says:

====================
net: Fix MODULE_DESCRIPTION() for net (p5)

There are hundreds of network modules that miss MODULE_DESCRIPTION(), causing a warning when compiling with W=1. Example:

WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_cmp.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_nbyte.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_u32.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_meta.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_text.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_canid.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ip_tunnel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ipip.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ip_gre.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/udp_tunnel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ip_vti.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ah4.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/esp4.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/xfrm4_tunnel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/tunnel4.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/xfrm/xfrm_algo.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/xfrm/xfrm_user.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/ah6.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/esp6.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/xfrm6_tunnel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/tunnel6.o

Part 5 of the patchset focuses on the net/ modules that were still missing a description; they are now warning-free.

v1: https://lore.kernel.org/all/20240205101400.1480521-1-leitao@debian.org/
v2: https://lore.kernel.org/all/20240207101929.484681-1-leitao@debian.org/
====================

Link: https://lore.kernel.org/r/20240208164244.3818498-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We've had issues with gcc and 'asm goto' before, and we created an 'asm_volatile_goto()' macro for that in the past: see commits 3f0116c ("compiler/gcc4: Add quirk for 'asm goto' miscompilation bug") and a9f1803 ("compiler/gcc4: Make quirk for asm_volatile_goto() unconditional").

Then, much later, we ended up removing the workaround in commit 43c249e ("compiler-gcc.h: remove ancient workaround for gcc PR 58670") because we no longer supported building the kernel with the affected gcc versions, but we left the macro uses around.

Now, Sean Christopherson reports a new version of a very similar problem, which is fixed by re-applying that ancient workaround. But the problem in question is limited to only the 'asm goto with outputs' cases, so instead of re-introducing the old workaround as-is, let's rename and limit the workaround to just that much less common case.

It looks like there are at least two separate issues that all hit in this area:

 (a) some versions of gcc don't mark the asm goto as 'volatile' when it has outputs:
       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
     which is easy to work around by just adding the 'volatile' by hand.

 (b) Internal compiler errors:
       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
     which are worked around by adding the extra empty 'asm' as a barrier, as in the original workaround.

The problem Sean sees may be a third thing, since it involves bad code generation (not an ICE) even with the manually added 'volatile'. The same old workaround works for this case too, even if this feels a bit like voodoo programming and may only be hiding the issue.

Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
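The two-part workaround in the message above (add 'volatile' by hand, then follow the statement with an empty 'asm' as a barrier) can be sketched in userspace as below. This is only an illustration under assumptions: the macro name asm_goto_output(), the empty asm template, the copy_via_asm() helper and the compiler version guard are all chosen for this sketch and are not taken from the message. Compilers without 'asm goto' output support take a plain-C fallback.

```c
#include <stdbool.h>

/* Assumption: 'asm goto' with outputs needs GCC >= 11 or Clang >= 11. */
#if (defined(__clang__) && __clang_major__ >= 11) || \
    (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 11)
#define HAVE_ASM_GOTO_OUTPUT 1
#else
#define HAVE_ASM_GOTO_OUTPUT 0
#endif

#if HAVE_ASM_GOTO_OUTPUT
/*
 * Illustrative macro (name assumed for this sketch):
 *  (a) spell out 'volatile' explicitly, and
 *  (b) add an empty asm statement right after it as a barrier.
 */
#define asm_goto_output(...) \
	do { __asm__ __volatile__ goto(__VA_ARGS__); __asm__ (""); } while (0)
#endif

/* Hypothetical helper: copies 'in' to '*out' through an asm output operand. */
static bool copy_via_asm(int in, int *out)
{
#if HAVE_ASM_GOTO_OUTPUT
	int tmp;

	/* Empty template: never jumps to 'fail'; the output is tied to the input. */
	asm_goto_output("" : "=r"(tmp) : "0"(in) : : fail);
	*out = tmp;
	return true;
fail:
	return false;
#else
	*out = in;	/* plain-C fallback for older compilers */
	return true;
#endif
}
```

Calling copy_via_asm(42, &out) is expected to return true and store 42 in out; the sketch shows only the shape of the macro and does not reproduce any of the compiler bugs listed above.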
…p/linux-ntfs3

Pull ntfs3 fixes from Konstantin Komarov:
 "Fixed:
   - size update for compressed file
   - some logic errors, overflows
   - memory leak
   - some code was refactored

  Added:
   - implement super_operations::shutdown

  Improved:
   - alternative boot processing
   - reduced stack usage"

* tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3: (28 commits)
  fs/ntfs3: Slightly simplify ntfs_inode_printk()
  fs/ntfs3: Add ioctl operation for directories (FITRIM)
  fs/ntfs3: Fix oob in ntfs_listxattr
  fs/ntfs3: Fix an NULL dereference bug
  fs/ntfs3: Update inode->i_size after success write into compressed file
  fs/ntfs3: Fixed overflow check in mi_enum_attr()
  fs/ntfs3: Correct function is_rst_area_valid
  fs/ntfs3: Use i_size_read and i_size_write
  fs/ntfs3: Prevent generic message "attempt to access beyond end of device"
  fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cache
  fs/ntfs3: Use kvfree to free memory allocated by kvmalloc
  fs/ntfs3: Disable ATTR_LIST_ENTRY size check
  fs/ntfs3: Fix c/mtime typo
  fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame()
  fs/ntfs3: Add and fix comments
  fs/ntfs3: ntfs3_forced_shutdown use int instead of bool
  fs/ntfs3: Implement super_operations::shutdown
  fs/ntfs3: Drop suid and sgid bits as a part of fpunch
  fs/ntfs3: Add file_modified
  fs/ntfs3: Correct use bh_read
  ...
Pull ceph fixes from Ilya Dryomov:
 "Some fscrypt-related fixups (sparse reads are used only for encrypted files) and two cap handling fixes from Xiubo and Rishabh"

* tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
  ceph: always check dir caps asynchronously
  ceph: prevent use-after-free in encode_cap_msg()
  ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
  libceph: just wait for more data to be available on the socket
  libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
  libceph: fail sparse-read if the data length doesn't match
…cifs-2.6

Pull smb client fixes from Steve French:

 - reconnect fix
 - multichannel channel selection fix
 - minor mount warning fix
 - reparse point fix
 - null pointer check improvement

* tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: clarify mount warning
  cifs: handle cases where multiple sessions share connection
  cifs: change tcon status when need_reconnect is set on it
  smb: client: set correct d_type for reparse points under DFS mounts
  smb3: add missing null server pointer check
…it/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three small driver fixes and one core fix. The core fix being a fixup to the one in the last pull request which didn't entirely move checking of scsi_host_busy() out from under the host lock"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Remove the ufshcd_release() in ufshcd_err_handling_prepare()
  scsi: ufs: core: Fix shift issue in ufshcd_clear_cmd()
  scsi: lpfc: Use unsigned type for num_sge
  scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
Pull smb server fixes from Steve French:
 "Two ksmbd server fixes:

   - memory leak fix
   - a minor kernel-doc fix"

* tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
  ksmbd: Add kernel-doc for ksmbd_extract_sharename() function
…nux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
 "A change to accelerate the device detection step in some cases.

  In the self-identification step after bus-reset, all nodes in the same bus broadcast a selfID packet including the value of the gap count. The value is related to the cable hops between nodes, and is used to calculate the subaction gap and the arbitration reset gap.

  When nodes have different values of the gap count, the asynchronous communication between them is unreliable, since an asynchronous transaction could be interrupted by another asynchronous transaction before completion. The gap count inconsistency can be resolved in several ways, e.g. the transfer of a PHY configuration packet and generation of a bus-reset.

  The current implementation of the firewire stack can correctly detect the gap count inconsistency, however the recovery action from the inconsistency tends to be delayed until after reading the configuration ROM of the root node. This results in a long time to probe devices in some combinations of hardware.

  Here the stack is changed to schedule the action as soon as possible"

* tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: send bus reset promptly on gap count error
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - Update a potentially stale firmware attribute (Maurizio)
     - Fixes for the recent verbose error logging (Keith, Chaitanya)
     - Protection information payload size fix for passthrough (Francis)

 - Fix for a queue freezing issue in virtblk (Yi)

 - blk-iocost underflow fix (Tejun)

 - blk-wbt task detection fix (Jan)

* tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux:
  virtio-blk: Ensure no requests in virtqueues before deleting vqs.
  blk-iocost: Fix an UBSAN shift-out-of-bounds warning
  nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
  nvme-core: fix comment to reflect right functions
  nvme: move passthrough logging attribute to head
  blk-wbt: Fix detection of dirty-throttled tasks
  nvme-host: fix the updating of the firmware version
Factor out waiting for async encrypt and decrypt to finish. There are already multiple copies and a subsequent fix will need more. No functional changes. Note that crypto_wait_req() returns wait->err. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
The submitting thread (one which called recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete() so any code past that point risks touching already freed data. Try to avoid the locking and extra flags altogether. Have the main thread hold an extra reference, this way we can depend solely on the atomic ref counter for synchronization. Don't futz with reiniting the completion, either, we are now tightly controlling when completion fires. Reported-by: valis <sec@valis.email> Fixes: 0cada33 ("net/tls: fix race condition causing kernel panic") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
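The "extra reference" scheme described above can be sketched with C11 atomics. This is a single-threaded model under assumptions: the struct and function names are hypothetical, a bool stands in for the kernel's completion, and the wait is split into a begin/end pair so the steps can be followed without real threads.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-socket async crypto state. */
struct async_ctx {
	atomic_int pending;	/* starts at 1: the submitter's own reference */
	bool completed;		/* stand-in for a struct completion */
};

static void async_ctx_init(struct async_ctx *ctx)
{
	atomic_init(&ctx->pending, 1);
	ctx->completed = false;
}

/* Submitter: account for one more in-flight request. */
static void async_submit(struct async_ctx *ctx)
{
	atomic_fetch_add(&ctx->pending, 1);
}

/* Crypto callback: only the final reference drop fires the completion. */
static void async_request_done(struct async_ctx *ctx)
{
	if (atomic_fetch_sub(&ctx->pending, 1) == 1)
		ctx->completed = true;	/* last access to ctx on this path */
}

/*
 * Submitter: drop the extra reference. Returns true when requests are
 * still in flight, i.e. real code would now sleep on the completion.
 */
static bool async_wait_begin(struct async_ctx *ctx)
{
	return atomic_fetch_sub(&ctx->pending, 1) != 1;
}

/* Submitter: re-take the extra reference before the next batch. */
static void async_wait_end(struct async_ctx *ctx)
{
	atomic_fetch_add(&ctx->pending, 1);
}
```

Because the submitter's reference keeps the counter above zero while it is still submitting, the completion can fire only after the submitter has started waiting, which is why no extra lock or flag is needed in this model.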
Similarly to the previous commit, the submitting thread (recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete(). Reorder scheduling the work before calling complete(). This seems more logical in the first place, as it's the inverse order of what the submitting thread will do. Reported-by: valis <sec@valis.email> Fixes: a42055e ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our requests to the crypto API, crypto_aead_{encrypt,decrypt} can return -EBUSY instead of -EINPROGRESS in valid situations. For example, when the cryptd queue for AESNI is full (easy to trigger with an artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued to the backlog but still processed. In that case, the async callback will also be called twice: first with err == -EINPROGRESS, which it seems we can just ignore, then with err == 0. Compared to Sabrina's original patch this version uses the new tls_*crypt_async_wait() helpers and converts the EBUSY to EINPROGRESS to avoid having to modify all the error handling paths. The handling is identical. Fixes: a54667f ("tls: Add support for encryption using async offload accelerator") Fixes: 94524d8 ("net/tls: Add support for async decryption of tls records") Co-developed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
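The two behaviours described above (a callback that is invoked twice for a backlogged request, and a submit path that reports -EBUSY as -EINPROGRESS) can be modelled as follows. The names req_state, async_done() and normalize_submit_rc() are hypothetical, and passing the result of the wait in as a parameter is an assumption of this sketch, not the driver's actual structure.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-request bookkeeping. */
struct req_state {
	int callbacks;	/* how many times the callback ran */
	bool finished;	/* set only by the real completion */
	int result;	/* final status of the request */
};

/* Async callback: the -EINPROGRESS notification is ignored. */
static void async_done(struct req_state *st, int err)
{
	st->callbacks++;
	if (err == -EINPROGRESS)
		return;	/* request left the backlog; completion comes later */
	st->finished = true;
	st->result = err;
}

/*
 * Submit path: a backlogged request (-EBUSY) was still accepted, so it
 * is reported like any other in-flight request. 'wait_err' models the
 * outcome of waiting for pending requests (0 on success).
 */
static int normalize_submit_rc(int rc, int wait_err)
{
	if (rc == -EBUSY)
		rc = wait_err ? wait_err : -EINPROGRESS;
	return rc;
}
```

With this mapping the existing error paths only ever see -EINPROGRESS for accepted requests, which matches the stated goal of not having to modify them.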
tls_decrypt_sg doesn't take a reference on the pages from clear_skb, so the put_page() in tls_decrypt_done releases them, and we trigger a use-after-free in process_rx_list when we try to read from the partially-read skb. Fixes: fd31f39 ("tls: rx: decrypt into a fresh skb") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This exact case was failing for async crypto and we weren't catching it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
We double count async, non-zc rx data. The previous fix was lucky because if we fully zc async_copy_bytes is 0 so we add 0. Decrypted already has all the bytes we handled, in all cases. We don't have to adjust anything, delete the erroneous line. Fixes: 4d42cd6 ("tls: rx: fix return value for async crypto") Co-developed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:

====================
net: tls: fix some issues with async encryption

valis was reporting a race on socket close so I sat down to try to fix it. I used Sabrina's async crypto debug patch to test... and in the process ran into some of the same issues, and created very similar fixes :( I didn't realize how many of those patches weren't applied. Once I found Sabrina's code [1] it turned out to be so similar in fact that I added her S-o-b's and Co-develop'eds in a semi-haphazard way.

With this series in place all expected tests pass with async crypto. Sabrina had a few more fixes, but I'll leave those to her, things are not crashing anymore.

[1] https://lore.kernel.org/netdev/cover.1694018970.git.sd@queasysnail.net/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
…rg/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7 issues or aren't considered to be needed in earlier kernel versions"

* tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  nilfs2: fix potential bug in end_buffer_async_write
  mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
  nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
  MAINTAINERS: Leo Yan has moved
  mm/zswap: don't return LRU_SKIP if we have dropped lru lock
  fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
  mailmap: switch email address for John Moon
  mm: zswap: fix objcg use-after-free in entry destruction
  mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()
  arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
  selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros
  mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page
  mm: memcg: optimize parent iteration in memcg_rstat_updated()
  nilfs2: fix data corruption in dsync block recovery for small block sizes
  mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()
  exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
  fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
  fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
  getrusage: use sig->stats_lock rather than lock_task_sighand()
  getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
  ...
Since commit 13f5826 ("ASoC: soc.h: don't create dummy Component via COMP_DUMMY()") dummy snd_soc_dai_link.codecs entries no longer have a name set. This means that when looking for the codec dai_link the machine driver can no longer unconditionally run strcmp() on snd_soc_dai_link.codecs[0].name since this may now be NULL. Add a check for snd_soc_dai_link.codecs[0].name being NULL to all BYT/CHT machine drivers to avoid NULL pointer dereferences in their probe() methods. Fixes: 13f5826 ("ASoC: soc.h: don't create dummy Component via COMP_DUMMY()") Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240210134400.24913-2-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie@kernel.org>
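The guarded lookup described above can be sketched in plain C. The struct layouts are cut down to the one field that matters, and find_codec_link() is a hypothetical helper written for this illustration, not a function from the machine drivers.

```c
#include <stddef.h>
#include <string.h>

/* Reduced stand-ins for the ASoC structures (only the fields used here). */
struct dai_link_component {
	const char *name;	/* may be NULL for dummy entries */
};

struct dai_link {
	const struct dai_link_component *codecs;
};

/* Return the index of the link whose first codec matches, or -1. */
static int find_codec_link(const struct dai_link *links, size_t n,
			   const char *codec_name)
{
	for (size_t i = 0; i < n; i++) {
		const char *name = links[i].codecs[0].name;

		/* Check for NULL first: strcmp() on a NULL name would crash. */
		if (name && !strcmp(name, codec_name))
			return (int)i;
	}
	return -1;
}
```

Returning from inside the loop also stops at the first match, so dummy entries with a NULL name are simply skipped.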
4 fixes / cleanups to the rt5645 mc driver's codec_name handling:

1. In the for loop looking for the dai_index for the codec, replace card->dai_link[i] with cht_dailink[i]. The for loop already uses ARRAY_SIZE(cht_dailink) as bound and card->dai_link is just a pointer to cht_dailink, so using card->dai_link only obfuscates that cht_dailink is being modified directly rather than, say, a copy of cht_dailink. Using cht_dailink[i] also makes the code consistent with other machine drivers.

2. Don't set cht_dailink[dai_index].codecs->name in the for loop, this immediately gets overridden using acpi_dev_name(adev) directly below the loop.

3. Add a missing break to the loop.

4. Remove the now no longer used (only set, never read) codec_name field from struct cht_mc_private.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240210134400.24913-3-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The driver uses several symbols declared in <linux/platform_device.h>, e.g. module_platform_driver(). Include this header explicitly now that <linux/of_platform.h> doesn't include <linux/platform_device.h> any more. Fixes: ef175b2 ("of: Stop circularly including of_device.h and of_platform.h") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20240210164006.208149-6-u.kleine-koenig@pengutronix.de Signed-off-by: Mark Brown <broonie@kernel.org>
I failed to adapt this driver because it's not enabled in a powerpc allmodconfig build and also wasn't hit by my grep expertise. Fix accordingly. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402100815.XQXw9XCF-lkp@intel.com/ Fixes: 2259233 ("spi: bitbang: Follow renaming of SPI "master" to "controller"") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20240210164006.208149-7-u.kleine-koenig@pengutronix.de Signed-off-by: Mark Brown <broonie@kernel.org>
Since commit 24778be ("spi: convert drivers to use bits_per_word_mask") the bits_per_word variable is only written to. The check that was there before isn't needed any more as the spi core ensures that only 8 bit transfers are used, so the variable can go away together with all assignments to it. Fixes: 24778be ("spi: convert drivers to use bits_per_word_mask") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20240210164006.208149-8-u.kleine-koenig@pengutronix.de Signed-off-by: Mark Brown <broonie@kernel.org>
There is a path in rt5645_jack_detect_work(), where rt5645->jd_mutex is left locked forever. That may lead to deadlock when rt5645_jack_detect_work() is called for the second time. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: cdba430 ("ASoC: rt5650: add mutex to avoid the jack detection failure") Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Link: https://lore.kernel.org/r/1707645514-21196-1-git-send-email-khoroshilov@ispras.ru Signed-off-by: Mark Brown <broonie@kernel.org>
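The class of bug described above, an early exit that skips the unlock, is commonly avoided by routing every exit through one unlock label. The sketch below uses a counter as a stand-in for the mutex so it runs without threads; the struct, the fields and the goto-based shape are assumptions for illustration and may differ from how the driver was actually fixed.

```c
/* Counter standing in for a mutex: depth > 0 means "held". */
struct fake_mutex {
	int depth;
};

static void fake_lock(struct fake_mutex *m)   { m->depth++; }
static void fake_unlock(struct fake_mutex *m) { m->depth--; }

/* Hypothetical, reduced jack state. */
struct jack {
	struct fake_mutex jd_mutex;
	int jd_mode;
	int report;
};

static void jack_detect_work(struct jack *j)
{
	fake_lock(&j->jd_mutex);

	if (j->jd_mode == 0) {
		j->report = 1;
		goto out_unlock;	/* the early exit still releases the lock */
	}

	j->report = 2;

out_unlock:
	fake_unlock(&j->jd_mutex);
}
```

After either path the depth is back to zero, so a second invocation of the work function can take the lock again.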
A recent change in acp_irq_thread() was meant to address a potential race condition while trying to acquire the hardware semaphore responsible for the synchronization between firmware and host IPC interrupts.

This resulted in an improper use of the IPC spinlock, causing normal kernel memory allocations (which may sleep) inside atomic contexts:

1707255557.133976 kernel: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315
...
1707255557.134757 kernel: sof_ipc3_rx_msg+0x70/0x130 [snd_sof]
1707255557.134793 kernel: acp_sof_ipc_irq_thread+0x1e0/0x550 [snd_sof_amd_acp]
1707255557.134855 kernel: acp_irq_thread+0xa3/0x130 [snd_sof_amd_acp]
1707255557.134904 kernel: ? irq_thread+0xb5/0x1e0
1707255557.134947 kernel: ? __pfx_irq_thread_fn+0x10/0x10
1707255557.134985 kernel: irq_thread_fn+0x23/0x60

Moreover, there are attempts to lock a mutex from the same atomic context:

1707255557.136357 kernel: =============================
1707255557.136393 kernel: [ BUG: Invalid wait context ]
1707255557.136413 kernel: 6.8.0-rc3-next-20240206-audio-next #9 Tainted: G W
1707255557.136432 kernel: -----------------------------
1707255557.136451 kernel: irq/66-AudioDSP/502 is trying to lock:
1707255557.136470 kernel: ffff965152f26af8 (&sb->s_type->i_mutex_key#2){+.+.}-{3:3}, at: start_creating.part.0+0x5f/0x180
...
1707255557.137429 kernel: start_creating.part.0+0x5f/0x180
1707255557.137457 kernel: __debugfs_create_file+0x61/0x210
1707255557.137475 kernel: snd_sof_debugfs_io_item+0x75/0xc0 [snd_sof]
1707255557.137494 kernel: sof_ipc3_do_rx_work+0x7cf/0x9f0 [snd_sof]
1707255557.137513 kernel: sof_ipc3_rx_msg+0xb3/0x130 [snd_sof]
1707255557.137532 kernel: acp_sof_ipc_irq_thread+0x1e0/0x550 [snd_sof_amd_acp]
1707255557.137551 kernel: acp_irq_thread+0xa3/0x130 [snd_sof_amd_acp]

Fix the issues by reducing the lock scope in acp_irq_thread(), so that it guards only the hardware semaphore acquiring attempt. Additionally, restore the initial locking in acp_sof_ipc_irq_thread() to synchronize the handling of immediate replies from DSP core.

Fixes: 802134c ("ASoC: SOF: amd: Refactor spinlock_irq(&sdev->ipc_lock) sequence in irq_handler")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20240208234315.2182048-1-cristian.ciocaltea@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
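Narrowing the lock so that it covers only the semaphore acquisition attempt can be modelled as below. Everything here is a hypothetical stand-in: a bool plays the role of the spinlock, another bool the hardware semaphore, and handle_ipc() represents work that may sleep and therefore must run with the lock released.

```c
#include <stdbool.h>

/* Hypothetical model of the device state. */
struct dsp {
	bool ipc_lock_held;	/* stand-in for the IPC spinlock */
	bool hw_sem_free;	/* stand-in for the hardware semaphore */
	bool handler_ran_atomic;	/* records whether the handler saw the lock held */
	int handled;
};

static bool try_get_hw_sem(struct dsp *d)
{
	if (!d->hw_sem_free)
		return false;
	d->hw_sem_free = false;
	return true;
}

/* May sleep in real code (allocations, mutexes), so it needs the lock dropped. */
static void handle_ipc(struct dsp *d)
{
	d->handler_ran_atomic = d->ipc_lock_held;
	d->handled++;
}

static void irq_thread(struct dsp *d)
{
	bool got;

	d->ipc_lock_held = true;	/* lock: guards only the attempt below */
	got = try_get_hw_sem(d);
	d->ipc_lock_held = false;	/* unlock before any sleeping work */

	if (!got)
		return;

	handle_ipc(d);
	d->hw_sem_free = true;		/* release the hardware semaphore */
}
```

In this model the handler never observes the lock as held, which is the property the two backtraces above show being violated.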
…inux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Correct the minimum CPU family for Transmeta Crusoe in Kconfig so that such hw can boot again

 - Do not take into account XSTATE buffer size info supplied by userspace when constructing a sigreturn frame

 - Switch get_/put_user* to EX_TYPE_UACCESS exception handling when an MCE is encountered so that it can be properly recovered from instead of simply panicking

* tag 'x86_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
  x86/fpu: Stop relying on userspace for info to fault in xsave buffer
  x86/lib: Revert to _ASM_EXTABLE_UA() for {get,put}_user() fixups
…m/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

 - Make sure a warning is issued when a hrtimer gets queued after the timers have been migrated on the CPU down path and thus said timer will get ignored

* tag 'timers_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Report offline hrtimer enqueue
In various performance profiles of kernels with BPF programs attached, bpf_local_storage_lookup() appears as a significant portion of CPU cycles spent. To enable the compiler to generate more optimal code, turn bpf_local_storage_lookup() into a static inline function, where only the cache insertion code path is outlined.

Notably, outlining cache insertion helps avoid bloating callers by duplicating setting up calls to raw_spin_{lock,unlock}_irqsave() (on architectures which do not inline spin_lock/unlock, such as x86), which would cause the compiler to produce worse code by deciding to outline otherwise inlinable functions. The call overhead is neutral, because we make 2 calls either way: either calling raw_spin_lock_irqsave() and raw_spin_unlock_irqsave(); or calling __bpf_local_storage_insert_cache(), which calls raw_spin_lock_irqsave(), followed by a tail-call to raw_spin_unlock_irqsave() where the compiler can perform TCO and (in optimized uninstrumented builds) turn it into a plain jump. The call to __bpf_local_storage_insert_cache() can be elided entirely if cacheit_lockit is a false constant expression.
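The split described above, an inlinable lookup whose rare cache insertion lives in a separate out-of-line function, can be sketched in userspace as follows. The data structures are simplified stand-ins (an array instead of an RCU list, no locking), the names are hypothetical, and the 16-slot cache size is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 16	/* assumed cache size for this sketch */

struct elem {
	int map_id;
	int value;
};

struct storage {
	struct elem *cache[CACHE_SLOTS];
	struct elem *elems;
	size_t nr_elems;
	unsigned int cache_inserts;
};

/* Outlined on purpose: the rare path stays out of every caller's body. */
static __attribute__((noinline)) void
storage_insert_cache(struct storage *s, size_t slot, struct elem *e)
{
	/* Real code would take a spinlock around this update. */
	s->cache[slot] = e;
	s->cache_inserts++;
}

static inline struct elem *
storage_lookup(struct storage *s, int map_id, size_t slot, bool cacheit)
{
	struct elem *e = s->cache[slot];

	if (e && e->map_id == map_id)	/* fast path: cache hit */
		return e;

	for (size_t i = 0; i < s->nr_elems; i++) {	/* slow path: scan */
		e = &s->elems[i];
		if (e->map_id != map_id)
			continue;
		if (cacheit)	/* a constant false lets the compiler drop the call */
			storage_insert_cache(s, slot, e);
		return e;
	}
	return NULL;
}
```

A caller that passes a literal false for cacheit gives the compiler the chance to remove the call to the outlined function entirely, which mirrors the last sentence of the paragraph above.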
Based on results from './benchs/run_bench_local_storage.sh' (21 trials, reboot between each trial; x86 defconfig + BPF, clang 16) this produces improvements in throughput and latency in the majority of cases, with an average (geomean) improvement of 8%:

+---- Hashmap Control --------------------
|
| + num keys: 10
| :                                           <before>             | <after>
| +-+ hashmap (control) sequential get      +----------------------+----------------------
|   +- hits throughput                      | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
|   +- hits latency                         | 67.679 ns/op         | 67.879 ns/op   (  ~  )
|   +- important_hits throughput            | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
|
| + num keys: 1000
| :                                           <before>             | <after>
| +-+ hashmap (control) sequential get      +----------------------+----------------------
|   +- hits throughput                      | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
|   +- hits latency                         | 81.754 ns/op         | 82.185 ns/op   (  ~  )
|   +- important_hits throughput            | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
|
| + num keys: 10000
| :                                           <before>             | <after>
| +-+ hashmap (control) sequential get      +----------------------+----------------------
|   +- hits throughput                      | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
|   +- hits latency                         | 138.522 ns/op        | 138.842 ns/op  (  ~  )
|   +- important_hits throughput            | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
|
| + num keys: 100000
| :                                           <before>             | <after>
| +-+ hashmap (control) sequential get      +----------------------+----------------------
|   +- hits throughput                      | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
|   +- hits latency                         | 198.483 ns/op        | 194.270 ns/op  (-2.1%)
|   +- important_hits throughput            | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
|
| + num keys: 4194304
| :                                           <before>             | <after>
| +-+ hashmap (control) sequential get      +----------------------+----------------------
|   +- hits throughput                      | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
|   +- hits latency                         | 365.220 ns/op        | 361.418 ns/op  (-1.0%)
|   +- important_hits throughput            | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
|
+---- Local Storage ----------------------
|
| + num_maps: 1
| :                                           <before>             | <after>
| +-+ local_storage cache sequential get    +----------------------+----------------------
|   +- hits throughput                      | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
|   +- hits latency                         | 30.300 ns/op         | 25.598 ns/op   (-15.5%)
|   +- important_hits throughput            | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
| :
| :                                           <before>             | <after>
| +-+ local_storage cache interleaved get   +----------------------+----------------------
|   +- hits throughput                      | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
|   +- hits latency                         | 26.919 ns/op         | 22.259 ns/op   (-17.3%)
|   +- important_hits throughput            | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
|
| + num_maps: 10
| :                                           <before>             | <after>
| +-+ local_storage cache sequential get    +----------------------+----------------------
|   +- hits throughput                      | 32.288 M ops/s       | 38.099 M ops/s (+18.0%)
|   +- hits latency                         | 30.972 ns/op         | 26.248 ns/op   (-15.3%)
|   +- important_hits throughput            | 3.229 M ops/s        | 3.810 M ops/s  (+18.0%)
| :
| :                                           <before>             | <after>
| +-+ local_storage cache interleaved get   +----------------------+----------------------
|   +- hits throughput                      | 34.473 M ops/s       | 41.145 M ops/s (+19.4%)
|   +- hits latency                         | 29.010 ns/op         | 24.307 ns/op   (-16.2%)
|   +- important_hits throughput            | 12.312 M ops/s       | 14.695 M ops/s (+19.4%)
|
| + num_maps: 16
| :                                           <before>             | <after>
| +-+ local_storage cache sequential get    +----------------------+----------------------
|   +- hits throughput                      | 32.524 M ops/s       | 38.341 M ops/s (+17.9%)
|   +- hits latency                         | 30.748 ns/op         | 26.083 ns/op   (-15.2%)
|   +- important_hits throughput            | 2.033 M ops/s        | 2.396 M ops/s  (+17.9%)
| :
| :                                           <before>             | <after>
| +-+ local_storage cache interleaved get   +----------------------+----------------------
|   +- hits throughput                      | 34.575 M ops/s       | 41.338 M ops/s (+19.6%)
|   +- hits latency                         | 28.925 ns/op         | 24.193 ns/op   (-16.4%)
|   +- important_hits throughput            | 11.001 M ops/s       | 13.153 M ops/s (+19.6%)
|
| + num_maps: 17
| :                                           <before>             | <after>
| +-+ local_storage cache sequential get    +----------------------+----------------------
|   +- hits throughput                      | 28.861 M ops/s       | 32.756 M ops/s (+13.5%)
|   +- hits latency                         | 34.649 ns/op         | 30.530 ns/op   (-11.9%)
|   +- important_hits throughput            | 1.700 M ops/s        | 1.929 M ops/s  (+13.5%)
| :
| :                                           <before>             | <after>
| +-+ local_storage cache interleaved get   +----------------------+----------------------
|   +- hits throughput                      | 31.529 M ops/s       | 36.110 M ops/s (+14.5%)
|   +- hits latency                         | 31.719 ns/op         | 27.697 ns/op   (-12.7%)
|   +- important_hits throughput            | 9.598 M ops/s        | 10.993 M ops/s (+14.5%)
|
| + num_maps: 24
| :                                           <before>             | <after>
| +-+ local_storage cache sequential get    +----------------------+----------------------
|   +- hits throughput                      | 18.602 M ops/s       | 19.937 M ops/s (+7.2%)
|   +- hits latency                         | 53.767 ns/op         | 50.166 ns/op   (-6.7%)
|   +- important_hits throughput            | 0.776 M ops/s        | 0.831 M ops/s  (+7.2%)
| :
| :                                           <before>             | <after>
| +-+ local_storage cache interleaved get   +----------------------+----------------------
|   +- hits throughput                      | 21.718 M ops/s       | 23.332 M ops/s (+7.4%)
|   +- hits latency                         | 46.047 ns/op         | 42.865 ns/op   (-6.9%)
|   +- important_hits throughput            | 6.110 M ops/s        | 6.564 M ops/s  (+7.4%)
|
| + num_maps: 32
| :                                           <before>             | <after>
| +-+ local_storage cache sequential get    +----------------------+----------------------
|   +- hits throughput                      | 14.118 M ops/s       | 14.626 M ops/s (+3.6%)
|   +- hits latency                         | 70.856 ns/op         | 68.381 ns/op   (-3.5%)
|   +- important_hits throughput            | 0.442 M ops/s        | 0.458 M ops/s  (+3.6%)
| :
| :                                           <before>             | <after>
| +-+ local_storage cache interleaved get   +----------------------+----------------------
|   +- hits throughput                      | 17.111 M ops/s       | 17.906 M ops/s (+4.6%)
|   +- hits latency                         | 58.451 ns/op         | 55.865 ns/op   (-4.4%)
|   +- important_hits throughput            | 4.776 M ops/s        | 4.998 M ops/s  (+4.6%)
|
| + num_maps: 100
| :                                           <before>             | <after>
| +-+ local_storage cache sequential get    +----------------------+----------------------
|   +- hits throughput                      | 5.281 M ops/s        | 5.528 M ops/s  (+4.7%)
|   +- hits latency                         | 192.398 ns/op        | 183.059 ns/op  (-4.9%)
|   +- important_hits throughput            | 0.053 M ops/s        | 0.055 M ops/s  (+4.9%)
| :
| :                                           <before>             | <after>
| +-+ local_storage cache interleaved get   +----------------------+----------------------
|   +- hits throughput                      | 6.265 M ops/s        | 6.498 M ops/s  (+3.7%)
|   +- hits latency                         | 161.436 ns/op        | 152.877 ns/op  (-5.3%)
|   +- important_hits throughput            | 1.636 M ops/s        | 1.697 M ops/s  (+3.7%)
|
| + num_maps: 1000
| :                                           <before>             | <after>
| +-+ local_storage cache sequential get    +----------------------+----------------------
|   +- hits throughput                      | 0.355 M ops/s        | 0.354 M ops/s  (  ~  )
|   +- hits latency                         | 2826.538 ns/op       | 2827.139 ns/op (  ~  )
|   +- important_hits throughput            | 0.000 M ops/s        | 0.000 M ops/s  (  ~  )
| :
| :                                           <before>             | <after>
| +-+ local_storage cache interleaved get   +----------------------+----------------------
|   +- hits throughput                      | 0.404 M ops/s        | 0.403 M ops/s  (  ~  )
|   +- hits latency                         | 2481.190 ns/op       | 2487.555 ns/op (  ~  )
|   +- important_hits throughput            | 0.102 M ops/s        | 0.101 M ops/s  (  ~  )

The on_lookup test in {cgrp,task}_ls_recursion.c is removed because the bpf_local_storage_lookup is no longer traceable and adding tracepoint will make the compiler generate worse code: https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/

Signed-off-by: Marco Elver <elver@google.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Merge series from Hans de Goede <hdegoede@redhat.com>: While testing 6.8 on a Bay Trail device with an ALC5640 codec I noticed a regression in 6.8 which causes a NULL pointer deref in probe(). All BYT/CHT Intel machine drivers are affected. Patch 1/2 of this series fixes all of them. Patch 2/2 adds some small cleanups to cht_bsw_rt5645.c for issues which I noticed while working on 1/2.
[Why] New worst-case measurement observed at 1897us. [How] Increase to 2000us to cover the new worst case + margin. Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang scenarios. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Suggested-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The Gfx11 debug flags mask is currently set with an implicit assumption that no other MQD update flags exist. That assumption no longer holds now that the previous patch introduced the UPDATE_FLAG_IS_GWS flag, so fix it. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
…nux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Fixes and simple cleanups: - use a proper flexible array instead of a one-element array in order to avoid array-bounds sanitizer errors - add NULL pointer checks after allocating memory - use memdup_array_user() instead of open-coding it - fix a rare race condition in Xen event channel allocation code - make struct bus_type instances const - make kerneldoc inline comments match reality" * tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: close evtchn after mapping cleanup xen/gntalloc: Replace UAPI 1-element array xen: balloon: make balloon_subsys const xen: pcpu: make xen_pcpu_subsys const xen/privcmd: Use memdup_array_user() in alloc_ioreq() x86/xen: Add some null pointer checking to smp.c xen/xenbus: document will_handle argument for xenbus_watch_path()
…git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from can, wireless and netfilter. Current release - regressions: - af_unix: fix task hung while purging oob_skb in GC - pds_core: do not try to run health-thread in VF path Current release - new code bugs: - sched: act_mirred: don't zero blockid when net device is being deleted Previous releases - regressions: - netfilter: - nat: restore default DNAT behavior - nf_tables: fix bidirectional offload, broken when unidirectional offload support was added - openvswitch: limit the number of recursions from action sets - eth: i40e: do not allow untrusted VF to remove administratively set MAC address Previous releases - always broken: - tls: fix races and bugs in use of async crypto - mptcp: prevent data races on some of the main socket fields, fix races in fastopen handling - dpll: fix possible deadlock during netlink dump operation - dsa: lan966x: fix crash when adding interface under a lag when some of the ports are disabled - can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Misc: - a handful of fixes and reliability improvements for selftests - fix sysfs documentation missing net/ in paths - finish the work of squashing the missing MODULE_DESCRIPTION() warnings in networking" * tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits) net: fill in MODULE_DESCRIPTION()s for missing arcnet net: fill in MODULE_DESCRIPTION()s for mdio_devres net: fill in MODULE_DESCRIPTION()s for ppp net: fill in MODULE_DESCRIPTION()s for fddik/skfp net: fill in MODULE_DESCRIPTION()s for plip net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb net: fill in MODULE_DESCRIPTION()s for xen-netback net: ravb: Count packets instead of descriptors in GbEth RX path pppoe: Fix memory leak in pppoe_sendmsg() net: sctp: fix skb leak in sctp_inq_free() net: bcmasp: Handle RX buffer allocation failure net-timestamp: make sk_tskey more predictable in error path selftests: 
tls: increase the wait in poll_partial_rec_async ice: Add check for lport extraction to LAG init netfilter: nf_tables: fix bidirectional offload regression netfilter: nat: restore default DNAT behavior netfilter: nft_set_pipapo: fix missing : in kdoc igc: Remove temporary workaround igb: Fix string truncation warnings in igb_set_fw_version can: netlink: Fix TDCO calculation using the old data bittiming ...
Verifier log avoids printing the same source code line multiple times when a consecutive block of BPF assembly instructions is covered by the same original (C) source code line. This greatly improves verifier log legibility.

Unfortunately, this check is imperfect and in production applications it quite often happens that verifier log will have multiple duplicated source lines emitted, for no apparently good reason. E.g., this is an excerpt from a real-world BPF application (with register states omitted for clarity):

BEFORE
======
    ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394
    5369: (07) r8 += 2 ;
    5370: (07) r7 += 16 ;
    ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394
    5371: (07) r9 += 1 ;
    5372: (79) r4 = *(u64 *)(r10 -32) ;
    ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394
    5373: (55) if r9 != 0xf goto pc+2
    ; if (i >= map->cnt) @ strobemeta_probe.bpf.c:396
    5376: (79) r1 = *(u64 *)(r10 -40) ;
    5377: (79) r1 = *(u64 *)(r1 +8) ;
    ; if (i >= map->cnt) @ strobemeta_probe.bpf.c:396
    5378: (dd) if r1 s<= r9 goto pc-5 ;
    ; descr->key_lens[i] = 0; @ strobemeta_probe.bpf.c:398
    5379: (b4) w1 = 0 ;
    5380: (6b) *(u16 *)(r8 -30) = r1 ;
    ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); @ strobemeta_probe.bpf.c:400
    5381: (79) r3 = *(u64 *)(r7 -8) ;
    5382: (7b) *(u64 *)(r10 -24) = r6 ;
    ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); @ strobemeta_probe.bpf.c:400
    5383: (bc) w6 = w6 ;
    ; barrier_var(payload_off); @ strobemeta_probe.bpf.c:280
    5384: (bf) r2 = r6 ;
    5385: (bf) r1 = r4 ;

As can be seen, line 394 is emitted thrice, 396 is emitted twice, and line 400 is duplicated as well. Note that there are no other source code lines intermingled between these duplicates, so the issue is not the compiler reordering assembly instructions such that multiple original source code lines are in effect.

It becomes more obvious what's going on if we look at the *full* original line info (using btfdump for this, [0]):

    #2764: line: insn #5363 --> 394:3 @ ./././strobemeta_probe.bpf.c
            for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
    #2765: line: insn #5373 --> 394:21 @ ./././strobemeta_probe.bpf.c
            for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
    #2766: line: insn #5375 --> 394:47 @ ./././strobemeta_probe.bpf.c
            for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
    #2767: line: insn #5377 --> 394:3 @ ./././strobemeta_probe.bpf.c
            for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
    #2768: line: insn #5378 --> 414:10 @ ./././strobemeta_probe.bpf.c
            return off;

We can see that there are four line info records covering instructions #5363 through #5377 (instruction indices are shifted due to subprog instructions being appended to the main program), all of them pointing to the same C source code line #394. But each of them points to a different part of that line, which is denoted by differing column numbers (3, 21, 47, 3).

But verifier log doesn't distinguish between parts of the same source code line and doesn't emit this column number information, so for the end user it's just repetitive visual noise. So let's improve the detection of repeated source code lines and avoid this.

With the changes in this patch, we get this output for the same piece of BPF program log:

AFTER
=====
    ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394
    5369: (07) r8 += 2 ;
    5370: (07) r7 += 16 ;
    5371: (07) r9 += 1 ;
    5372: (79) r4 = *(u64 *)(r10 -32) ;
    5373: (55) if r9 != 0xf goto pc+2
    ; if (i >= map->cnt) @ strobemeta_probe.bpf.c:396
    5376: (79) r1 = *(u64 *)(r10 -40) ;
    5377: (79) r1 = *(u64 *)(r1 +8) ;
    5378: (dd) if r1 s<= r9 goto pc-5 ;
    ; descr->key_lens[i] = 0; @ strobemeta_probe.bpf.c:398
    5379: (b4) w1 = 0 ;
    5380: (6b) *(u16 *)(r8 -30) = r1 ;
    ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); @ strobemeta_probe.bpf.c:400
    5381: (79) r3 = *(u64 *)(r7 -8) ;
    5382: (7b) *(u64 *)(r10 -24) = r6 ;
    5383: (bc) w6 = w6 ;
    ; barrier_var(payload_off); @ strobemeta_probe.bpf.c:280
    5384: (bf) r2 = r6 ;
    5385: (bf) r1 = r4 ;

All the duplication is gone and the log is cleaner and less distracting.

[0] https://github.com/anakryiko/btfdump

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240214174100.2847419-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
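The improved duplicate detection can be sketched in isolation. This is a minimal illustration under stated assumptions, not the kernel patch itself: `struct line_rec`, `same_source_line()` and `count_emitted_lines()` are hypothetical names. The one detail taken from the BPF UAPI is that a line info record packs the line number into the upper 22 bits of `line_col` and the column into the lower 10 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one BTF line info record. */
struct line_rec {
	uint32_t insn_off;      /* first instruction covered by this record */
	uint32_t file_name_off; /* identifies the source file */
	uint32_t line_col;      /* (line << 10) | column */
};

/* Line number only; the column in the low 10 bits is ignored on purpose. */
static uint32_t rec_line(const struct line_rec *r)
{
	return r->line_col >> 10;
}

/* True when both records name the same file and line, whatever the column. */
static bool same_source_line(const struct line_rec *prev,
			     const struct line_rec *cur)
{
	return prev && prev->file_name_off == cur->file_name_off &&
	       rec_line(prev) == rec_line(cur);
}

/* Count how many source lines a log would print for consecutive records. */
static size_t count_emitted_lines(const struct line_rec *recs, size_t n)
{
	const struct line_rec *prev = NULL;
	size_t emitted = 0;

	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!same_source_line(prev, &recs[i]))
			emitted++;
		prev = &recs[i];
	}
	return emitted;
}
```

Fed the five records from the btfdump excerpt (394:3, 394:21, 394:47, 394:3, 414:10, all in one file), `count_emitted_lines()` returns 2: line 394 once and line 414 once. Comparing whole records, columns included, would count every record as a new line.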
With latest llvm19, I hit the following selftest failure:

    $ ./test_progs -j
    libbpf: prog 'on_event': BPF program load failed: Permission denied
    libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG --
    combined stack size of 4 calls is 544. Too large
    verification time 1344153 usec
    stack depth 24+440+0+32
    processed 51008 insns (limit 1000000) max_states_per_insn 19 total_states 1467 peak_states 303 mark_read 146
    -- END PROG LOAD LOG --
    libbpf: prog 'on_event': failed to load: -13
    libbpf: failed to load object 'strobemeta_subprogs.bpf.o'
    scale_test:FAIL:expect_success unexpected error: -13 (errno 13)
    #498 verif_scale_strobemeta_subprogs:FAIL

The verifier complains that the combined stack size (544 bytes) exceeds the maximum stack limit of 512 bytes. This is a regression from llvm19 ([1]).

In the above error log, the original stack depth is 24+440+0+32. To satisfy the interpreter's needs, the verifier adjusts the stack depth to 32+448+32+32=544, which exceeds 512, hence the error. The same adjusted stack size is also used for the jit case.

But the jitted code could use a smaller stack size.

    $ egrep -r stack_depth | grep round_up
    arm64/net/bpf_jit_comp.c: ctx->stack_size = round_up(prog->aux->stack_depth, 16);
    loongarch/net/bpf_jit.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
    powerpc/net/bpf_jit_comp.c: cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
    riscv/net/bpf_jit_comp32.c: round_up(ctx->prog->aux->stack_depth, STACK_ALIGN);
    riscv/net/bpf_jit_comp64.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
    s390/net/bpf_jit_comp.c: u32 stack_depth = round_up(fp->aux->stack_depth, 8);
    sparc/net/bpf_jit_comp_64.c: stack_needed += round_up(stack_depth, 16);
    x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
    x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
    x86/net/bpf_jit_comp.c: round_up(stack_depth, 8));
    x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
    x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));

In the above, STACK_ALIGN in riscv/net/bpf_jit_comp32.c is defined as 16. So the stack is aligned to either 8 or 16 bytes, with x86/s390 having 8-byte stack alignment and the rest having 16-byte alignment.

This patch calculates the total stack depth based on 16-byte alignment if jit is requested. For the above failing case, the new stack size will be 32+448+0+32=512, so verification no longer fails. The llvm19 regression will be discussed separately in llvm upstream.

The verifier change caused three test failures as these tests compared messages with stack size. More specifically,
- test_global_funcs/global_func1: failed in interpreter mode and succeeded in jit mode. Adjusted stack sizes so both jit and interpreter modes will fail.
- async_stack_depth/{pseudo_call_check, async_call_root_check}: since jit and interpreter will calculate different stack sizes, the failure msg is adjusted to omit those specific stack size numbers.

[1] https://lore.kernel.org/bpf/32bde0f0-1881-46c9-931a-673be566c61d@linux.dev/

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240214232951.4113094-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
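The stack size arithmetic in this commit message can be checked with a small standalone sketch. The helper names (`round_up_u32`, `adjusted_depth`, `combined_depth`, `exceeds_limit`) are hypothetical. The interpreter rule of rounding `max(depth, 1)` up to 32 bytes is an assumption inferred from the 0 → 32 adjustment in the numbers quoted in the message; the jit rule of rounding up to 16 bytes is the one the message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BPF_STACK 512u /* limit quoted in the commit message */

static uint32_t round_up_u32(uint32_t v, uint32_t align)
{
	assert(align != 0);
	return (v + align - 1) / align * align;
}

/*
 * Assumed adjustment rules:
 *  - jit requested: round the frame up to 16 bytes
 *  - interpreter:   round max(depth, 1) up to 32 bytes
 */
static uint32_t adjusted_depth(uint32_t depth, bool jit_requested)
{
	if (jit_requested)
		return round_up_u32(depth, 16);
	return round_up_u32(depth ? depth : 1, 32);
}

/* Sum of the adjusted frame sizes along one call chain. */
static uint32_t combined_depth(const uint32_t *depths, size_t n,
			       bool jit_requested)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += adjusted_depth(depths[i], jit_requested);
	return total;
}

static bool exceeds_limit(uint32_t total)
{
	return total > MAX_BPF_STACK;
}
```

For the call chain 24+440+0+32 from the failing log, `combined_depth()` gives 32+448+32+32=544 in interpreter mode, which exceeds the limit, and 32+448+0+32=512 with jit requested, which does not.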
Write error handling is racy and can sometimes lead to the error recovery path wrongly changing the inode size of a sequential zone file to an incorrect value, which results in garbage data being readable at the end of a file. There are 2 problems:

1) zonefs_file_dio_write() updates a zone file write pointer offset after issuing a direct IO with iomap_dio_rw(). For synchronous direct writes, this update is done only if the IO succeeds. For asynchronous direct writes, however, the update is done without waiting for the IO completion so that the next asynchronous IO can be immediately issued. If an asynchronous IO completes with a failure right before the i_truncate_mutex lock protecting the update is taken, the update may change the value of the inode write pointer offset that was corrected by the error path (zonefs_io_error() function).

2) zonefs_io_error() is called when a read or write error occurs. This function executes a report zone operation using the callback function zonefs_io_error_cb(), which does all the error recovery handling based on the current zone condition, write pointer position and according to the mount options being used. However, depending on the zoned device being used, a report zone callback may be executed in a context that is different from the context of __zonefs_io_error(). As a result, zonefs_io_error_cb() may be executed without the inode truncate mutex lock held, which can lead to invalid error processing.

Fix both problems as follows:
- Problem 1: Perform the inode write pointer offset update before a direct write is issued with iomap_dio_rw(). This is safe to do as partial direct writes are not supported (IOMAP_DIO_PARTIAL is not set) and any failed IO will trigger the execution of zonefs_io_error(), which will correct the inode write pointer offset to reflect the current state of the one on the device.
- Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error() and call this function directly from __zonefs_io_error() after obtaining the zone information using blkdev_report_zones() with a simple callback function that copies to a local stack variable the struct blk_zone obtained from the device. This ensures that error handling is performed holding the inode truncate mutex. This change also simplifies error handling for conventional zone files by bypassing the execution of report zones entirely. This is safe to do because the condition of conventional zones cannot be read-only or offline and conventional zone files are always fully mapped with a constant file size.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 8dcc1a9 ("fs: New zonefs file system")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
…g/drm/drm-misc into drm-fixes A suspend/resume error fix for ivpu, a couple of scheduler fixes for nouveau, a patch to support large page arrays in prime, an uninitialized variable fix in crtc, a locking fix in rockchip/vop2 and a buddy allocator error reporting fix. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/b4ffqzigtfh6cgzdpwuk6jlrv3dnk4hu6etiizgvibysqgtl2p@42n2gdfdd5eu
…rg/drm/drm-intel into drm-fixes Fix for #10172: Blank screen on JSL Chromebooks. Stable fix to limit DP SST link rate to <=8.1Gbps. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zc37W27F5OvoeSkG@jlahtine-mobl.ger.corp.intel.com
…/drm/xe/kernel into drm-fixes Driver Changes: - Fix an out-of-bounds shift. - Fix the display code thinking xe uses shmem - Fix a warning about index out-of-bound - Fix a clang-16 compilation warning Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zc4GpcrbFVqdK9Ws@fedora
…top.org/agd5f/linux into drm-fixes amd-drm-fixes-6.8-2024-02-15-2: amdgpu: - PSR fixes - Suspend/resume fixes - Link training fix - Aspect ratio fix - DCN 3.5 fixes - VCN 4.x fix - GFX 11 fix - Misc display fixes - Misc small fixes amdkfd: - Cache size reporting fix - SIMD distribution fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240215192452.11805-1-alexander.deucher@amd.com
…g/drm/msm into drm-fixes Fixes for v6.8-rc5 GPU: - dmabuf vmap fix - a610 UBWC corruption fix (incorrect hbb) - revert a commit that was making GPU recovery unreliable - tlb invalidation fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGszDSiw66+a=ttBr-hat+zrcBtfc_cZ4LQqXu89DJ0UeQ@mail.gmail.com
…nel/git/pcmoore/lsm Pull lsm fix from Paul Moore: "One small LSM patch to fix a potential integer overflow in the newly added lsm_set_self_attr() syscall" * tag 'lsm-pr-20240215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: fix integer overflow in lsm_set_self_attr() syscall
…/drm Pull drm fixes from Dave Airlie: "Regular weekly fixes, nothing too major, mostly amdgpu, then i915, xe, msm and nouveau with some scattered bits elsewhere. crtc: - fix uninit variable prime: - support > 4GB page arrays buddy: - fix error handling in allocations i915: - fix blankscreen on JSL chromebooks - stable fix to limit DP sst link rates xe: - Fix an out-of-bounds shift. - Fix the display code thinking xe uses shmem - Fix a warning about index out-of-bound - Fix a clang-16 compilation warning amdgpu: - PSR fixes - Suspend/resume fixes - Link training fix - Aspect ratio fix - DCN 3.5 fixes - VCN 4.x fix - GFX 11 fix - Misc display fixes - Misc small fixes amdkfd: - Cache size reporting fix - SIMD distribution fix msm: - GPU: - dmabuf vmap fix - a610 UBWC corruption fix (incorrect hbb) - revert a commit that was making GPU recovery unreliable - tlb invalidation fix ivpu: - suspend/resume fix nouveau: - fix scheduler cleanup path - fix pointless scheduler creation - fix kvalloc argument order rockchip: - vop2 locking fix" * tag 'drm-fixes-2024-02-16' of git://anongit.freedesktop.org/drm/drm: (38 commits) drm/amdgpu: Fix implicit assumtion in gfx11 debug flags drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards drm/amd/display: Increase ips2_eval delay for DCN35 drm/amdgpu/display: Initialize gamma correction mode variable in dcn30_get_gamcor_current() drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution drm/amd/display: fixed integer types and null check locations drm/amd/display: Fix array-index-out-of-bounds in dcn35_clkmgr drm/amd/display: Preserve original aspect ratio in create stream drm/amd/display: Fix possible NULL dereference on device remove/driver unload Revert "drm/amd/display: increased min_dcfclk_mhz and min_fclk_mhz" drm/amd/display: Add align done check Revert "drm/amd: flush any delayed gfxoff on suspend entry" drm/amd: Stop evicting resources on APUs in suspend drm/amd/display: Fix possible buffer overflow in 
'find_dcfclk_for_voltage()' drm/amd/display: Fix possible use of uninitialized 'max_chunks_fbc_mode' in 'calculate_bandwidth()' drm/amd/display: Initialize 'wait_time_microsec' variable in link_dp_training_dpia.c drm/amd/display: Fix && vs || typos drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3 drm/amdgpu: make damage clips support configurable drm/msm: Wire up tlb ops ...
…inux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - add missing stubs for functions that are not built with GPIOLIB disabled * tag 'gpio-fixes-for-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: add gpio_device_get_label() stub for !GPIOLIB gpiolib: add gpio_device_get_base() stub for !GPIOLIB gpiolib: add gpiod_to_gpio_device() stub for !GPIOLIB
…l/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of device-specific fixes. It became a bit bigger than wished, but all look reasonably small and safe to apply. - A few Cirrus Logic CS35L56 and CS42L43 driver fixes - ASoC SOF fixes and workarounds - Various ASoC Intel fixes - Lots of HD-, USB-audio and AMD ACP quirks" * tag 'sound-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits) ALSA: usb-audio: More relaxed check of MIDI jack names ALSA: hda/realtek: fix mute/micmute LED For HP mt645 ALSA: hda/realtek: cs35l41: Fix order and duplicates in quirks table ALSA: hda/realtek: cs35l41: Fix device ID / model name ALSA: hda/realtek: cs35l41: Add internal speaker support for ASUS UM3402 with missing DSD ASoC: cs35l56: Workaround for ACPI with broken spk-id-gpios property ALSA: hda: Add Lenovo Legion 7i gen7 sound quirk ASoC: SOF: IPC3: fix message bounds on ipc ops ASoC: SOF: ipc4-pcm: Workaround for crashed firmware on system suspend ASoC: q6dsp: fix event handler prototype ASoC: SOF: Intel: pci-lnl: Change the topology path to intel/sof-ipc4-tplg ASoC: SOF: Intel: pci-tgl: Change the default paths and firmware names ASoC: amd: yc: Fix non-functional mic on Lenovo 82UU ASoC: rt5645: Add DMI quirk for inverted jack-detect on MeeGoPad T8 ASoC: rt5645: Make LattePanda board DMI match more precise ASoC: SOF: amd: Fix locking in ACP IRQ handler ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work() ASoC: Intel: cht_bsw_rt5645: Cleanup codec_name handling ASoC: Intel: Boards: Fix NULL pointer deref in BYT/CHT boards ASoC: cs35l56: Remove default from IRQ1_CFG register ...
…kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.8, take #1 - Don't source the VFIO Kconfig twice - Fix protected-mode locking order between kvm and vcpus
…kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.8, take #2 - Avoid dropping the page refcount twice when freeing an unlinked page-table subtree.
Commit f04a32b ("selftests/bpf: Do not use sign-file as testcase") removed the TEST_CUSTOM_PROGS assignment, and removed it from being used on TEST_GEN_FILES. Remove two leftovers from that cleanup. Found by inspection. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexey Gladkov <legion@kernel.org> Link: https://lore.kernel.org/bpf/20240216-bpf-selftests-custom-progs-v1-1-f7cf281a1fda@suse.com
…el/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: - Fix direct write error handling to avoid a race between failed IO completion and the submission path itself which can result in an invalid file size exposed to the user after the failed IO. * tag 'zonefs-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Improve error handling
…git/arm64/linux Pull arm64 fixes from Will Deacon: "It's a little busier than normal, but it's still not a lot of code and things seem fairly quiet in general: - Fix allocation failure during SVE coredumps - Fix handling of SVE context on signal delivery - Enable Neoverse N2 CPU errata workarounds for Microsoft's "Azure Cobalt 100" clone - Work around CMN PMU erratum in AmpereOneX implementation - Fix typo in CXL PMU event definition - Fix jump label asm constraints" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/sve: Lower the maximum allocation for the SVE ptrace regset arm64: Subscribe Microsoft Azure Cobalt 100 to ARM Neoverse N2 errata perf/arm-cmn: Workaround AmpereOneX errata AC04_MESH_1 (incorrect child count) arm64: jump_label: use constraints "Si" instead of "i" arm64: fix typo in comments perf: CXL: fix mismatched cpmu event opcode arm64/signal: Don't assume that TIF_SVE means we saved SVE state
…el/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix the #ifndef that didn't have the 'CONFIG_' prefix on HAVE_DYNAMIC_FTRACE_WITH_REGS The fix to have dynamic trampolines work with x86 broke arm64 as the config used in the #ifdef was HAVE_DYNAMIC_FTRACE_WITH_REGS and not CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which removed the fix that the previous fix was to fix. - Fix tracing_on state The code to test if "tracing_on" is set incorrectly used ring_buffer_record_is_on() which returns false if the ring buffer isn't able to be written to. But the ring buffer disable has several bits that disable it. One is internal disabling which is used for resizing and other modifications of the ring buffer. But the "tracing_on" user space visible flag should only report if tracing is actually on and not internally disabled, as this can cause confusion as writing "1" when it is disabled will not enable it. Instead use ring_buffer_record_is_set_on() which shows the user space visible settings. - Fix a false positive kmemleak on saved cmdlines Now that the saved_cmdlines structure is allocated via alloc_page() and not via kmalloc() it has become invisible to kmemleak. The allocation done to one of its pointers was flagged as a dangling allocation leak. Make kmemleak aware of this allocation and free. - Fix synthetic event dynamic strings An update that cleaned up the synthetic event code removed the return value of trace_string(), and had it return zero instead of the length, causing dynamic strings in the synthetic event to always have zero size. 
- Clean up documentation and header files for seq_buf * tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: seq_buf: Fix kernel documentation seq_buf: Don't use "proxy" headers tracing/synthetic: Fix trace_string() return value tracing: Inform kmemleak of saved_cmdlines allocation tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on() tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
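The first tracing fix in this pull is a class of bug that is easy to reproduce outside the kernel. The sketch below is only an illustration: `CONFIG_HAVE_EXAMPLE_FEATURE`, `buggy_check()` and `fixed_check()` are hypothetical names, and the `#define` stands in for what the kernel configuration would normally provide. It shows that a preprocessor test on the bare symbol silently takes the "not defined" branch when only the `CONFIG_`-prefixed symbol exists.

```c
#include <assert.h>

/* Hypothetical stand-in for a symbol generated from the kernel config. */
#define CONFIG_HAVE_EXAMPLE_FEATURE 1

static_assert(CONFIG_HAVE_EXAMPLE_FEATURE == 1, "feature is configured");

/* Buggy: the bare name is never defined, so this always returns 0. */
static int buggy_check(void)
{
#ifdef HAVE_EXAMPLE_FEATURE
	return 1;
#else
	return 0;
#endif
}

/* Fixed: tests the CONFIG_-prefixed name that is actually defined. */
static int fixed_check(void)
{
#ifdef CONFIG_HAVE_EXAMPLE_FEATURE
	return 1;
#else
	return 0;
#endif
}
```

Neither variant produces a compiler diagnostic, which is why a mistake of this kind can go unnoticed until an architecture that depends on the guarded code misbehaves.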
Pull KVM fixes from Paolo Bonzini: "ARM: - Avoid dropping the page refcount twice when freeing an unlinked page-table subtree. - Don't source the VFIO Kconfig twice - Fix protected-mode locking order between kvm and vcpus RISC-V: - Fix steal-time related sparse warnings x86: - Cleanup gtod_is_based_on_tsc() to return "bool" instead of an "int" - Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if and only if the incoming events->nmi.pending is non-zero. If the target vCPU is in the UNITIALIZED state, the spurious request will result in KVM exiting to userspace, which in turn causes QEMU to constantly acquire and release QEMU's global mutex, to the point where the BSP is unable to make forward progress. - Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl being incorrectly truncated, and ultimately causes KVM to think a fixed counter has already been disabled (KVM thinks the old value is '0'). - Fix a stack leak in KVM_GET_MSRS where a failed MSR read from userspace that is ultimately ignored due to ignore_msrs=true doesn't zero the output as intended. Selftests cleanups and fixes: - Remove redundant newlines from error messages. - Delete an unused variable in the AMX test (which causes build failures when compiling with -Werror). - Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails with an error code other than ENOENT (a Hyper-V selftest bug resulted in an EMFILE, and the test eventually got skipped). - Fix TSC related bugs in several Hyper-V selftests. - Fix a bug in the dirty ring logging test where a sem_post() could be left pending across multiple runs, resulting in incorrect synchronization between the main thread and the vCPU worker thread. - Relax the dirty log split test's assertions on 4KiB mappings to fix false positives due to the number of mappings for memslot 0 (used for code and data that is NOT being dirty logged) changing, e.g. 
due to NUMA balancing" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked() RISC-V: KVM: Use correct restricted types RISC-V: paravirt: Use correct restricted types RISC-V: paravirt: steal_time should be static KVM: selftests: Don't assert on exact number of 4KiB in dirty log split test KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test KVM: x86: Fix KVM_GET_MSRS stack info leak KVM: arm64: Do not source virt/lib/Kconfig twice KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl KVM: x86: Make gtod_is_based_on_tsc() return 'bool' KVM: selftests: Make hyperv_clock require TSC based system clocksource KVM: selftests: Run clocksource dependent tests with hyperv_clocksource_tsc_page too KVM: selftests: Use generic sys_clocksource_is_tsc() in vmx_nested_tsc_scaling_test KVM: selftests: Generalize check_clocksource() from kvm_clock_test KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu KVM: arm64: Fix circular locking dependency KVM: selftests: Fail tests when open() fails with !ENOENT KVM: selftests: Avoid infinite loop in hyperv_features when invtsc is missing KVM: selftests: Delete superfluous, unused "stage" variable in AMX test KVM: selftests: x86_64: Remove redundant newlines ...
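The u8 versus u64 goof in the x86 list above is a plain integer truncation, which a short sketch makes concrete. This is an illustration under stated assumptions rather than KVM's code: the function names are hypothetical, and the layout assumed is 4 control bits per fixed counter, so counter 2 occupies bits 8 to 11 of the control value.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 4-bit control field for fixed counter idx from the full value. */
static uint8_t ctrl_field(uint64_t ctrl, unsigned int idx)
{
	assert(idx < 16);
	return (uint8_t)((ctrl >> (idx * 4)) & 0xf);
}

/* Buggy variant: the 64-bit value is first stored in an 8-bit variable. */
static uint8_t ctrl_field_truncated(uint64_t ctrl, unsigned int idx)
{
	uint8_t old = (uint8_t)ctrl; /* keeps only the fields of counters 0 and 1 */

	assert(idx < 16);
	return (uint8_t)(((uint64_t)old >> (idx * 4)) & 0xf);
}
```

With a control value of 0x300 (counter 2 enabled), `ctrl_field(0x300, 2)` returns 3, while `ctrl_field_truncated(0x300, 2)` returns 0. That matches the symptom in the summary: the old value looks like '0', so the counter appears to be already disabled.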
Syncing to linux#master (v6.8-rc4-331-gc1ca10ceffbb) + bpf/for-next (v6.8-rc1-581-g7648f0c91eaa). Build is still broken.
Sync to linux#master (v6.8-rc4-331-gc1ca10ceffbb) + bpf/for-next (v6.8-rc1-581-g7648f0c91eaa).