forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update 5.4.x+fslc to v5.4.47 #87
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ Upstream commit 79a1f0c ] Socket option IPV6_ADDRFORM supports UDP/UDPLITE and TCP at present. Previously the checking logic looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol != IPPROTO_TCP) break; After commit b6f6118 ("ipv6: restrict IPV6_ADDRFORM operation"), TCP was blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; break; else break; Then after commit 82c9ae4 ("ipv6: fix restrict IPV6_ADDRFORM operation") UDP/UDPLITE were blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; if (sk->sk_protocol == IPPROTO_TCP) do_some_check; if (sk->sk_protocol != IPPROTO_TCP) break; Fix it by using Eric's code and simply remove the break in TCP check, which looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; else break; Fixes: 82c9ae4 ("ipv6: fix restrict IPV6_ADDRFORM operation") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…l zones [ Upstream commit 2dc2f76 ] The driver registers three different types of thermal zones: For the ASIC itself, for port modules and for gearboxes. Currently, all three types use the same get_trend() callback which does not work correctly for the ASIC thermal zone. The callback assumes that the device data is of type 'struct mlxsw_thermal_module', whereas for the ASIC thermal zone 'struct mlxsw_thermal' is passed as device data. Fix this by using one get_trend() callback for the ASIC thermal zone and another for the other two types. Fixes: 6f73862 ("mlxsw: core: Add the hottest thermal zone detection") Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e8224bf ] found by smatch: drivers/net/net_failover.c:65 net_failover_open() error: we previously assumed 'primary_dev' could be null (see line 43) Fixes: cfc80d9 ("net: Introduce net_failover driver") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 96aa1b2 ] Tun in IFF_NAPI_FRAGS mode calls napi_gro_frags. Unlike netif_rx and netif_gro_receive, this expects skb->data to point to the mac layer. But skb_probe_transport_header, __skb_get_hash_symmetric, and xdp_do_generic in tun_get_user need skb->data to point to the network header. Flow dissection also needs skb->protocol set, so eth_type_trans has to be called. Ensure the link layer header lies in linear as eth_type_trans pulls ETH_HLEN. Then take the same code paths for frags as for not frags. Push the link layer header back just before calling napi_gro_frags. By pulling up to ETH_HLEN from frag0 into linear, this disables the frag0 optimization in the special case when IFF_NAPI_FRAGS is used with zero length iov[0] (and thus empty skb->linear). Fixes: 90e33d4 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Petar Penkov <ppenkov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
… options [ Upstream commit 53fc685 ] When neighbor suppression is enabled the bridge device might reply to Neighbor Solicitation (NS) messages on behalf of remote hosts. In case the NS message includes the "Source link-layer address" option [1], the bridge device will use the specified address as the link-layer destination address in its reply. To avoid an infinite loop, break out of the options parsing loop when encountering an option with length zero and disregard the NS message. This is consistent with the IPv6 ndisc code and RFC 4886 which states that "Nodes MUST silently discard an ND packet that contains an option with length zero" [2]. [1] https://tools.ietf.org/html/rfc4861#section-4.3 [2] https://tools.ietf.org/html/rfc4861#section-4.6 Fixes: ed842fa ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alla Segal <allas@mellanox.com> Tested-by: Alla Segal <allas@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…options [ Upstream commit 8066e6b ] When proxy mode is enabled the vxlan device might reply to Neighbor Solicitation (NS) messages on behalf of remote hosts. In case the NS message includes the "Source link-layer address" option [1], the vxlan device will use the specified address as the link-layer destination address in its reply. To avoid an infinite loop, break out of the options parsing loop when encountering an option with length zero and disregard the NS message. This is consistent with the IPv6 ndisc code and RFC 4886 which states that "Nodes MUST silently discard an ND packet that contains an option with length zero" [2]. [1] https://tools.ietf.org/html/rfc4861#section-4.3 [2] https://tools.ietf.org/html/rfc4861#section-4.6 Fixes: 4b29dba ("vxlan: fix nonfunctional neigh_reduce()") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90ceddc upstream. Simplify gen_btf logic to make it work with llvm-objcopy. The existing 'file format' and 'architecture' parsing logic is brittle and does not work with llvm-objcopy/llvm-objdump. 'file format' output of llvm-objdump>=11 will match GNU objdump, but 'architecture' (bfdarch) may not. .BTF in .tmp_vmlinux.btf is non-SHF_ALLOC. Add the SHF_ALLOC flag because it is part of vmlinux image used for introspection. C code can reference the section via linker script defined __start_BTF and __stop_BTF. This fixes a small problem that previous .BTF had the SHF_WRITE flag (objcopy -I binary -O elf* synthesized .data). Additionally, `objcopy -I binary` synthesized symbols _binary__btf_vmlinux_bin_start and _binary__btf_vmlinux_bin_stop (not used elsewhere) are replaced with more commonplace __start_BTF and __stop_BTF. Add 2>/dev/null because GNU objcopy (but not llvm-objcopy) warns "empty loadable segment detected at vaddr=0xffffffff81000000, is this intentional?" We use a dd command to change the e_type field in the ELF header from ET_EXEC to ET_REL so that lld will accept .btf.vmlinux.bin.o. Accepting ET_EXEC as an input file is an extremely rare GNU ld feature that lld does not intend to support, because this is error-prone. The output section description .BTF in include/asm-generic/vmlinux.lds.h avoids potential subtle orphan section placement issues and suppresses --orphan-handling=warn warnings. Fixes: df786c9 ("bpf: Force .BTF section start to zero when dumping from vmlinux") Fixes: cb0cc63 ("powerpc: Include .BTF section") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Stanislav Fomichev <sdf@google.com> Tested-by: Andrii Nakryiko <andriin@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: ClangBuiltLinux#871 Link: https://lore.kernel.org/bpf/20200318222746.173648-1-maskray@google.com Signed-off-by: Maria Teguiani <teguiani@google.com> Tested-by: Matthias Maennich <maennich@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51da9df upstream. ELFNOTE_START allows callers to specify flags for .pushsection assembler directives. All callsites but ELF_NOTE use "a" for SHF_ALLOC. For vdso's that explicitly use ELF_NOTE_START and BUILD_SALT, the same section is specified twice after preprocessing, once with "a" flag, once without. Example: .pushsection .note.Linux, "a", @note ; .pushsection .note.Linux, "", @note ; While GNU as allows this ordering, it warns for the opposite ordering, making these directives position dependent. We'd prefer not to precisely match this behavior in Clang's integrated assembler. Instead, the non __ASSEMBLY__ definition of ELF_NOTE uses __attribute__((section(".note.Linux"))) which is created with SHF_ALLOC, so let's make the __ASSEMBLY__ definition of ELF_NOTE consistent with C and just always use "a" flag. This allows Clang to assemble a working mainline (5.6) kernel via: $ make CC=clang AS=clang Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Fangrui Song <maskray@google.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: ClangBuiltLinux#913 Link: http://lkml.kernel.org/r/20200325231250.99205-1-ndesaulniers@google.com Debugged-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jian Cai <jiancai@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3f8f770 ] MMS345L is another first generation touch screen from Melfas, which uses the same registers as MMS152. However, using I2C_M_NOSTART for it causes errors when reading: i2c i2c-0: sendbytes: NAK bailout. mms114 0-0048: __mms114_read_reg: i2c transfer failed (-5) The driver works fine as soon as I2C_M_NOSTART is removed. Reviewed-by: Andi Shyti <andi@etezian.org> Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Link: https://lore.kernel.org/r/20200405170904.61512-1-stephan@gerhold.net [dtor: removed separate mms345l handling, made everyone use standard transfer mode, propagated the 10bit addressing flag to the read part of the transfer as well.] Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3866f21 ] call_undef_hook() in traps.c applies the same instr_mask for both 16-bit and 32-bit thumb instructions. If instr_mask then is only 16 bits wide (0xffff as opposed to 0xffffffff), the first half-word of 32-bit thumb instructions will be masked out. This makes the function match 32-bit thumb instructions where the second half-word is equal to instr_val, regardless of the first half-word. The result in this case is that all undefined 32-bit thumb instructions with the second half-word equal to 0xde01 (udf Freescale#1) work as breakpoints and will raise a SIGTRAP instead of a SIGILL, instead of just the one intended 16-bit instruction. An example of such an instruction is 0xeaa0de01, which is unallocated according to Arm ARM and should raise a SIGILL, but instead raises a SIGTRAP. This patch fixes the issue by setting all the bits in instr_mask, which will still match the intended 16-bit thumb instruction (where the upper half is always 0), but not any 32-bit thumb instructions. Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Fredrik Strupe <fredrik@strupe.net> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18f855e ] Stefano reported a crash with using SQPOLL with io_uring: BUG: kernel NULL pointer dereference, address: 00000000000003b0 CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 Freescale#11 RIP: 0010:task_numa_work+0x4f/0x2c0 Call Trace: task_work_run+0x68/0xa0 io_sq_thread+0x252/0x3d0 kthread+0xf9/0x130 ret_from_fork+0x35/0x40 which is task_numa_work() oopsing on current->mm being NULL. The task work is queued by task_tick_numa(), which checks if current->mm is NULL at the time of the call. But this state isn't necessarily persistent, if the kthread is using use_mm() to temporarily adopt the mm of a task. Change the task_tick_numa() check to exclude kernel threads in general, as it doesn't make sense to attempt ot balance for kthreads anyway. Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/865de121-8190-5d30-ece5-3b097dc74431@kernel.dk Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 642aa86 ] The Lenovo Thinkpad T470s I own has a different touchpad with "LEN007a" instead of the already included PNP ID "LEN006c". However, my touchpad seems to work well without any problems using RMI. So this patch adds the other PNP ID. Signed-off-by: Dennis Kadioglu <denk@eclipso.email> Link: https://lore.kernel.org/r/ff770543cd53ae818363c0fe86477965@mail.eclipso.de Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e0bbb53 ] Current implementation could destory a4 & a5 when strace, so we need to get them from pt_regs by SAVE_ALL. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20be493 ] Fix several issues in the previous gfs2_find_jhead fix: * When updating @blocks_submitted, @block refers to the first block block not submitted yet, not the last block submitted, so fix an off-by-one error. * We want to ensure that @blocks_submitted is far enough ahead of @blocks_read to guarantee that there is in-flight I/O. Otherwise, we'll eventually end up waiting for pages that haven't been submitted, yet. * It's much easier to compare the number of blocks added with the number of blocks submitted to limit the maximum bio size. * Even with bio chaining, we can keep adding blocks until we reach the maximum bio size, as long as we stop at a page boundary. This simplifies the logic. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7846889 ] VNIC protocol version is reported in big-endian format, but it is not byteswapped before logging. Fix that, and remove version comparison as only one protocol version exists at this time. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a101950 ] Commit 1ca3dec ("powerpc/xive: Prevent page fault issues in the machine crash handler") fixed an issue in the FW assisted dump of machines using hash MMU and the XIVE interrupt mode under the POWER hypervisor. It forced the mapping of the ESB page of interrupts being mapped in the Linux IRQ number space to make sure the 'crash kexec' sequence worked during such an event. But it didn't handle the un-mapping. This mapping is now blocking the removal of a passthrough IO adapter under the POWER hypervisor because it expects the guest OS to have cleared all page table entries related to the adapter. If some are still present, the RTAS call which isolates the PCI slot returns error 9001 "valid outstanding translations". Remove these mapping in the IRQ data cleanup routine. Under KVM, this cleanup is not required because the ESB pages for the adapter interrupts are un-mapped from the guest by the hypervisor in the KVM XIVE native device. This is now redundant but it's harmless. Fixes: 1ca3dec ("powerpc/xive: Prevent page fault issues in the machine crash handler") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429075122.1216388-2-clg@kaod.org Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9aea644 ] Commit 6e0a32d ("spi: dw: Fix default polarity of native chipselect") attempted to fix the problem when GPIO active-high chip-select is utilized to communicate with some SPI slave. It fixed the problem, but broke the normal native CS support. At the same time the reversion commit ada9e3f ("spi: dw: Correct handling of native chipselect") didn't solve the problem either, since it just inverted the set_cs() polarity perception without taking into account that CS-high might be applicable. Here is what is done to finally fix the problem. DW SPI controller demands any native CS being set in order to proceed with data transfer. So in order to activate the SPI communications we must set any bit in the Slave Select DW SPI controller register no matter whether the platform requests the GPIO- or native CS. Preferably it should be the bit corresponding to the SPI slave CS number. But currently the dw_spi_set_cs() method activates the chip-select only if the second argument is false. Since the second argument of the set_cs callback is expected to be a boolean with "is-high" semantics (actual chip-select pin state value), the bit in the DW SPI Slave Select register will be set only if SPI core requests the driver to set the CS in the low state. So this will work for active-low GPIO-based CS case, and won't work for active-high CS setting the bit when SPI core actually needs to deactivate the CS. This commit fixes the problem for all described cases. So no matter whether an SPI slave needs GPIO- or native-based CS with active-high or low signal the corresponding bit will be set in SER. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Fixes: ada9e3f ("spi: dw: Correct handling of native chipselect") Fixes: 6e0a32d ("spi: dw: Fix default polarity of native chipselect") Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20200515104758.6934-5-Sergey.Semin@baikalelectronics.ru Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 450edd2 ] Some devices like TP-Link TL-WN722N produces this kind of messages frequently. kernel: ath: phy0: Short RX data len, dropping (dlen: 4) This warning is useful for developers to recognize that the device (Wi-Fi dongle or USB hub etc) is noisy but not for general users. So this patch make this warning to debug message. Reported-By: Denis <pro.denis@protonmail.com> Ref: https://bugzilla.kernel.org/show_bug.cgi?id=207539 Fixes: cd486e6 ("ath9k_htc: Discard undersized packets") Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200504214443.4485-1-masashi.honma@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00720f0 ] The mix of IS_ENABLED() and #ifdef checks has left a combination that causes a warning about an unused variable: security/smack/smack_lsm.c: In function 'smack_socket_connect': security/smack/smack_lsm.c:2838:24: error: unused variable 'sip' [-Werror=unused-variable] 2838 | struct sockaddr_in6 *sip = (struct sockaddr_in6 *)sap; Change the code to use C-style checks consistently so the compiler can handle it correctly. Fixes: 87fbfff ("broken ping to ipv6 linklocal addresses on debian buster") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eb356e6 ] If is_closed is set, and the event list is empty, then read() will return -EIO without blocking. After setting is_closed in ib_uverbs_free_event_queue(), we do trigger a wake_up on the poll_wait, but the fops->poll() function does not check it, so poll will continue to sleep on an empty list. Fixes: 14e23bd ("RDMA/core: Fix locking in ib_uverbs_event_read") Link: https://lore.kernel.org/r/0-v1-ace813388969+48859-uverbs_poll_fix%25jgg@mellanox.com Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c2214b ] Removing the pcrypt module triggers this: general protection fault, probably for non-canonical address 0xdead000000000122 CPU: 5 PID: 264 Comm: modprobe Not tainted 5.6.0+ Freescale#2 Hardware name: QEMU Standard PC RIP: 0010:__cpuhp_state_remove_instance+0xcc/0x120 Call Trace: padata_sysfs_release+0x74/0xce kobject_put+0x81/0xd0 padata_free+0x12/0x20 pcrypt_exit+0x43/0x8ee [pcrypt] padata instances wrongly use the same hlist node for the online and dead states, so __padata_free()'s second cpuhp remove call chokes on the node that the first poisoned. cpuhp multi-instance callbacks only walk forward in cpuhp_step->list and the same node is linked in both the online and dead lists, so the list corruption that results from padata_alloc() adding the node to a second list without removing it from the first doesn't cause problems as long as no instances are freed. Avoid the issue by giving each state its own node. Fixes: 894c9ef ("padata: validate cpumask without removed CPU during offline") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e1750a3 ] After disabling a function, the original handle is logged instead of the disabled handle. Link: https://lkml.kernel.org/r/20200522183922.5253-1-ptesarik@suse.com Fixes: 17cdec9 ("s390/pci: Recover handle in clp_set_pci_fn()") Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2abfc0 ] Commit 21b5ee5 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF") mistakenly added erratum torvalds#1054 as an OS Visible Workaround (OSVW) ID 0. Erratum torvalds#1054 is not OSVW ID 0 [1], so make it a legacy erratum. There would never have been a false positive on older hardware that has OSVW bit 0 set, since the IRPERF feature was not available. However, save a couple of RDMSR executions per thread, on modern system configurations that correctly set non-zero values in their OSVW_ID_Length MSRs. [1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors. The revision guide is available from the bugzilla link below. Fixes: 21b5ee5 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200417143356.26054-1-kim.phillips@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d43e267 ] KVM stores the gfn in MMIO SPTEs as a caching optimization. These are split in two parts, as in "[high 11111 low]", to thwart any attempt to use these bits in an L1TF attack. This works as long as there are 5 free bits between MAXPHYADDR and bit 50 (inclusive), leaving bit 51 free so that the MMIO access triggers a reserved-bit-set page fault. The bit positions however were computed wrongly for AMD processors that have encryption support. In this case, x86_phys_bits is reduced (for example from 48 to 43, to account for the C bit at position 47 and four bits used internally to store the SEV ASID and other stuff) while x86_cache_bits in would remain set to 48, and _all_ bits between the reduced MAXPHYADDR and bit 51 are set. Then low_phys_bits would also cover some of the bits that are set in the shadow_mmio_value, terribly confusing the gfn caching mechanism. To fix this, avoid splitting gfns as long as the processor does not have the L1TF bug (which includes all AMD processors). When there is no splitting, low_phys_bits can be set to the reduced MAXPHYADDR removing the overlap. This fixes "npt=0" operation on EPYC processors. Thanks to Maxim Levitsky for bisecting this bug. Cc: stable@vger.kernel.org Fixes: 52918ed ("KVM: SVM: Override default MMIO mask if memory encryption is enabled") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f044baa ] The caller of pcie_wait_for_link_delay() specifies the time to wait after the link becomes active. When the downstream port doesn't support link active reporting, obviously we can't tell when the link becomes active, so we waited the worst-case time (1000 ms) plus 100 ms, ignoring the delay from the caller. Instead, wait for 1000 ms + the delay from the caller. Fixes: 4827d63 ("PCI/PM: Add pcie_wait_for_link_delay()") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c6aab66 ] Since the commit 6a13a0d ("ftrace/kprobe: Show the maxactive number on kprobe_events") introduced to show the instance number of kretprobe events, the length of the 1st format of the kprobe event will not 1, but it can be longer. This caused a parser error in perf-probe. Skip the length check the 1st format of the kprobe event to accept this instance number. Without this fix: # perf probe -a vfs_read%return Added new event: probe:vfs_read__return (on vfs_read%return) You can now use it in all perf tools, such as: perf record -e probe:vfs_read__return -aR sleep 1 # perf probe -l Semantic error :Failed to parse event name: r16:probe/vfs_read__return Error: Failed to show event list. And with this fixes: # perf probe -a vfs_read%return ... # perf probe -l probe:vfs_read__return (on vfs_read%return) Fixes: 6a13a0d ("ftrace/kprobe: Show the maxactive number on kprobe_events") Reported-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Yuxuan Shui <yshuiv7@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207587 Link: http://lore.kernel.org/lkml/158877535215.26469.1113127926699134067.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4eaa28 ] For kvmalloc'ed data object that contains sensitive information like cryptographic keys, we need to make sure that the buffer is always cleared before freeing it. Using memset() alone for buffer clearing may not provide certainty as the compiler may compile it away. To be sure, the special memzero_explicit() has to be used. This patch introduces a new kvfree_sensitive() for freeing those sensitive data objects allocated by kvmalloc(). The relevant places where kvfree_sensitive() can be used are modified to use it. Fixes: 4f08824 ("KEYS: Avoid false positive ENOMEM error on key read") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Joe Perches <joe@perches.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Rientjes <rientjes@google.com> Cc: Uladzislau Rezki <urezki@gmail.com> Link: http://lkml.kernel.org/r/20200407200318.11711-1-longman@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0531b03 ] Flower tests used to create ingress filter with specified parent qdisc "parent ffff:" but dump them on "ingress". With recent commit that fixed tcm_parent handling in dump those are not considered same parent anymore, which causes iproute2 tc to emit additional "parent ffff:" in first line of filter dump output. The change in output causes filter match in tests to fail. Prevent parent qdisc output when dumping filters in flower tests by always correctly specifying "ingress" parent both when creating and dumping filters. Fixes: a7df487 ("net_sched: fix tcm_parent in tc filter dump") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f02fd3 ] The comments in fanotify_group_event_mask() say: "If the event is on dir/child and this mark doesn't care about events on dir/child, don't send it!" Specifically, mount and filesystem marks do not care about events on child, but they can still specify an ignore mask for those events. For example, a group that has: - A mount mark with mask 0 and ignore_mask FAN_OPEN - An inode mark on a directory with mask FAN_OPEN | FAN_OPEN_EXEC with flag FAN_EVENT_ON_CHILD A child file open for exec would be reported to group with the FAN_OPEN event despite the fact that FAN_OPEN is in ignore mask of mount mark, because the mark iteration loop skips over non-inode marks for events on child when calculating the ignore mask. Move ignore mask calculation to the top of the iteration loop block before excluding marks for events on dir/child. Link: https://lore.kernel.org/r/20200524072441.18258-1-amir73il@gmail.com Reported-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/linux-fsdevel/20200521162443.GA26052@quack2.suse.cz/ Fixes: 55bf882 "fanotify: fix merging marks masks with FAN_ONDIR" Fixes: b469e7e "fanotify: fix handling of events on child..." Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 530f32f upstream. Avi Kivity reports that on fuse filesystems running in a user namespace asyncronous fsync fails with EOVERFLOW. The reason is that f_ops->fsync() is called with the creds of the kthread performing aio work instead of the creds of the process originally submitting IOCB_CMD_FSYNC. Fuse sends the creds of the caller in the request header and it needs to translate the uid and gid into the server's user namespace. Since the kthread is running in init_user_ns, the translation will fail and the operation returns an error. It can be argued that fsync doesn't actually need any creds, but just zeroing out those fields in the header (as with requests that currently don't take creds) is a backward compatibility risk. Instead of working around this issue in fuse, solve the core of the problem by calling the filesystem with the proper creds. Reported-by: Avi Kivity <avi@scylladb.com> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Fixes: c9582eb ("fuse: Fail all requests with invalid uids or gids") Cc: stable@vger.kernel.org # 4.18+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6fd8525 upstream. When TM command times out, driver invokes the controller reset. Post reset, driver re-fires pended TM commands which leads to firmware crash. Post controller reset, return pended TM commands back to OS. Link: https://lore.kernel.org/r/20200508085242.23406-1-chandrakanth.patil@broadcom.com Cc: stable@vger.kernel.org Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f809da6 upstream. Implementation of a previous patch added a condition to an if check that always end up with the if test being true. Execution of the else clause was inadvertently negated. The additional condition check was incorrect and unnecessary after the other modifications had been done in that patch. Remove the check from the if series. Link: https://lore.kernel.org/r/20200501214310.91713-5-jsmart2021@gmail.com Fixes: b95b211 ("scsi: lpfc: Fix loss of remote port after devloss due to lack of RPIs") Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 619ee76 upstream. Check whether error_log file exists in tracing/error_log testcase and return UNSUPPORTED if no error_log file. This can happen if we run the ftracetest on the older stable kernel. Fixes: 4eab1cc ("selftests/ftrace: Add tracing/error_log testcase") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ced21a4 upstream. The skb is consumed by htc_send_epid, so it needn't release again. The case reported by syzbot: https://lore.kernel.org/linux-usb/000000000000590f6b05a1c05d15@google.com usb 1-1: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested usb 1-1: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008 usb 1-1: Service connection timeout for: 256 ================================================================== BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:26 [inline] BUG: KASAN: use-after-free in refcount_read include/linux/refcount.h:134 [inline] BUG: KASAN: use-after-free in skb_unref include/linux/skbuff.h:1042 [inline] BUG: KASAN: use-after-free in kfree_skb+0x32/0x3d0 net/core/skbuff.c:692 Read of size 4 at addr ffff8881d0957994 by task kworker/1:2/83 Call Trace: kfree_skb+0x32/0x3d0 net/core/skbuff.c:692 htc_connect_service.cold+0xa9/0x109 drivers/net/wireless/ath/ath9k/htc_hst.c:282 ath9k_wmi_connect+0xd2/0x1a0 drivers/net/wireless/ath/ath9k/wmi.c:265 ath9k_init_htc_services.constprop.0+0xb4/0x650 drivers/net/wireless/ath/ath9k/htc_drv_init.c:146 ath9k_htc_probe_device+0x25a/0x1d80 drivers/net/wireless/ath/ath9k/htc_drv_init.c:959 ath9k_htc_hw_init+0x31/0x60 drivers/net/wireless/ath/ath9k/htc_hst.c:501 ath9k_hif_usb_firmware_cb+0x26b/0x500 drivers/net/wireless/ath/ath9k/hif_usb.c:1187 request_firmware_work_func+0x126/0x242 drivers/base/firmware_loader/main.c:976 process_one_work+0x94b/0x1620 kernel/workqueue.c:2264 worker_thread+0x96/0xe20 kernel/workqueue.c:2410 kthread+0x318/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Allocated by task 83: kmem_cache_alloc_node+0xdc/0x330 mm/slub.c:2814 __alloc_skb+0xba/0x5a0 net/core/skbuff.c:198 alloc_skb include/linux/skbuff.h:1081 [inline] htc_connect_service+0x2cc/0x840 drivers/net/wireless/ath/ath9k/htc_hst.c:257 ath9k_wmi_connect+0xd2/0x1a0 drivers/net/wireless/ath/ath9k/wmi.c:265 ath9k_init_htc_services.constprop.0+0xb4/0x650 drivers/net/wireless/ath/ath9k/htc_drv_init.c:146 ath9k_htc_probe_device+0x25a/0x1d80 drivers/net/wireless/ath/ath9k/htc_drv_init.c:959 ath9k_htc_hw_init+0x31/0x60 drivers/net/wireless/ath/ath9k/htc_hst.c:501 ath9k_hif_usb_firmware_cb+0x26b/0x500 drivers/net/wireless/ath/ath9k/hif_usb.c:1187 request_firmware_work_func+0x126/0x242 drivers/base/firmware_loader/main.c:976 process_one_work+0x94b/0x1620 kernel/workqueue.c:2264 worker_thread+0x96/0xe20 kernel/workqueue.c:2410 kthread+0x318/0x420 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Freed by task 0: kfree_skb+0x102/0x3d0 net/core/skbuff.c:690 ath9k_htc_txcompletion_cb+0x1f8/0x2b0 drivers/net/wireless/ath/ath9k/htc_hst.c:356 hif_usb_regout_cb+0x10b/0x1b0 drivers/net/wireless/ath/ath9k/hif_usb.c:90 __usb_hcd_giveback_urb+0x29a/0x550 drivers/usb/core/hcd.c:1650 usb_hcd_giveback_urb+0x368/0x420 drivers/usb/core/hcd.c:1716 dummy_timer+0x1258/0x32ae drivers/usb/gadget/udc/dummy_hcd.c:1966 call_timer_fn+0x195/0x6f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x5f9/0x1500 kernel/time/timer.c:1786 __do_softirq+0x21e/0x950 kernel/softirq.c:292 Reported-and-tested-by: syzbot+9505af1ae303dabdc646@syzkaller.appspotmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200404041838.10426-2-hqjagain@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abeaa85 upstream. Free wmi later after cmd urb has been killed, as urb cb will access wmi. the case reported by syzbot: https://lore.kernel.org/linux-usb/0000000000000002fc05a1d61a68@google.com BUG: KASAN: use-after-free in ath9k_wmi_ctrl_rx+0x416/0x500 drivers/net/wireless/ath/ath9k/wmi.c:215 Read of size 1 at addr ffff8881cef1417c by task swapper/1/0 Call Trace: <IRQ> ath9k_wmi_ctrl_rx+0x416/0x500 drivers/net/wireless/ath/ath9k/wmi.c:215 ath9k_htc_rx_msg+0x2da/0xaf0 drivers/net/wireless/ath/ath9k/htc_hst.c:459 ath9k_hif_usb_reg_in_cb+0x1ba/0x630 drivers/net/wireless/ath/ath9k/hif_usb.c:718 __usb_hcd_giveback_urb+0x29a/0x550 drivers/usb/core/hcd.c:1650 usb_hcd_giveback_urb+0x368/0x420 drivers/usb/core/hcd.c:1716 dummy_timer+0x1258/0x32ae drivers/usb/gadget/udc/dummy_hcd.c:1966 call_timer_fn+0x195/0x6f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x5f9/0x1500 kernel/time/timer.c:1786 Reported-and-tested-by: syzbot+5d338854440137ea0fef@syzkaller.appspotmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200404041838.10426-3-hqjagain@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4ff08a upstream. Write out of slab bounds. We should check epid. The case reported by syzbot: https://lore.kernel.org/linux-usb/0000000000006ac55b05a1c05d72@google.com BUG: KASAN: use-after-free in htc_process_conn_rsp drivers/net/wireless/ath/ath9k/htc_hst.c:131 [inline] BUG: KASAN: use-after-free in ath9k_htc_rx_msg+0xa25/0xaf0 drivers/net/wireless/ath/ath9k/htc_hst.c:443 Write of size 2 at addr ffff8881cea291f0 by task swapper/1/0 Call Trace: htc_process_conn_rsp drivers/net/wireless/ath/ath9k/htc_hst.c:131 [inline] ath9k_htc_rx_msg+0xa25/0xaf0 drivers/net/wireless/ath/ath9k/htc_hst.c:443 ath9k_hif_usb_reg_in_cb+0x1ba/0x630 drivers/net/wireless/ath/ath9k/hif_usb.c:718 __usb_hcd_giveback_urb+0x29a/0x550 drivers/usb/core/hcd.c:1650 usb_hcd_giveback_urb+0x368/0x420 drivers/usb/core/hcd.c:1716 dummy_timer+0x1258/0x32ae drivers/usb/gadget/udc/dummy_hcd.c:1966 call_timer_fn+0x195/0x6f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x5f9/0x1500 kernel/time/timer.c:1786 Reported-and-tested-by: syzbot+b1c61e5f11be5782f192@syzkaller.appspotmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200404041838.10426-4-hqjagain@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 19d6c37 upstream. Add barrier to accessing the stack array skb_pool. The case reported by syzbot: https://lore.kernel.org/linux-usb/0000000000003d7c1505a2168418@google.com BUG: KASAN: stack-out-of-bounds in ath9k_hif_usb_rx_stream drivers/net/wireless/ath/ath9k/hif_usb.c:626 [inline] BUG: KASAN: stack-out-of-bounds in ath9k_hif_usb_rx_cb+0xdf6/0xf70 drivers/net/wireless/ath/ath9k/hif_usb.c:666 Write of size 8 at addr ffff8881db309a28 by task swapper/1/0 Call Trace: ath9k_hif_usb_rx_stream drivers/net/wireless/ath/ath9k/hif_usb.c:626 [inline] ath9k_hif_usb_rx_cb+0xdf6/0xf70 drivers/net/wireless/ath/ath9k/hif_usb.c:666 __usb_hcd_giveback_urb+0x1f2/0x470 drivers/usb/core/hcd.c:1648 usb_hcd_giveback_urb+0x368/0x420 drivers/usb/core/hcd.c:1713 dummy_timer+0x1258/0x32ae drivers/usb/gadget/udc/dummy_hcd.c:1966 call_timer_fn+0x195/0x6f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x5f9/0x1500 kernel/time/timer.c:1786 Reported-and-tested-by: syzbot+d403396d4df67ad0bd5f@syzkaller.appspotmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200404041838.10426-5-hqjagain@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbcaae upstream. In ath9k_hif_usb_rx_cb interface number is assumed to be 0. usb_ifnum_to_if(urb->dev, 0) But it isn't always true. The case reported by syzbot: https://lore.kernel.org/linux-usb/000000000000666c9c05a1c05d12@google.com usb 2-1: new high-speed USB device number 2 using dummy_hcd usb 2-1: config 1 has an invalid interface number: 2 but max is 0 usb 2-1: config 1 has no interface number 0 usb 2-1: New USB device found, idVendor=0cf3, idProduct=9271, bcdDevice= 1.08 usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 general protection fault, probably for non-canonical address 0xdffffc0000000015: 0000 [Freescale#1] SMP KASAN KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc5-syzkaller #0 Call Trace __usb_hcd_giveback_urb+0x29a/0x550 drivers/usb/core/hcd.c:1650 usb_hcd_giveback_urb+0x368/0x420 drivers/usb/core/hcd.c:1716 dummy_timer+0x1258/0x32ae drivers/usb/gadget/udc/dummy_hcd.c:1966 call_timer_fn+0x195/0x6f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x5f9/0x1500 kernel/time/timer.c:1786 __do_softirq+0x21e/0x950 kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x178/0x1a0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:546 [inline] smp_apic_timer_interrupt+0x141/0x540 arch/x86/kernel/apic/apic.c:1146 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 Reported-and-tested-by: syzbot+40d5d2e8a4680952f042@syzkaller.appspotmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200404041838.10426-6-hqjagain@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84e99e5 upstream. Add barrier to soob. Return -EOVERFLOW if the buffer is exceeded. Suggested-by: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+bfdd4a2f07be52351350@syzkaller.appspotmail.com Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ea2ea4 upstream. We need to keep the reference to the drm_gem_object until the last access by vkms_dumb_create. Therefore, the put the object after it is used. This fixes a use-after-free issue reported by syzbot. While here, change vkms_gem_create() symbol to static. Reported-and-tested-by: syzbot+e3372a2afe1e7ef04bc7@syzkaller.appspotmail.com Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200427214405.13069-1-ezequiel@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dde3c6b upstream. syzkaller reports for memory leak when kobject_init_and_add() returns an error in the function sysfs_slab_add() [1] When this happened, the function kobject_put() is not called for the corresponding kobject, which potentially leads to memory leak. This patch fixes the issue by calling kobject_put() even if kobject_init_and_add() fails. [1] BUG: memory leak unreferenced object 0xffff8880a6d4be88 (size 8): comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s) hex dump (first 8 bytes): 70 69 64 5f 33 00 ff ff pid_3... backtrace: kstrdup+0x35/0x70 mm/util.c:60 kstrdup_const+0x3d/0x50 mm/util.c:82 kvasprintf_const+0x112/0x170 lib/kasprintf.c:48 kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289 kobject_add_varg lib/kobject.c:384 [inline] kobject_init_and_add+0xd8/0x170 lib/kobject.c:473 sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811 __kmem_cache_create+0x50a/0x570 mm/slub.c:4384 create_cache+0x113/0x1e0 mm/slab_common.c:407 kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505 kmem_cache_create+0xd/0x10 mm/slab_common.c:564 create_pid_cachep kernel/pid_namespace.c:54 [inline] create_pid_namespace kernel/pid_namespace.c:96 [inline] copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148 create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95 unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229 ksys_unshare+0x3d2/0x770 kernel/fork.c:2969 __do_sys_unshare kernel/fork.c:3037 [inline] __se_sys_unshare kernel/fork.c:3035 [inline] __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035 do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295 Fixes: 80da026 ("mm/slub: fix slab double-free in case of duplicate sysfs filename") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200602115033.1054-1-wanghai38@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1b6575 upstream. If FAT length == 0, the image doesn't have any data. And it can be the cause of overlapping the root dir and FAT entries. Also Windows treats it as invalid format. Reported-by: syzbot+6f1624f937d9d6911e2d@syzkaller.appspotmail.com Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/87r1wz8mrd.fsf@mail.parknet.co.jp Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ed6edd upstream. Under rare circumstances, task_function_call() can repeatedly fail and cause a soft lockup. There is a slight race where the process is no longer running on the cpu we targeted by the time remote_function() runs. The code will simply try again. If we are very unlucky, this will continue to fail, until a watchdog fires. This can happen in a heavily loaded, multi-core virtual machine. Reported-by: syzbot+bb4935a5c09b5ff79940@syzkaller.appspotmail.com Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200414222920.121401-1-brho@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f30d3ce upstream. After changing the timing between GTT updates and execution on the GPU, we started seeing sporadic failures on Ironlake. These were narrowed down to being an insufficiently strong enough barrier/delay after updating the GTT and scheduling execution on the GPU. By forcing the uncached read, and adding the missing barrier for the singular insert_page (relocation paths), the sporadic failures go away. Fixes: 983d308 ("agp/intel: Serialise after GTT updates") Fixes: 3497971 ("agp/intel: Flush chipset writes after updating a single PTE") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Andi Shyti <andi.shyti@intel.com> Cc: stable@vger.kernel.org # v4.0+ Link: https://patchwork.freedesktop.org/patch/msgid/20200410083535.25464-1-chris@chris-wilson.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9253d71 upstream. Clear tuning_done flag while executing tuning to ensure vendor specific HS400 settings are applied properly when the controller is re-initialized in HS400 mode. Without this, re-initialization of the qcom SDHC in HS400 mode fails while resuming the driver from runtime-suspend or system-suspend. Fixes: ff06ce4 ("mmc: sdhci-msm: Add HS400 platform support") Cc: stable@vger.kernel.org Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org> Link: https://lore.kernel.org/r/1590678838-18099-1-git-send-email-vbadigan@codeaurora.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe8d33b upstream. Turning on CONFIG_DMA_API_DEBUG_SG results in the following warning: WARNING: CPU: 1 PID: 20 at kernel/dma/debug.c:500 add_dma_entry+0x16c/0x17c DMA-API: exceeded 7 overlapping mappings of cacheline 0x031d2645 Modules linked in: CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted 5.5.0-rc2-00021-gdeda30999c2b-dirty Freescale#49 Hardware name: STM32 (Device Tree Support) Workqueue: events_freezable mmc_rescan [<c03138c0>] (unwind_backtrace) from [<c030d760>] (show_stack+0x10/0x14) [<c030d760>] (show_stack) from [<c0f2eb28>] (dump_stack+0xc0/0xd4) [<c0f2eb28>] (dump_stack) from [<c034a14c>] (__warn+0xd0/0xf8) [<c034a14c>] (__warn) from [<c034a530>] (warn_slowpath_fmt+0x94/0xb8) [<c034a530>] (warn_slowpath_fmt) from [<c03bca0c>] (add_dma_entry+0x16c/0x17c) [<c03bca0c>] (add_dma_entry) from [<c03bdf54>] (debug_dma_map_sg+0xe4/0x3d4) [<c03bdf54>] (debug_dma_map_sg) from [<c0d09244>] (sdmmc_idma_prep_data+0x94/0xf8) [<c0d09244>] (sdmmc_idma_prep_data) from [<c0d05a2c>] (mmci_prep_data+0x2c/0xb0) [<c0d05a2c>] (mmci_prep_data) from [<c0d073ec>] (mmci_start_data+0x134/0x2f0) [<c0d073ec>] (mmci_start_data) from [<c0d078d0>] (mmci_request+0xe8/0x154) [<c0d078d0>] (mmci_request) from [<c0cecb44>] (mmc_start_request+0x94/0xbc) DMA api debug brings to light leaking dma-mappings, dma_map_sg and dma_unmap_sg are not correctly balanced. If a request is prepared, the dma_map/unmap are done in asynchronous call pre_req (prep_data) and post_req (unprep_data). In this case the dma-mapping is right balanced. But if the request was not prepared, the data->host_cookie is define to zero and the dma_map/unmap must be done in the request. The dma_map is called by mmci_dma_start (prep_data), but there is no dma_unmap in this case. This patch adds dma_unmap_sg when the dma is finalized and the data cookie is zero (request not prepared). Signed-off-by: Ludovic Barre <ludovic.barre@st.com> Link: https://lore.kernel.org/r/20200526155103.12514-2-ludovic.barre@st.com Fixes: 46b723d ("mmc: mmci: add stm32 sdmmc variant") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bd7844 upstream. Before calling tmio_mmc_host_probe(), the caller is required to enable clocks for its device, as to make it accessible when reading/writing registers during probe. Therefore, the responsibility to disable these clocks, in the error path of ->probe() and during ->remove(), is better managed outside tmio_mmc_host_remove(). As a matter of fact, callers of tmio_mmc_host_remove() already expects this to be the behaviour. However, there's a problem with tmio_mmc_host_remove() when the Kconfig option, CONFIG_PM, is set. More precisely, tmio_mmc_host_remove() may then disable the clock via runtime PM, which leads to clock enable/disable imbalance problems, when the caller of tmio_mmc_host_remove() also tries to disable the same clocks. To solve the problem, let's make sure tmio_mmc_host_remove() leaves the device with clocks enabled, but also make sure to disable the IRQs, as we normally do at ->runtime_suspend(). Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200519152434.6867-1-ulf.hansson@linaro.org Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d1f42e upstream. Currently, tmio_mmc_irq() handler is registered before the host is fully initialized by tmio_mmc_host_probe(). I did not previously notice this problem. The boot ROM of a new Socionext SoC unmasks interrupts (CTL_IRQ_MASK) somehow. The handler is invoked before tmio_mmc_host_probe(), then emits noisy call trace. Move devm_request_irq() below tmio_mmc_host_probe(). Fixes: 3fd784f ("mmc: uniphier-sd: add UniPhier SD/eMMC controller driver") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200511062158.1790924-1-yamada.masahiro@socionext.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1af7f3 upstream. Remove non-removable and mmc-ddr-1_8v properties from the sdmmc0 node which come probably from an unchecked copy/paste. Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Fixes:42ed535595ec "ARM: dts: at91: introduce the sama5d2 ptc ek board" Cc: stable@vger.kernel.org # 4.19 and later Link: https://lore.kernel.org/r/20200401221504.41196-1-ludovic.desroches@microchip.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f04086c upstream. During some scenarios mmc_sdio_init_card() runs a retry path for the UHS-I specific initialization, which leads to removal of the previously allocated card. A new card is then re-allocated while retrying. However, in one of the corresponding error paths we may end up to remove an already removed card, which likely leads to a NULL pointer exception. So, let's fix this. Fixes: 5fc3d80 ("mmc: sdio: don't use rocr to check if the card could support UHS mode") Cc: <stable@vger.kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20200430091640.455-2-ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a94a59f upstream. Over the years, the code in mmc_sdio_init_card() has grown to become quite messy. Unfortunate this has also lead to that several paths are leaking memory in form of an allocated struct mmc_card, which includes additional data, such as initialized struct device for example. Unfortunate, it's a too complex task find each offending commit. Therefore, this change fixes all memory leaks at once. Cc: <stable@vger.kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20200430091640.455-3-ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 263c615 upstream. Since the switch of floppy driver to blk-mq, the contended (fdc_busy) case in floppy_queue_rq() is not handled correctly. In case we reach floppy_queue_rq() with fdc_busy set (i.e. with the floppy locked due to another request still being in-flight), we put the request on the list of requests and return BLK_STS_OK to the block core, without actually scheduling delayed work / doing further processing of the request. This means that processing of this request is postponed until another request comes and passess uncontended. Which in some cases might actually never happen and we keep waiting indefinitely. The simple testcase is for i in `seq 1 2000`; do echo -en $i '\r'; blkid --info /dev/fd0 2> /dev/null; done run in quemu. That reliably causes blkid eventually indefinitely hanging in __floppy_read_block_0() waiting for completion, as the BIO callback never happens, and no further IO is ever submitted on the (non-existent) floppy device. This was observed reliably on qemu-emulated device. Fix that by not queuing the request in the contended case, and return BLK_STS_RESOURCE instead, so that blk core handles the request rescheduling and let it pass properly non-contended later. Fixes: a9f38e1 ("floppy: convert to blk-mq") Cc: stable@vger.kernel.org Tested-by: Libor Pechacek <lpechacek@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8d70a2 upstream. backend_connect() can fail, so switch the device to connected only if no error occurred. Fixes: 0a9c75c ("xen/pvcalls: xenbus state handling") Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20200511074231.19794-1-jgross@suse.com Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0370964 upstream. On a VHE system, the EL1 state is left in the CPU most of the time, and only syncronized back to memory when vcpu_put() is called (most of the time on preemption). Which means that when injecting an exception, we'd better have a way to either: (1) write directly to the EL1 sysregs (2) synchronize the state back to memory, and do the changes there For an AArch64, we already do (1), so we are safe. Unfortunately, doing the same thing for AArch32 would be pretty invasive. Instead, we can easily implement (2) by calling the put/load architectural backends, and keep preemption disabled. We can then reload the state back into EL1. Cc: stable@vger.kernel.org Reported-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef3e40a upstream. When using the PtrAuth feature in a guest, we need to save the host's keys before allowing the guest to program them. For that, we dump them in a per-CPU data structure (the so called host context). But both call sites that do this are in preemptible context, which may end up in disaster should the vcpu thread get preempted before reentering the guest. Instead, save the keys eagerly on each vcpu_load(). This has an increased overhead, but is at least safe. Cc: stable@vger.kernel.org Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is the 5.4.47 stable release Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
otavio
pushed a commit
that referenced
this pull request
Dec 9, 2024
commit a9685b4 upstream. If open_cached_dir() encounters an error parsing the lease from the server, the error handling may race with receiving a lease break, resulting in open_cached_dir() freeing the cfid while the queued work is pending. Update open_cached_dir() to drop refs rather than directly freeing the cfid. Have cached_dir_lease_break(), cfids_laundromat_worker(), and invalidate_all_cached_dirs() clear has_lease immediately while still holding cfids->cfid_list_lock, and then use this to also simplify the reference counting in cfids_laundromat_worker() and invalidate_all_cached_dirs(). Fixes this KASAN splat (which manually injects an error and lease break in open_cached_dir()): ================================================================== BUG: KASAN: slab-use-after-free in smb2_cached_lease_break+0x27/0xb0 Read of size 8 at addr ffff88811cc24c10 by task kworker/3:1/65 CPU: 3 UID: 0 PID: 65 Comm: kworker/3:1 Not tainted 6.12.0-rc6-g255cf264e6e5-dirty #87 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 Workqueue: cifsiod smb2_cached_lease_break Call Trace: <TASK> dump_stack_lvl+0x77/0xb0 print_report+0xce/0x660 kasan_report+0xd3/0x110 smb2_cached_lease_break+0x27/0xb0 process_one_work+0x50a/0xc50 worker_thread+0x2ba/0x530 kthread+0x17c/0x1c0 ret_from_fork+0x34/0x60 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 2464: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0xaa/0xb0 open_cached_dir+0xa7d/0x1fb0 smb2_query_path_info+0x43c/0x6e0 cifs_get_fattr+0x346/0xf10 cifs_get_inode_info+0x157/0x210 cifs_revalidate_dentry_attr+0x2d1/0x460 cifs_getattr+0x173/0x470 vfs_statx_path+0x10f/0x160 vfs_statx+0xe9/0x150 vfs_fstatat+0x5e/0xc0 __do_sys_newfstatat+0x91/0xf0 do_syscall_64+0x95/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 2464: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x51/0x70 kfree+0x174/0x520 open_cached_dir+0x97f/0x1fb0 smb2_query_path_info+0x43c/0x6e0 cifs_get_fattr+0x346/0xf10 cifs_get_inode_info+0x157/0x210 cifs_revalidate_dentry_attr+0x2d1/0x460 cifs_getattr+0x173/0x470 vfs_statx_path+0x10f/0x160 vfs_statx+0xe9/0x150 vfs_fstatat+0x5e/0xc0 __do_sys_newfstatat+0x91/0xf0 do_syscall_64+0x95/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Last potentially related work creation: kasan_save_stack+0x33/0x60 __kasan_record_aux_stack+0xad/0xc0 insert_work+0x32/0x100 __queue_work+0x5c9/0x870 queue_work_on+0x82/0x90 open_cached_dir+0x1369/0x1fb0 smb2_query_path_info+0x43c/0x6e0 cifs_get_fattr+0x346/0xf10 cifs_get_inode_info+0x157/0x210 cifs_revalidate_dentry_attr+0x2d1/0x460 cifs_getattr+0x173/0x470 vfs_statx_path+0x10f/0x160 vfs_statx+0xe9/0x150 vfs_fstatat+0x5e/0xc0 __do_sys_newfstatat+0x91/0xf0 do_syscall_64+0x95/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The buggy address belongs to the object at ffff88811cc24c00 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 16 bytes inside of freed 1024-byte region [ffff88811cc24c00, ffff88811cc25000) Cc: stable@vger.kernel.org Signed-off-by: Paul Aurich <paul@darkrain42.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zandrey
pushed a commit
to zandrey/linux-fslc
that referenced
this pull request
Feb 8, 2025
commit c2b47df upstream. [BUG] With CONFIG_DEBUG_VM set, test case generic/476 has some chance to crash with the following VM_BUG_ON_FOLIO(): BTRFS error (device dm-3): cow_file_range failed, start 1146880 end 1253375 len 106496 ret -28 BTRFS error (device dm-3): run_delalloc_nocow failed, start 1146880 end 1253375 len 106496 ret -28 page: refcount:4 mapcount:0 mapping:00000000592787cc index:0x12 pfn:0x10664 aops:btrfs_aops [btrfs] ino:101 dentry name(?):"f1774" flags: 0x2fffff80004028(uptodate|lru|private|node=0|zone=2|lastcpupid=0xfffff) page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio)) ------------[ cut here ]------------ kernel BUG at mm/page-writeback.c:2992! Internal error: Oops - BUG: 00000000f2000800 [Freescale#1] SMP CPU: 2 UID: 0 PID: 3943513 Comm: kworker/u24:15 Tainted: G OE 6.12.0-rc7-custom+ Freescale#87 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : folio_clear_dirty_for_io+0x128/0x258 lr : folio_clear_dirty_for_io+0x128/0x258 Call trace: folio_clear_dirty_for_io+0x128/0x258 btrfs_folio_clamp_clear_dirty+0x80/0xd0 [btrfs] __process_folios_contig+0x154/0x268 [btrfs] extent_clear_unlock_delalloc+0x5c/0x80 [btrfs] run_delalloc_nocow+0x5f8/0x760 [btrfs] btrfs_run_delalloc_range+0xa8/0x220 [btrfs] writepage_delalloc+0x230/0x4c8 [btrfs] extent_writepage+0xb8/0x358 [btrfs] extent_write_cache_pages+0x21c/0x4e8 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x178/0x3a8 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: 910a8021 a90363f7 a9046bf9 94012379 (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] The first two lines of extra debug messages show the problem is caused by the error handling of run_delalloc_nocow(). E.g. we have the following dirtied range (4K blocksize 4K page size): 0 16K 32K |//////////////////////////////////////| | Pre-allocated | And the range [0, 16K) has a preallocated extent. - Enter run_delalloc_nocow() for range [0, 16K) Which found range [0, 16K) is preallocated, can do the proper NOCOW write. - Enter fallback_to_fow() for range [16K, 32K) Since the range [16K, 32K) is not backed by preallocated extent, we have to go COW. - cow_file_range() failed for range [16K, 32K) So cow_file_range() will do the clean up by clearing folio dirty, unlock the folios. Now the folios in range [16K, 32K) is unlocked. - Enter extent_clear_unlock_delalloc() from run_delalloc_nocow() Which is called with PAGE_START_WRITEBACK to start page writeback. But folios can only be marked writeback when it's properly locked, thus this triggered the VM_BUG_ON_FOLIO(). Furthermore there is another hidden but common bug that run_delalloc_nocow() is not clearing the folio dirty flags in its error handling path. This is the common bug shared between run_delalloc_nocow() and cow_file_range(). [FIX] - Clear folio dirty for range [@start, @cur_offset) Introduce a helper, cleanup_dirty_folios(), which will find and lock the folio in the range, clear the dirty flag and start/end the writeback, with the extra handling for the @locked_folio. - Introduce a helper to clear folio dirty, start and end writeback - Introduce a helper to record the last failed COW range end This is to trace which range we should skip, to avoid double unlocking. - Skip the failed COW range for the error handling CC: stable@vger.kernel.org Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Automatic merge performed, no conflicts reported.
-- andrey