forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update 5.4-2.1.x-imx to v5.4.80 from stable #182
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ Upstream commit a5ebcbd ] Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 1da177e ("Linux-2.6.12-rc2") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1605581105-35295-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6dceaa9 ] The `skb' is mapped for DMA in ns_send() but does not unmap DMA in case push_scqe() fails to submit the `skb'. The memory of the `skb' is released so only the DMA mapping is leaking. Unmap the DMA mapping in case push_scqe() failed. Fixes: 864a3ff ("atm: [nicstar] remove virt_to_bus() and support 64-bit platforms") Cc: Chas Williams <3chas3@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4260330 ] The module eeprom address range returned by bnxt_get_module_eeprom() should be 256 bytes of A0h address space, the lower half of the A2h address space, and page 0 for the upper half of the A2h address space. Fix the firmware call by passing page_number 0 for the A2h slave address space. Fixes: 42ee18f ("bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 849920c ] If sb_occ_port_pool_get() failed in devlink_nl_sb_port_pool_fill(), msg should be canceled by genlmsg_cancel(). Fixes: df38daf ("devlink: implement shared buffer occupancy monitoring interface") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201113111622.11040-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8cf8821 ] Commit 5895631 ("neighbor: Improve garbage collection") guarantees neighbour table entries a five-second lifetime. Processes which make heavy use of multicast can fill the neighour table with multicast addresses in five seconds. At that point, neighbour entries can't be GC-ed because they aren't five seconds old yet, the kernel log starts to fill up with "neighbor table overflow!" messages, and sends start to fail. This patch allows multicast addresses to be thrown out before they've lived out their five seconds. This makes room for non-multicast addresses and makes messages to all addresses more reliable in these circumstances. Fixes: 5895631 ("neighbor: Improve garbage collection") Signed-off-by: Jeff Dike <jdike@akamai.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20201113015815.31397-1-jdike@akamai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e33de7c ] nlmsg_cancel() needs to be called in the error path of inet_req_diag_fill to cancel the message. Fixes: d545cac ("net: inet: diag: expose the socket mark to privileged processes.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201116082018.16496-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ceb736e ] genlmsg_cancel() needs to be called in the error path of inet6_fill_ifmcaddr and inet6_fill_ifacaddr to cancel the message. Fixes: 6ecf4c3 ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201112080950.1476302-1-zhangqilong3@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e35df62 ] When running this chip on arm imx6, we intermittently observe the following kernel warning in the log, especially when the system is under high load: [ 50.119484] ------------[ cut here ]------------ [ 50.124377] WARNING: CPU: 0 PID: 303 at kernel/softirq.c:169 __local_bh_enable_ip+0x100/0x184 [ 50.132925] IRQs not enabled as expected [ 50.159250] CPU: 0 PID: 303 Comm: rngd Not tainted 5.7.8 Freescale#1 [ 50.164837] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 50.171395] [<c0111a38>] (unwind_backtrace) from [<c010be28>] (show_stack+0x10/0x14) [ 50.179162] [<c010be28>] (show_stack) from [<c05b9dec>] (dump_stack+0xac/0xd8) [ 50.186408] [<c05b9dec>] (dump_stack) from [<c0122e40>] (__warn+0xd0/0x10c) [ 50.193391] [<c0122e40>] (__warn) from [<c0123238>] (warn_slowpath_fmt+0x98/0xc4) [ 50.200892] [<c0123238>] (warn_slowpath_fmt) from [<c012b010>] (__local_bh_enable_ip+0x100/0x184) [ 50.209860] [<c012b010>] (__local_bh_enable_ip) from [<bf09ecbc>] (destroy_conntrack+0x48/0xd8 [nf_conntrack]) [ 50.220038] [<bf09ecbc>] (destroy_conntrack [nf_conntrack]) from [<c0ac9b58>] (nf_conntrack_destroy+0x94/0x168) [ 50.230160] [<c0ac9b58>] (nf_conntrack_destroy) from [<c0a4aaa0>] (skb_release_head_state+0xa0/0xd0) [ 50.239314] [<c0a4aaa0>] (skb_release_head_state) from [<c0a4aadc>] (skb_release_all+0xc/0x24) [ 50.247946] [<c0a4aadc>] (skb_release_all) from [<c0a4b4cc>] (consume_skb+0x74/0x17c) [ 50.255796] [<c0a4b4cc>] (consume_skb) from [<c081a2dc>] (lan743x_tx_release_desc+0x120/0x124) [ 50.264428] [<c081a2dc>] (lan743x_tx_release_desc) from [<c081a98c>] (lan743x_tx_napi_poll+0x5c/0x18c) [ 50.273755] [<c081a98c>] (lan743x_tx_napi_poll) from [<c0a6b050>] (net_rx_action+0x118/0x4a4) [ 50.282306] [<c0a6b050>] (net_rx_action) from [<c0101364>] (__do_softirq+0x13c/0x53c) [ 50.290157] [<c0101364>] (__do_softirq) from [<c012b29c>] (irq_exit+0x150/0x17c) [ 50.297575] [<c012b29c>] (irq_exit) from [<c0196a08>] (__handle_domain_irq+0x60/0xb0) [ 50.305423] [<c0196a08>] (__handle_domain_irq) from [<c05d44fc>] (gic_handle_irq+0x4c/0x90) [ 50.313790] [<c05d44fc>] (gic_handle_irq) from [<c0100ed4>] (__irq_usr+0x54/0x80) [ 50.321287] Exception stack(0xecd99fb0 to 0xecd99ff8) [ 50.326355] 9fa0: 1cf1aa74 00000001 00000001 00000000 [ 50.334547] 9fc0: 00000001 00000000 00000000 00000000 00000000 00000000 00004097 b6d17d14 [ 50.342738] 9fe0: 00000001 b6d17c60 00000000 b6e71f94 800b0010 ffffffff [ 50.349364] irq event stamp: 2525027 [ 50.352955] hardirqs last enabled at (2525026): [<c0a6afec>] net_rx_action+0xb4/0x4a4 [ 50.360892] hardirqs last disabled at (2525027): [<c0d6d2fc>] _raw_spin_lock_irqsave+0x1c/0x50 [ 50.369517] softirqs last enabled at (2524660): [<c01015b4>] __do_softirq+0x38c/0x53c [ 50.377446] softirqs last disabled at (2524693): [<c012b29c>] irq_exit+0x150/0x17c [ 50.385027] ---[ end trace c0b571db4bc8087d ]--- The driver is calling dev_kfree_skb() from code inside a spinlock, where h/w interrupts are disabled. This is forbidden, as documented in include/linux/netdevice.h. The correct function to use dev_kfree_skb_irq(), or dev_kfree_skb_any(). Fix by using the correct dev_kfree_skb_xxx() functions: in lan743x_tx_release_desc(): called by lan743x_tx_release_completed_descriptors() called by in lan743x_tx_napi_poll() which holds a spinlock called by lan743x_tx_release_all_descriptors() called by lan743x_tx_close() which can-sleep conclusion: use dev_kfree_skb_any() in lan743x_tx_xmit_frame(): which holds a spinlock conclusion: use dev_kfree_skb_irq() in lan743x_tx_close(): which can-sleep conclusion: use dev_kfree_skb() in lan743x_rx_release_ring_element(): called by lan743x_rx_close() which can-sleep called by lan743x_rx_open() which can-sleep conclusion: use dev_kfree_skb() Fixes: 23f0703 ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com> Link: https://lore.kernel.org/r/20201112185949.11315-1-TheSven73@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 796a266 ] On arm imx6, when opening the chip's netdev, the whole Linux kernel intermittently hangs/freezes. This is caused by a bug in the driver code which tests if pcie interrupts are working correctly, using the software interrupt: 1. open: enable the software interrupt 2. open: tell the chip to assert the software interrupt 3. open: wait for flag 4. ISR: acknowledge s/w interrupt, set flag 5. open: notice flag, disable the s/w interrupt, continue Unfortunately the ISR only acknowledges the s/w interrupt, but does not disable it. This will re-trigger the ISR in a tight loop. On some (lucky) platforms, open proceeds to disable the s/w interrupt even while the ISR is 'spinning'. On arm imx6, the spinning ISR does not allow open to proceed, resulting in a hung Linux kernel. Fix minimally by disabling the s/w interrupt in the ISR, which will prevent it from spinning. This won't break anything because the s/w interrupt is used as a one-shot interrupt. Note that this is a minimal fix, overlooking many possible cleanups, e.g.: - lan743x_intr_software_isr() is completely redundant and reads INT_STS twice for no apparent reason - disabling the s/w interrupt in lan743x_intr_test_isr() is now redundant, but harmless - waiting on software_isr_flag can be converted from a sleeping poll loop to wait_event_timeout() Fixes: 23f0703 ("lan743x: Add main source files for new lan743x driver") Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # arm imx6 lan7430 Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com> Link: https://lore.kernel.org/r/20201112204741.12375-1-TheSven73@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1f492ea ] The driver sends Ethernet Management Datagram (EMAD) packets to the device for configuration purposes and waits for up to 200ms for a reply. A request is retried up to 5 times. When the system is under heavy load, replies are not always processed in time and EMAD transactions fail. Make the process more robust to such delays by using exponential backoff. First wait for up to 200ms, then retransmit and wait for up to 400ms and so on. Fixes: caf7297 ("mlxsw: core: Introduce support for asynchronous EMAD register access") Reported-by: Denis Yulevich <denisyu@nvidia.com> Tested-by: Denis Yulevich <denisyu@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7b027c2 ] Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 39a6f4b ("b44: replace the ssb_dma API with the generic DMA API") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1605582131-36735-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7a30ecc ] In br_forward.c and br_input.c fields dev->stats.tx_dropped and dev->stats.multicast are populated, but they are ignored in ndo_get_stats64. Fixes: 2817273 ("net: fix 64 bit counters on 32 bit arches") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/58ea9963-77ad-a7cf-8dfd-fc95ab95f606@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9230706 ] As soon as you add the second port to a VLAN, all other port membership configuration is overwritten with zeroes. The HW interprets this as all ports being "unmodified members" of the VLAN. In the simple case when all ports belong to the same VLAN, switching will still work. But using multiple VLANs or trying to set multiple ports as tagged members will not work. On the 6352, doing a VTU GetNext op, followed by an STU GetNext op will leave you with both the member- and state- data in the VTU/STU data registers. But on the 6097 (which uses the same implementation), the STU GetNext will override the information gathered from the VTU GetNext. Separate the two stages, parsing the result of the VTU GetNext before doing the STU GetNext. We opt to update the existing implementation for all applicable chips, as opposed to creating a separate callback for 6097, because although the previous implementation did work for (at least) 6352, the datasheet does not mention the masking behavior. Fixes: ef6fcea ("net: dsa: mv88e6xxx: get STU entry on VTU GetNext") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20201112114335.27371-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 35f735c ] Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 83a8471 ("net: ethernet: ti: cpsw: refactor probe to group common hw initialization") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1605250173-18438-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1532b97 ] DSA network devices rely on having their DSA management interface up and running otherwise their ndo_open() will return -ENETDOWN. Without doing this it would not be possible to use DSA devices as netconsole when configured on the command line. These devices also do not utilize the upper/lower linking so the check about the netpoll device having upper is not going to be a problem. The solution adopted here is identical to the one done for net/ipv4/ipconfig.c with 728c020 ("net: ipv4: handle DSA enabled master network devices"), with the network namespace scope being restricted to that of the process configuring netpoll. Fixes: 04ff53f ("net: dsa: Add netconsole support") Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201117035236.22658-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 866358e ] The current NetLabel code doesn't correctly keep track of the netlink dump state in some cases, in particular when multiple interfaces with large configurations are loaded. The problem manifests itself by not reporting the full configuration to userspace, even though it is loaded and active in the kernel. This patch fixes this by ensuring that the dump state is properly reset when necessary inside the netlbl_unlabel_staticlist() function. Fixes: 8cc4457 ("NetLabel: Introduce static network labels for unlabeled connections") Signed-off-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/160484450633.3752.16512718263560813473.stgit@sifl Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1ba86d4 ] Static checking revealed that a previous fix to netlbl_unlabel_staticlist() leaves a stack variable uninitialized, this patches fixes that. Fixes: 866358e ("netlabel: fix our progress tracking in netlbl_unlabel_staticlist()") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/160530304068.15651.18355773009751195447.stgit@sifl Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2a1828e ] A user reports (slightly shortened from the original message): libphy: lantiq,xrx200-mdio: probed mdio_bus 1e108000.switch-mii: MDIO device at address 17 is missing. gswip 1e108000.switch lan: no phy at 2 gswip 1e108000.switch lan: failed to connect to port 2: -19 lantiq,xrx200-net 1e10b308.eth eth0: error -19 setting up slave phy This is a single-port board using the internal Fast Ethernet PHY. The user reports that switching to PHY scanning instead of configuring the PHY within device-tree works around this issue. The documentation for the standalone variant of the PHY11G (which is probably very similar to what is used inside the xRX200 SoCs but having the firmware burnt onto that standalone chip in the factory) states that the PHY needs 300ms to be ready for MDIO communication after releasing the reset. Add a 300ms delay after initializing all GPHYs to ensure that the GPHY firmware had enough time to initialize and to appear on the MDIO bus. Unfortunately there is no (known) documentation on what the minimum time to wait after releasing the reset on an internal PHY so play safe and take the one for the external variant. Only wait after the last GPHY firmware is loaded to not slow down the initialization too much ( xRX200 has two GPHYs but newer SoCs have at least three GPHYs). Fixes: 14fceff ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201115165757.552641-1-martin.blumenstingl@googlemail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6d9c8d1 ] Slave function read the following capabilities from the wrong offset: 1. log_mc_entry_sz 2. fs_log_entry_sz 3. log_mc_hash_sz Fix that by adjusting these capabilities offset to match firmware layout. Due to the wrong offset read, the following issues might occur: 1+2. Negative value reported at max_mcast_qp_attach. 3. Driver to init FW with multicast hash size of zero. Fixes: a40ded6 ("net/mlx4_core: Add masking for a few queries on HCA caps") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20201118081922.553-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…eanup [ Upstream commit fc70f5b ] During rmnet unregistration, the real device rx_handler is first cleared followed by the removal of rx_handler_data after the rcu synchronization. Any packets in the receive path may observe that the rx_handler is NULL. However, there is no check when dereferencing this value to use the rmnet_port information. This fixes following splat by adding the NULL check. Unable to handle kernel NULL pointer dereference at virtual address 000000000000000d pc : rmnet_rx_handler+0x124/0x284 lr : rmnet_rx_handler+0x124/0x284 rmnet_rx_handler+0x124/0x284 __netif_receive_skb_core+0x758/0xd74 __netif_receive_skb+0x50/0x17c process_backlog+0x15c/0x1b8 napi_poll+0x88/0x284 net_rx_action+0xbc/0x23c __do_softirq+0x20c/0x48c Fixes: ceed73a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Link: https://lore.kernel.org/r/1605298325-3705-1-git-send-email-subashab@codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…id() [ Upstream commit 41a0be3 ] Sparse complaints 3 times about: net/smc/smc_ib.c:203:52: warning: incorrect type in argument 1 (different address spaces) net/smc/smc_ib.c:203:52: expected struct net_device const *dev net/smc/smc_ib.c:203:52: got struct net_device [noderef] __rcu *const ndev Fix that by using the existing and validated ndev variable instead of accessing attr->ndev directly. Fixes: 5102eca ("net/smc: Use rdma_read_gid_l2_fields to L2 fields") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3fe16ed ] If tcp socket has more data than Encrypted Handshake Message then tls_sw_recvmsg will try to decrypt next record instead of returning full control message to userspace as mentioned in comment. The next message - usually Application Data - gets corrupted because it uses zero copy for decryption that's why the data is not stored in skb for next iteration. Revert check to not decrypt next record if current is not Application Data. Fixes: 692d7b5 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1605413760-21153-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4ee18c1 ] The x25_disconnect function in x25_subr.c would decrease the refcount of "x25->neighbour" (struct x25_neigh) and reset this pointer to NULL. However, the x25_rx_call_request function in af_x25.c, which is called when we receive a connection request, does not increase the refcount when it assigns the pointer. Fix this issue by increasing the refcount of "struct x25_neigh" in x25_rx_call_request. This patch fixes frequent kernel crashes when using AF_X25 sockets. Fixes: 4becb7e ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201112103506.5875-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d8c1901 ] The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb(). This ends up to page_frag_alloc() to allocate skb->data from page_frag_cache->va. During the memory pressure, page_frag_cache->va may be allocated as pfmemalloc page. As a result, the skb->pfmemalloc is always true as skb->data is from page_frag_cache->va. The skb will be dropped if the sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour under memory pressure. However, once kernel is not under memory pressure any longer (suppose large amount of memory pages are just reclaimed), the page_frag_alloc() may still re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a result, the skb->pfmemalloc is always true unless page_frag_cache->va is re-allocated, even if the kernel is not under memory pressure any longer. Here is how kernel runs into issue. 1. The kernel is under memory pressure and allocation of PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead, the pfmemalloc page is allocated for page_frag_cache->va. 2: All skb->data from page_frag_cache->va (pfmemalloc) will have skb->pfmemalloc=true. The skb will always be dropped by sock without SOCK_MEMALLOC. This is an expected behaviour. 3. Suppose a large amount of pages are reclaimed and kernel is not under memory pressure any longer. We expect skb->pfmemalloc drop will not happen. 4. Unfortunately, page_frag_alloc() does not proactively re-allocate page_frag_alloc->va and will always re-use the prior pfmemalloc page. The skb->pfmemalloc is always true even kernel is not under memory pressure any longer. Fix this by freeing and re-allocating the page instead of recycling it. References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/ References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/ Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Bert Barbe <bert.barbe@oracle.com> Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Cc: Manjunath Patil <manjunath.b.patil@oracle.com> Cc: Joe Jin <joe.jin@oracle.com> Cc: SRINIVAS <srinivas.eeda@oracle.com> Fixes: 79930f5 ("net: do not deplete pfmemalloc reserve") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201115201029.11903-1-dongli.zhang@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cb47d16 ] Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 469981b ("qed: Add unaligned and packed packet processing") Fixes: fcb39f6 ("qed: Add mpa buffer descriptors for storing and processing mpa fpdus") Fixes: 1e28eaa ("qed: Add iWARP support for fpdu spanned over more than two tcp packets") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Link: https://lore.kernel.org/r/1605532033-27373-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3beb9be ] Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 3ced0a8 ("qlcnic: Add support to run firmware POST") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1605248186-16013-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 057a10f ] A call trace was found in Hangbin's Codenomicon testing with debug kernel: [ 2615.981988] ODEBUG: free active (active state 0) object type: timer_list hint: sctp_generate_proto_unreach_event+0x0/0x3a0 [sctp] [ 2615.995050] WARNING: CPU: 17 PID: 0 at lib/debugobjects.c:328 debug_print_object+0x199/0x2b0 [ 2616.095934] RIP: 0010:debug_print_object+0x199/0x2b0 [ 2616.191533] Call Trace: [ 2616.194265] <IRQ> [ 2616.202068] debug_check_no_obj_freed+0x25e/0x3f0 [ 2616.207336] slab_free_freelist_hook+0xeb/0x140 [ 2616.220971] kfree+0xd6/0x2c0 [ 2616.224293] rcu_do_batch+0x3bd/0xc70 [ 2616.243096] rcu_core+0x8b9/0xd00 [ 2616.256065] __do_softirq+0x23d/0xacd [ 2616.260166] irq_exit+0x236/0x2a0 [ 2616.263879] smp_apic_timer_interrupt+0x18d/0x620 [ 2616.269138] apic_timer_interrupt+0xf/0x20 [ 2616.273711] </IRQ> This is because it holds asoc when transport->proto_unreach_timer starts and puts asoc when the timer stops, and without holding transport the transport could be freed when the timer is still running. So fix it by holding/putting transport instead for proto_unreach_timer in transport, just like other timers in transport. v1->v2: - Also use sctp_transport_put() for the "out_unlock:" path in sctp_generate_proto_unreach_event(), as Marcelo noticed. Fixes: 50b5d6a ("sctp: Fix a race between ICMP protocol unreachable and connect()") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/102788809b554958b13b95d33440f5448113b8d6.1605331373.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1b9e2a8 ] During loss recovery, retransmitted packets are forced to use TCP timestamps to calculate the RTT samples, which have a millisecond granularity. BBR is designed using a microsecond granularity. As a result, multiple RTT samples could be truncated to the same RTT value during loss recovery. This is problematic, as BBR will not enter PROBE_RTT if the RTT sample is <= the current min_rtt sample, meaning that if there are persistent losses, PROBE_RTT will constantly be pushed off and potentially never re-entered. This patch makes sure that BBR enters PROBE_RTT by checking if RTT sample is < the current min_rtt sample, rather than <=. The Netflix transport/TCP team discovered this bug in the Linux TCP BBR code during lab tests. Fixes: 0f8782e ("tcp_bbr: add BBR congestion control") Signed-off-by: Ryan Sharpelletti <sharpelletti@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Link: https://lore.kernel.org/r/20201116174412.1433277-1-sharpelletti.kdev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8cbcc5e ] Handle destruction of rules with port destination type to enable full destruction of flow. Without this handling of TX rules the deletion of these rules fails. Dmesg of flow destruction failure: [ 203.714146] mlx5_core 0000:00:0b.0: mlx5_cmd_check:753:(pid 342): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x144b7a) [ 210.547387] ------------[ cut here ]------------ [ 210.548663] refcount_t: decrement hit 0; leaking memory. [ 210.550651] WARNING: CPU: 4 PID: 342 at lib/refcount.c:31 refcount_warn_saturate+0x5c/0x110 [ 210.550654] Modules linked in: mlx5_ib mlx5_core ib_ipoib rdma_ucm rdma_cm iw_cm ib_cm ib_umad ib_uverbs ib_core [ 210.550675] CPU: 4 PID: 342 Comm: test Not tainted 5.8.0-rc2+ Freescale#116 [ 210.550678] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 210.550680] RIP: 0010:refcount_warn_saturate+0x5c/0x110 [ 210.550685] Code: c6 d1 1b 01 00 0f 84 ad 00 00 00 5b 5d c3 80 3d b5 d1 1b 01 00 75 f4 48 c7 c7 20 d1 15 82 c6 05 a5 d1 1b 01 01 e8 a7 eb af ff <0f> 0b eb dd 80 3d 99 d1 1b 01 00 75 d4 48 c7 c7 c0 cf 15 82 c6 05 [ 210.550687] RSP: 0018:ffff8881642e77e8 EFLAGS: 00010282 [ 210.550691] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000 [ 210.550694] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102c85ceef [ 210.550696] RBP: ffff888161720428 R08: ffffffff8124c10e R09: ffffed103243beae [ 210.550698] R10: ffff8881921df56b R11: ffffed103243bead R12: ffff8881841b4180 [ 210.550701] R13: ffff888161720428 R14: ffff8881616d0000 R15: ffff888161720380 [ 210.550704] FS: 00007fc27f025740(0000) GS:ffff888192000000(0000) knlGS:0000000000000000 [ 210.550706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 210.550708] CR2: 0000557e4b41a6a0 CR3: 0000000002415004 CR4: 0000000000360ea0 [ 210.550711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 210.550713] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 210.550715] Call Trace: [ 210.550717] mlx5_del_flow_rules+0x484/0x490 [mlx5_core] [ 210.550720] ? mlx5_cmd_set_fte+0xa80/0xa80 [mlx5_core] [ 210.550722] mlx5_ib_destroy_flow+0x17f/0x280 [mlx5_ib] [ 210.550724] uverbs_free_flow+0x4c/0x90 [ib_uverbs] [ 210.550726] destroy_hw_idr_uobject+0x41/0xb0 [ib_uverbs] [ 210.550728] uverbs_destroy_uobject+0xaa/0x390 [ib_uverbs] [ 210.550731] __uverbs_cleanup_ufile+0x129/0x1b0 [ib_uverbs] [ 210.550733] ? uverbs_destroy_uobject+0x390/0x390 [ib_uverbs] [ 210.550735] uverbs_destroy_ufile_hw+0x78/0x190 [ib_uverbs] [ 210.550737] ib_uverbs_close+0x36/0x140 [ib_uverbs] [ 210.550739] __fput+0x181/0x380 [ 210.550741] task_work_run+0x88/0xd0 [ 210.550743] do_exit+0x5f6/0x13b0 [ 210.550745] ? sched_clock_cpu+0x30/0x140 [ 210.550747] ? is_current_pgrp_orphaned+0x70/0x70 [ 210.550750] ? lock_downgrade+0x360/0x360 [ 210.550752] ? mark_held_locks+0x1d/0x90 [ 210.550754] do_group_exit+0x8a/0x140 [ 210.550756] get_signal+0x20a/0xf50 [ 210.550758] do_signal+0x8c/0xbe0 [ 210.550760] ? hrtimer_nanosleep+0x1d8/0x200 [ 210.550762] ? nanosleep_copyout+0x50/0x50 [ 210.550764] ? restore_sigcontext+0x320/0x320 [ 210.550766] ? __hrtimer_init+0xf0/0xf0 [ 210.550768] ? timespec64_add_safe+0x150/0x150 [ 210.550770] ? mark_held_locks+0x1d/0x90 [ 210.550772] ? lockdep_hardirqs_on_prepare+0x14c/0x240 [ 210.550774] __prepare_exit_to_usermode+0x119/0x170 [ 210.550776] do_syscall_64+0x65/0x300 [ 210.550778] ? trace_hardirqs_off+0x10/0x120 [ 210.550781] ? mark_held_locks+0x1d/0x90 [ 210.550783] ? asm_sysvec_apic_timer_interrupt+0xa/0x20 [ 210.550785] ? lockdep_hardirqs_on+0x112/0x190 [ 210.550787] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 210.550789] RIP: 0033:0x7fc27f1cd157 [ 210.550791] Code: Bad RIP value. [ 210.550793] RSP: 002b:00007ffd4db27ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000023 [ 210.550798] RAX: fffffffffffffdfc RBX: ffffffffffffff80 RCX: 00007fc27f1cd157 [ 210.550800] RDX: 00007fc27f025740 RSI: 00007ffd4db27eb0 RDI: 00007ffd4db27eb0 [ 210.550803] RBP: 0000000000000016 R08: 0000000000000000 R09: 000000000000000e [ 210.550805] R10: 00007ffd4db27dc7 R11: 0000000000000246 R12: 0000000000400c00 [ 210.550808] R13: 00007ffd4db285f0 R14: 0000000000000000 R15: 0000000000000000 [ 210.550809] irq event stamp: 49399 [ 210.550812] hardirqs last enabled at (49399): [<ffffffff81172d36>] console_unlock+0x556/0x6f0 [ 210.550815] hardirqs last disabled at (49398): [<ffffffff81172897>] console_unlock+0xb7/0x6f0 [ 210.550818] softirqs last enabled at (48706): [<ffffffff81e0037b>] __do_softirq+0x37b/0x60c [ 210.550820] softirqs last disabled at (48697): [<ffffffff81c00e2f>] asm_call_on_stack+0xf/0x20 [ 210.550822] ---[ end trace ad18c0e6fa846454 ]--- [ 210.581862] mlx5_core 0000:00:0c.0: mlx5_destroy_flow_table:2132:(pid 342): Flow table 262150 wasn't destroyed, refcount > 1 Fixes: a7ee18b ("RDMA/mlx5: Allow creating a matcher for a NIC TX flow table") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 470b747 ] Currently when QoS is enabled for VF and any min_rate is configured, the driver sets bw_share value to at least 1 and doesn’t allow to set it to 0 to make minimal rate unlimited. It means there is always a minimal rate configured for every VF, even if user tries to remove it. In order to make QoS disable possible, check whether all vports have configured min_rate = 0. If this is true, set their bw_share to 0 to disable min_rate limitations. Fixes: c9497c9 ("net/mlx5: Add support for setting VF min rate") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e844cc upstream. SPI driver probing currently comprises two steps, whereas removal comprises only one step: spi_alloc_master() spi_register_controller() spi_unregister_controller() That's because spi_unregister_controller() calls device_unregister() instead of device_del(), thereby releasing the reference on the spi_controller which was obtained by spi_alloc_master(). An SPI driver's private data is contained in the same memory allocation as the spi_controller struct. Thus, once spi_unregister_controller() has been called, the private data is inaccessible. But some drivers need to access it after spi_unregister_controller() to perform further teardown steps. Introduce devm_spi_alloc_master() and devm_spi_alloc_slave(), which release a reference on the spi_controller struct only after the driver has unbound, thereby keeping the memory allocation accessible. Change spi_unregister_controller() to not release a reference if the spi_controller was allocated by one of these new devm functions. The present commit is small enough to be backportable to stable. It allows fixing drivers which use the private data in their ->remove() hook after it's been freed. It also allows fixing drivers which neglect to release a reference on the spi_controller in the probe error path. Long-term, most SPI drivers shall be moved over to the devm functions introduced herein. The few that can't shall be changed in a treewide commit to explicitly release the last reference on the controller. That commit shall amend spi_unregister_controller() to no longer release a reference, thereby completing the migration. As a result, the behaviour will be less surprising and more consistent with subsystems such as IIO, which also includes the private data in the allocation of the generic iio_dev struct, but calls device_del() in iio_device_unregister(). Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://lore.kernel.org/r/272bae2ef08abd21388c98e23729886663d19192.1605121038.git.lukas@wunner.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04a9cd5 upstream. If the calls to of_match_device(), of_alias_get_id(), devm_ioremap_resource(), devm_regmap_init_mmio() or devm_clk_get() fail on probe of the NPCM FIU SPI driver, the spi_controller struct is erroneously not freed. Fix by switching over to the new devm_spi_alloc_master() helper. Fixes: ace55c4 ("spi: npcm-fiu: add NPCM FIU controller driver") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: <stable@vger.kernel.org> # v5.4+: 5e844cc: spi: Introduce device-managed SPI controller allocation Cc: <stable@vger.kernel.org> # v5.4+ Cc: Tomer Maimon <tmaimon77@gmail.com> Link: https://lore.kernel.org/r/a420c23a363a3bc9aa684c6e790c32a8af106d17.1605512876.git.lukas@wunner.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e13ee6c upstream. bcm2835aux_spi_remove() accesses the driver's private data after calling spi_unregister_master() even though that function releases the last reference on the spi_master and thereby frees the private data. Fix by switching over to the new devm_spi_alloc_master() helper which keeps the private data accessible until the driver has unbound. Fixes: b9dd3f6 ("spi: bcm2835aux: Fix controller unregister order") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: <stable@vger.kernel.org> # v4.4+: 123456789abc: spi: Introduce device-managed SPI controller allocation Cc: <stable@vger.kernel.org> # v4.4+: b9dd3f6: spi: bcm2835aux: Fix controller unregister order Cc: <stable@vger.kernel.org> # v4.4+ Link: https://lore.kernel.org/r/b290b06357d0c0bdee9cecc539b840a90630f101.1605121038.git.lukas@wunner.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 365ec8b upstream. Limit the fsl,pfuze-support-disable-sw to the pfuze100 and pfuze200 variants. When enabling fsl,pfuze-support-disable-sw and using a pfuze3000 or pfuze3001, the driver would choose pfuze100_sw_disable_regulator_ops instead of the newly introduced and correct pfuze3000_sw_regulator_ops. Signed-off-by: Sean Nyekjaer <sean@geanix.com> Fixes: 6f1cf52 ("regualtor: pfuze100: correct sw1a/sw2 on pfuze3000") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201110174113.2066534-1-sean@geanix.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57a6ad4 upstream. Fixed commit introduced a possible second call to set_machine_constraints() and that allocates memory for rdev->constraints. Move the allocation to the caller so it's easier to manage and done once. Fixes: aea6cb9 ("regulator: resolve supply after creating regulator") Cc: stable@vger.kernel.org Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # stpmic1 Link: https://lore.kernel.org/r/78c3d4016cebc08d441aad18cb924b4e4d9cf9df.1605226675.git.mirq-linux@rere.qmqm.pl Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b639e2 upstream. When a regulator's name equals its supply's name the regulator_resolve_supply() recurses indefinitely. Add a check so that debugging the problem is easier. The "fixed" commit just exposed the problem. Fixes: aea6cb9 ("regulator: resolve supply after creating regulator") Cc: stable@vger.kernel.org Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # stpmic1 Link: https://lore.kernel.org/r/c6171057cfc0896f950c4d8cb82df0f9f1b89ad9.1605226675.git.mirq-linux@rere.qmqm.pl Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5c042b upstream. Workaround regulators whose supply name happens to be the same as its own name. This fixes boards that used to work before the early supply resolving was removed. The error message is left in place so that offending drivers can be detected. Fixes: aea6cb9 ("regulator: resolve supply after creating regulator") Cc: stable@vger.kernel.org Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # stpmic1 Link: https://lore.kernel.org/r/d703acde2a93100c3c7a81059d716c50ad1b1f52.1605226675.git.mirq-linux@rere.qmqm.pl Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 481535c upstream. fast_second_level_miss handler for the TLBTEMP area has an assumption that page table directory entry for the TLBTEMP address range is 0. For it to be true the TLBTEMP area must be aligned to 4MB boundary and not share its 4MB region with anything that may use a page table. This is not true currently: TLBTEMP shares space with vmalloc space which results in the following kinds of runtime errors when fast_second_level_miss loads page table directory entry for the vmalloc space instead of fixing up the TLBTEMP area: Unable to handle kernel paging request at virtual address c7ff0e00 pc = d0009275, ra = 90009478 Oops: sig: 9 [Freescale#1] PREEMPT CPU: 1 PID: 61 Comm: kworker/u9:2 Not tainted 5.10.0-rc3-next-20201110-00007-g1fe4962fa983-dirty Freescale#58 Workqueue: xprtiod xs_stream_data_receive_workfn a00: 90009478 d11e1dc0 c7ff0e00 00000020 c7ff0000 00000001 7f8b8107 00000000 a08: 900c5992 d11e1d9 d0cc88b8 5506e97c 00000000 5506e97c d06c8074 d11e1d9 pc: d0009275, ps: 00060310, depc: 00000014, excvaddr: c7ff0e00 lbeg: d0009275, lend: d0009287 lcount: 00000003, sar: 00000010 Call Trace: xs_stream_data_receive_workfn+0x43c/0x770 process_one_work+0x1a1/0x324 worker_thread+0x1cc/0x3c0 kthread+0x10d/0x124 ret_from_kernel_thread+0xc/0x18 Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a860d1 upstream. Although cache alias management calls set up and tear down TLB entries and fast_second_level_miss is able to restore TLB entry should it be evicted they absolutely cannot preempt each other because they use the same TLBTEMP area for different purposes. Disable preemption around all cache alias management calls to enforce that. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fe40b8 upstream. Deferring sampling attempts to the second stage has some bad interactions with drivers that process the rate table in hardware and use the probe flag to indicate probing packets (e.g. most mt76 drivers). On affected drivers it can lead to probing not working at all. If the link conditions turn worse, it might not be such a good idea to do a lot of sampling for lower rates in this case. Fix this by simply skipping the sample attempt instead of deferring it, but keep the checks that would allow it to be sampled if it was skipped too often, but only if it has less than 95% success probability. Also ensure that IEEE80211_TX_CTL_RATE_CTRL_PROBE is set for all probing packets. Cc: stable@vger.kernel.org Fixes: cccf129 ("mac80211: add the 'minstrel' rate control algorithm") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201111183359.43528-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2911a8 upstream. Some drivers fill the status rate list without setting the rate index after the final rate to -1. minstrel_ht already deals with this, but minstrel doesn't, which causes it to get stuck at the lowest rate on these drivers. Fix this by checking the count as well. Cc: stable@vger.kernel.org Fixes: cccf129 ("mac80211: add the 'minstrel' rate control algorithm") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201111183359.43528-3-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7bc40ae upstream. If sta_info_insert_finish() fails, we currently keep the station around and free it only in the caller, but there's only one such caller and it always frees it immediately. As syzbot found, another consequence of this split is that we can put things that sleep only into __cleanup_single_sta() and not in sta_info_free(), but this is the only place that requires such of sta_info_free() now. Change this to free the station in sta_info_insert_finish(), in which case we can still sleep. This will also let us unify the cleanup code later. Cc: stable@vger.kernel.org Fixes: dcd479e ("mac80211: always wind down STA state") Reported-by: syzbot+32c6c38c4812d22f2f0b@syzkaller.appspotmail.com Reported-by: syzbot+4c81fe92e372d26c4246@syzkaller.appspotmail.com Reported-by: syzbot+6a7fe9faf0d1d61bc24a@syzkaller.appspotmail.com Reported-by: syzbot+abed06851c5ffe010921@syzkaller.appspotmail.com Reported-by: syzbot+b7aeb9318541a1c709f1@syzkaller.appspotmail.com Reported-by: syzbot+d5a9416c6cafe53b5dd0@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20201112112201.ee6b397b9453.I9c31d667a0ea2151441cc64ed6613d36c18a48e0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78d732e upstream. This file is installed by the s390 CPU Measurement sampling facility device driver to export supported minimum and maximum sample buffer sizes. This file is read by lscpumf tool to display the details of the device driver capabilities. The lscpumf tool might be invoked by a non-root user. In this case it does not print anything because the file contents can not be read. Fix this by allowing read access for all users. Reading the file contents is ok, changing the file contents is left to the root user only. For further reference and details see: [1] ibm-s390-linux/s390-tools#97 Fixes: 69f239e ("s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur") Cc: <stable@vger.kernel.org> # 3.14 Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f117cb upstream. When requeueing all requests on the device request queue to the blocklayer we might get to an ERP (error recovery) request that is a copy of an original CQR. Those requests do not have blocklayer request information or a pointer to the dasd_queue set. When trying to access those data it will lead to a null pointer dereference in dasd_requeue_all_requests(). Fix by checking if the request is an ERP request that can simply be ignored. The blocklayer request will be requeued by the original CQR that is on the device queue right behind the ERP request. Fixes: 9487cfd ("s390/dasd: fix handling of internal requests") Cc: <stable@vger.kernel.org> Freescale#4.16 Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92e4dc8 upstream. When invoking kexec() on a Linux guest running on a Hyper-V host, the kernel panics. RIP: 0010:cpuhp_issue_call+0x137/0x140 Call Trace: __cpuhp_remove_state_cpuslocked+0x99/0x100 __cpuhp_remove_state+0x1c/0x30 hv_kexec_handler+0x23/0x30 [hv_vmbus] hv_machine_shutdown+0x1e/0x30 machine_shutdown+0x10/0x20 kernel_kexec+0x6d/0x96 __do_sys_reboot+0x1ef/0x230 __x64_sys_reboot+0x1d/0x20 do_syscall_64+0x6b/0x3d8 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This was due to hv_synic_cleanup() callback returning -EBUSY to cpuhp_issue_call() when tearing down the VMBUS_CONNECT_CPU, even if the vmbus_connection.conn_state = DISCONNECTED. hv_synic_cleanup() should succeed in the case where vmbus_connection.conn_state is DISCONNECTED. Fix is to add an extra condition to test for vmbus_connection.conn_state == CONNECTED on the VMBUS_CONNECT_CPU and only return early if true. This way the kexec() path can still shut everything down while preserving the initial behavior of preventing CPU offlining on the VMBUS_CONNECT_CPU while the VM is running. Fixes: 8a857c5 ("Drivers: hv: vmbus: Always handle the VMBus messages on CPU0") Signed-off-by: Chris Co <chrco@microsoft.com> Reviewed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201110190118.15596-1-chrco@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 728321e upstream. If we have more than 4 displays we will run into dummy irq calls or flip timout issues. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2e3fce upstream. EDID can declare the maximum supported bpc up to 16, and apparently there are displays that do so. Currently we assume 12 bpc is tha max. Fix the assumption and toss in a MISSING_CASE() for any other value we don't expect to see. This fixes modesets with a display with EDID max bpc > 12. Previously any modeset would just silently fail on platforms that didn't otherwise limit this via the max_bpc property. In particular we don't add the max_bpc property to HDMI ports on gmch platforms, and thus we would see the raw max_bpc coming from the EDID. I suppose we could already adjust this to also allow 16bpc, but seeing as no current platform supports that there is little point. Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2632 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201110210447.27454-1-ville.syrjala@linux.intel.com Reviewed-by: José Roberto de Souza <jose.souza@intel.com> (cherry picked from commit 2ca5a7b) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
… Intel controllers commit 60d5356 upstream. A UHS setting of SDR25 can give better results for High Speed mode. This is because there is no setting corresponding to high speed. Currently SDHCI sets no value, which means zero which is also the setting for SDR12. There was an attempt to change this in sdhci.c but it caused problems for some drivers, so it was reverted and the change was made to sdhci-brcmstb in commit 2fefc7c ("mmc: sdhci-brcmstb: Fix incorrect switch to HS mode"). Several other drivers also do this. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org # v5.4+ Link: https://lore.kernel.org/r/20201112133656.20317-1-adrian.hunter@intel.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf23705 upstream. Commit 69f594a ("ptrace: do not audit capability check when outputing /proc/pid/stat") replaced the use of ns_capable() with has_ns_capability{,_noaudit}() which doesn't set PF_SUPERPRIV. Commit 6b3ad66 ("ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()") replaced has_ns_capability{,_noaudit}() with security_capable(), which doesn't set PF_SUPERPRIV neither. Since commit 98f368e ("kernel: Add noaudit variant of ns_capable()"), a new ns_capable_noaudit() helper is available. Let's use it! As a result, the signature of ptrace_has_cap() is restored to its original one. Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Eric Paris <eparis@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Tyler Hicks <tyhicks@linux.microsoft.com> Cc: stable@vger.kernel.org Fixes: 6b3ad66 ("ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()") Fixes: 69f594a ("ptrace: do not audit capability check when outputing /proc/pid/stat") Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201030123849.770769-2-mic@digikod.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb14528 upstream. Replace the use of security_capable(current_cred(), ...) with ns_capable_noaudit() which set PF_SUPERPRIV. Since commit 98f368e ("kernel: Add noaudit variant of ns_capable()"), a new ns_capable_noaudit() helper is available. Let's use it! Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tyler Hicks <tyhicks@linux.microsoft.com> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Fixes: e2cfabd ("seccomp: add system call filtering using BPF") Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201030123849.770769-3-mic@digikod.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…r early loading commit 1a371e6 upstream. Currently, scan_microcode() leverages microcode_matches() to check if the microcode matches the CPU by comparing the family and model. However, the processor stepping and flags of the microcode signature should also be considered when saving a microcode patch for early update. Use find_matching_signature() in scan_microcode() and get rid of the now-unused microcode_matches() which is a good cleanup in itself. Complete the verification of the patch being saved for early loading in save_microcode_patch() directly. This needs to be done there too because save_mc_for_early() will call save_microcode_patch() too. The second reason why this needs to be done is because the loader still tries to support, at least hypothetically, mixed-steppings systems and thus adds all patches to the cache that belong to the same CPU model albeit with different steppings. For example: microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6 microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23 The patch which is being saved for early loading, however, can only be the one which fits the CPU this runs on so do the signature verification before saving. [ bp: Do signature verification in save_microcode_patch() and rewrite commit message. ] Fixes: ec400dd ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU") Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535 Link: https://lkml.kernel.org/r/20201113015923.13960-1-yu.c.chen@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8faeb1f upstream. If we reparent the slab objects to the root memcg, when we free the slab object, we need to update the per-memcg vmstats to keep it correct for the root memcg. Now this at least affects the vmstat of NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack size is smaller than the PAGE_SIZE. David said: "I assume that without this fix that the root memcg's vmstat would always be inflated if we reparented" Fixes: ec9f023 ("mm: workingset: fix vmstat counters for shadow nodes") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Christopher Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <guro@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Cc: <stable@vger.kernel.org> [5.3+] Link: https://lkml.kernel.org/r/20201110031015.15715-1-songmuchun@bytedance.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…lt() commit bfe8cc1 upstream. Alexander reported a syzkaller / KASAN finding on s390, see below for complete output. In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be freed in some cases. In the case of userfaultfd_missing(), this will happen after calling handle_userfault(), which might have released the mmap_lock. Therefore, the following pte_free(vma->vm_mm, pgtable) will access an unstable vma->vm_mm, which could have been freed or re-used already. For all architectures other than s390 this will go w/o any negative impact, because pte_free() simply frees the page and ignores the passed-in mm. The implementation for SPARC32 would also access mm->page_table_lock for pte_free(), but there is no THP support in SPARC32, so the buggy code path will not be used there. For s390, the mm->context.pgtable_list is being used to maintain the 2K pagetable fragments, and operating on an already freed or even re-used mm could result in various more or less subtle bugs due to list / pagetable corruption. Fix this by calling pte_free() before handle_userfault(), similar to how it is already done in __do_huge_pmd_anonymous_page() for the WRITE / non-huge_zero_page case. Commit 6b251fc ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults") actually introduced both, the do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page() changes wrt to calling handle_userfault(), but only in the latter case it put the pte_free() before calling handle_userfault(). BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744 Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334 CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0 Hardware name: IBM 3906 M04 701 (KVM/Linux) Call Trace: do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744 create_huge_pmd mm/memory.c:4256 [inline] __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480 handle_mm_fault+0x288/0x748 mm/memory.c:4607 do_exception+0x394/0xae0 arch/s390/mm/fault.c:479 do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567 pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706 copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline] raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174 _copy_from_user+0x48/0xa8 lib/usercopy.c:16 copy_from_user include/linux/uaccess.h:192 [inline] __do_sys_sigaltstack kernel/signal.c:4064 [inline] __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060 system_call+0xe0/0x28c arch/s390/kernel/entry.S:415 Allocated by task 9334: slab_alloc_node mm/slub.c:2891 [inline] slab_alloc mm/slub.c:2899 [inline] kmem_cache_alloc+0x118/0x348 mm/slub.c:2904 vm_area_dup+0x9c/0x2b8 kernel/fork.c:356 __split_vma+0xba/0x560 mm/mmap.c:2742 split_vma+0xca/0x108 mm/mmap.c:2800 mlock_fixup+0x4ae/0x600 mm/mlock.c:550 apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619 do_mlock+0x1aa/0x718 mm/mlock.c:711 __do_sys_mlock2 mm/mlock.c:738 [inline] __s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728 system_call+0xe0/0x28c arch/s390/kernel/entry.S:415 Freed by task 9333: slab_free mm/slub.c:3142 [inline] kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158 __vma_adjust+0x7b2/0x2508 mm/mmap.c:960 vma_merge+0x87e/0xce0 mm/mmap.c:1209 userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868 __fput+0x22c/0x7a8 fs/file_table.c:281 task_work_run+0x200/0x320 kernel/task_work.c:151 tracehook_notify_resume include/linux/tracehook.h:188 [inline] do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538 system_call+0xe6/0x28c arch/s390/kernel/entry.S:416 The buggy address belongs to the object at 00000000962d6948 which belongs to the cache vm_area_struct of size 200 The buggy address is located 64 bytes inside of 200-byte region [00000000962d6948, 00000000962d6a10) The buggy address belongs to the page: page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6 flags: 0x3ffff00000000200(slab) raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00 raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501 page dumped because: kasan: bad access detected page->mem_cgroup:0000000096959501 Memory state around the buggy address: 00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb >00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ 00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Fixes: 6b251fc ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults") Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: <stable@vger.kernel.org> [4.3+] Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f80b08f upstream. When boosting is enabled, it is observed that rate of atomic order-0 allocation failures are high due to the fact that free levels in the system are checked with ->watermark_boost offset. This is not a problem for sleepable allocations but for atomic allocations which looks like regression. This problem is seen frequently on system setup of Android kernel running on Snapdragon hardware with 4GB RAM size. When no extfrag event occurred in the system, ->watermark_boost factor is zero, thus the watermark configurations in the system are: _watermark = ( [WMARK_MIN] = 1272, --> ~5MB [WMARK_LOW] = 9067, --> ~36MB [WMARK_HIGH] = 9385), --> ~38MB watermark_boost = 0 After launching some memory hungry applications in Android which can cause extfrag events in the system to an extent that ->watermark_boost can be set to max i.e. default boost factor makes it to 150% of high watermark. _watermark = ( [WMARK_MIN] = 1272, --> ~5MB [WMARK_LOW] = 9067, --> ~36MB [WMARK_HIGH] = 9385), --> ~38MB watermark_boost = 14077, -->~57MB With default system configuration, for an atomic order-0 allocation to succeed, having free memory of ~2MB will suffice. But boosting makes the min_wmark to ~61MB thus for an atomic order-0 allocation to be successful system should have minimum of ~23MB of free memory(from calculations of zone_watermark_ok(), min = 3/4(min/2)). But failures are observed despite system is having ~20MB of free memory. In the testing, this is reproducible as early as first 300secs since boot and with furtherlowram configurations(<2GB) it is observed as early as first 150secs since boot. These failures can be avoided by excluding the ->watermark_boost in watermark caluculations for atomic order-0 allocations. [akpm@linux-foundation.org: fix comment grammar, reflow comment] [charante@codeaurora.org: fix suggested by Mel Gorman] Link: http://lkml.kernel.org/r/31556793-57b1-1c21-1a9d-22674d9bd938@codeaurora.org Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Mel Gorman <mgorman@techsingularity.net> Link: http://lkml.kernel.org/r/1589882284-21010-1-git-send-email-charante@codeaurora.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ralph Siemsen <ralph.siemsen@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e1ac42 upstream. enqueue_task_fair() attempts to skip the overutilized update for new tasks as their util_avg is not accurate yet. However, the flag we check to do so is overwritten earlier on in the function, which makes the condition pretty much a nop. Fix this by saving the flag early on. Fixes: 2802bf3 ("sched/fair: Add over-utilization/tipping point indicator") Reported-by: Rick Yiu <rickyiu@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20201112111201.2081902-1-qperret@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/r/20201123121819.943135899@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is the 5.4.80 stable release Conflicts (manual resolve): - arch/arm64/boot/dts/freescale/imx8mn.dtsi: Fix minor merge conflict where commit [8381af1b684c] in stable tree removed one blank line. - drivers/net/can/flexcan.c: Fix merge fuzz during integration of stable commit [4c0a778fcf7b5]. Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
zandrey
pushed a commit
to zandrey/linux-fslc
that referenced
this pull request
May 14, 2021
[ Upstream commit 0f20615 ] Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable bitfields. Missing breaks in a switch caused 8-byte reads always. This can confuse libbpf because it does strict checks that memory load size corresponds to the original size of the field, which in this case quite often would be wrong. After fixing that, we run into another problem, which quite subtle, so worth documenting here. The issue is in Clang optimization and CO-RE relocation interactions. Without that asm volatile construct (also known as barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and will apply BYTE_OFFSET 4 times for each switch case arm. This will result in the same error from libbpf about mismatch of memory load size and original field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *), *(u32 *), and *(u64 *) memory loads, three of which will fail. Using barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to calculate p, after which value of p is used without relocation in each of switch case arms, doing appropiately-sized memory load. Here's the list of relevant relocations and pieces of generated BPF code before and after this patch for test_core_reloc_bitfields_direct selftests. BEFORE ===== Freescale#45: core_reloc: insn Freescale#160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 Freescale#46: core_reloc: insn Freescale#167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 Freescale#47: core_reloc: insn Freescale#174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 Freescale#48: core_reloc: insn Freescale#178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 Freescale#49: core_reloc: insn Freescale#182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 157: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 159: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 160: b7 02 00 00 04 00 00 00 r2 = 4 ; BYTE_SIZE relocation here ^^^ 161: 66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63> 162: 16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65> 163: 16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66> 164: 05 00 12 00 00 00 00 00 goto +18 <LBB0_69> 0000000000000528 <LBB0_66>: 165: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 167: 69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 168: 05 00 0e 00 00 00 00 00 goto +14 <LBB0_69> 0000000000000548 <LBB0_63>: 169: 16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67> 170: 16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68> 171: 05 00 0b 00 00 00 00 00 goto +11 <LBB0_69> 0000000000000560 <LBB0_68>: 172: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 174: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 175: 05 00 07 00 00 00 00 00 goto +7 <LBB0_69> 0000000000000580 <LBB0_65>: 176: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 178: 71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 179: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 00000000000005a0 <LBB0_67>: 180: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 182: 61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8) ; BYTE_OFFSET relo here w/ RIGHT size ^^^^^^^^^^^^^^^^ 00000000000005b8 <LBB0_69>: 183: 67 01 00 00 20 00 00 00 r1 <<= 32 184: b7 02 00 00 00 00 00 00 r2 = 0 185: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 186: c7 01 00 00 20 00 00 00 r1 s>>= 32 187: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000005e0 <LBB0_71>: 188: 77 01 00 00 20 00 00 00 r1 >>= 32 AFTER ===== Freescale#30: core_reloc: insn Freescale#132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 Freescale#31: core_reloc: insn Freescale#134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 129: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 131: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 132: b7 01 00 00 08 00 00 00 r1 = 8 ; BYTE_OFFSET relo here ^^^ ; no size check for non-memory dereferencing instructions 133: 0f 12 00 00 00 00 00 00 r2 += r1 134: b7 03 00 00 04 00 00 00 r3 = 4 ; BYTE_SIZE relocation here ^^^ 135: 66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63> 136: 16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65> 137: 16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66> 138: 05 00 0a 00 00 00 00 00 goto +10 <LBB0_69> 0000000000000458 <LBB0_66>: 139: 69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 140: 05 00 08 00 00 00 00 00 goto +8 <LBB0_69> 0000000000000468 <LBB0_63>: 141: 16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67> 142: 16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68> 143: 05 00 05 00 00 00 00 00 goto +5 <LBB0_69> 0000000000000480 <LBB0_68>: 144: 79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 145: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 0000000000000490 <LBB0_65>: 146: 71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 147: 05 00 01 00 00 00 00 00 goto +1 <LBB0_69> 00000000000004a0 <LBB0_67>: 148: 61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 00000000000004a8 <LBB0_69>: 149: 67 01 00 00 20 00 00 00 r1 <<= 32 150: b7 02 00 00 00 00 00 00 r2 = 0 151: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 152: c7 01 00 00 20 00 00 00 r1 s>>= 32 153: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000004d0 <LBB0_71>: 154: 77 01 00 00 20 00 00 00 r1 >>= 323 Fixes: ee26dad ("libbpf: Add support for relocatable bitfields") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20210426192949.416837-4-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
zandrey
pushed a commit
to zandrey/linux-fslc
that referenced
this pull request
May 14, 2021
[ Upstream commit 0f20615 ] Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable bitfields. Missing breaks in a switch caused 8-byte reads always. This can confuse libbpf because it does strict checks that memory load size corresponds to the original size of the field, which in this case quite often would be wrong. After fixing that, we run into another problem, which quite subtle, so worth documenting here. The issue is in Clang optimization and CO-RE relocation interactions. Without that asm volatile construct (also known as barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and will apply BYTE_OFFSET 4 times for each switch case arm. This will result in the same error from libbpf about mismatch of memory load size and original field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *), *(u32 *), and *(u64 *) memory loads, three of which will fail. Using barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to calculate p, after which value of p is used without relocation in each of switch case arms, doing appropiately-sized memory load. Here's the list of relevant relocations and pieces of generated BPF code before and after this patch for test_core_reloc_bitfields_direct selftests. BEFORE ===== Freescale#45: core_reloc: insn Freescale#160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 Freescale#46: core_reloc: insn Freescale#167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 Freescale#47: core_reloc: insn Freescale#174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 Freescale#48: core_reloc: insn Freescale#178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 Freescale#49: core_reloc: insn Freescale#182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 157: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 159: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 160: b7 02 00 00 04 00 00 00 r2 = 4 ; BYTE_SIZE relocation here ^^^ 161: 66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63> 162: 16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65> 163: 16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66> 164: 05 00 12 00 00 00 00 00 goto +18 <LBB0_69> 0000000000000528 <LBB0_66>: 165: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 167: 69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 168: 05 00 0e 00 00 00 00 00 goto +14 <LBB0_69> 0000000000000548 <LBB0_63>: 169: 16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67> 170: 16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68> 171: 05 00 0b 00 00 00 00 00 goto +11 <LBB0_69> 0000000000000560 <LBB0_68>: 172: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 174: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 175: 05 00 07 00 00 00 00 00 goto +7 <LBB0_69> 0000000000000580 <LBB0_65>: 176: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 178: 71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 179: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 00000000000005a0 <LBB0_67>: 180: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 182: 61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8) ; BYTE_OFFSET relo here w/ RIGHT size ^^^^^^^^^^^^^^^^ 00000000000005b8 <LBB0_69>: 183: 67 01 00 00 20 00 00 00 r1 <<= 32 184: b7 02 00 00 00 00 00 00 r2 = 0 185: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 186: c7 01 00 00 20 00 00 00 r1 s>>= 32 187: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000005e0 <LBB0_71>: 188: 77 01 00 00 20 00 00 00 r1 >>= 32 AFTER ===== Freescale#30: core_reloc: insn Freescale#132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 Freescale#31: core_reloc: insn Freescale#134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 129: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 131: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 132: b7 01 00 00 08 00 00 00 r1 = 8 ; BYTE_OFFSET relo here ^^^ ; no size check for non-memory dereferencing instructions 133: 0f 12 00 00 00 00 00 00 r2 += r1 134: b7 03 00 00 04 00 00 00 r3 = 4 ; BYTE_SIZE relocation here ^^^ 135: 66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63> 136: 16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65> 137: 16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66> 138: 05 00 0a 00 00 00 00 00 goto +10 <LBB0_69> 0000000000000458 <LBB0_66>: 139: 69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 140: 05 00 08 00 00 00 00 00 goto +8 <LBB0_69> 0000000000000468 <LBB0_63>: 141: 16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67> 142: 16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68> 143: 05 00 05 00 00 00 00 00 goto +5 <LBB0_69> 0000000000000480 <LBB0_68>: 144: 79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 145: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 0000000000000490 <LBB0_65>: 146: 71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 147: 05 00 01 00 00 00 00 00 goto +1 <LBB0_69> 00000000000004a0 <LBB0_67>: 148: 61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 00000000000004a8 <LBB0_69>: 149: 67 01 00 00 20 00 00 00 r1 <<= 32 150: b7 02 00 00 00 00 00 00 r2 = 0 151: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 152: c7 01 00 00 20 00 00 00 r1 s>>= 32 153: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000004d0 <LBB0_71>: 154: 77 01 00 00 20 00 00 00 r1 >>= 323 Fixes: ee26dad ("libbpf: Add support for relocatable bitfields") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20210426192949.416837-4-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Notes to upgrade, following merge conflicts were resolved:
v5.4.80:
arch/arm64/boot/dts/freescale/imx8mn.dtsi
:Fix minor merge conflict where commit [8381af1b684c] in stable tree removed one blank line.
drivers/net/can/flexcan.c
:Fix merge fuzz during integration of stable commit [4c0a778fcf7b5].
Kernel has been built for both aarch64 (
imx_v8_defconfig
) and arm32 (imx_v7_defconfig
).-- andrey