Kernel 5.15.5 broke fan control on Dell Precision 7560 #96

drNoob13 · 2021-12-03T06:28:09Z

Hello PopOS developers,

Please let me bend your ear about an issue with the recent kernel 5.15.5 upgrade.

I had been using Pop!_OS 20.04 LTS peacefully. Everything ran perfectly until Pop Shop upgraded my system to kernel 5.15.5 a few hours ago. Since then, my laptop fans have been running non-stop even at idle, which never happened before; with the previous kernel, 5.13, the laptop ran quite quietly. I'm sensitive to fan noise, so the 5.15.5 kernel is disrupting my work a little bit.

My laptop: Dell Precision 7560, Intel Core i7-11800H (8 cores), NVIDIA RTX A4000, 32 GB RAM.

I wanted to roll back to kernel 5.13. Unfortunately, I accidentally ran apt autoremove, which removed 5.13 and left me no way to go back to the setup where my system performed the way I liked.

Could you please let me know if there is any plan in place to address this issue?

Furthermore, is it possible to have an option to disable kernel upgrades in Pop Shop?

Thanks for the great OS. I hope my issue gets your attention.

$ uname -a
Linux precision 5.15.5-76051505-generic #202111250933~1638201579~20.04~09f1aa7-Ubuntu SMP Tue Nov 30 02: x86_64 x86_64 x86_64 GNU/Linux

Distribution (run cat /etc/os-release):

$ cat /etc/os-release
NAME="Pop!_OS"
VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os

youdontneedtoknow22 · 2021-12-05T18:07:49Z

AFAIK Pop!_OS keeps a backup kernel even after autoremove.
Check this: https://support.system76.com/articles/pop-recovery/ and choose the old kernel (not recovery).

drNoob13 · 2021-12-05T22:40:39Z

Thanks for the reply. I have rolled back to kernel 5.13.0 by setting it as the default in the systemd-boot menu. That solved the issue. However, the bigger problem is that Pop Shop upgraded the kernel as if it were a regular update. For users like me who use Pop!_OS as a daily driver in a professional setting, such upgrades can break their workflow and leave them baffled for hours or even days, which is potentially dangerous, especially when they are preparing for a product release.

On a related note, I have been trying to promote Pop!_OS at my company. I feel that unsolicited upgrades would take many aback and hinder the adoption of Pop!_OS in a serious/professional setting.

youdontneedtoknow22 · 2021-12-06T11:34:55Z

I totally understand that. And seeing all the bugs opened because of the new kernel, I understand why some people say that Pop!_OS is for S76 hardware, not for all hardware.

drNoob13 · 2021-12-07T04:29:09Z

One of the reasons I picked my Dell Precision laptop was the great Linux support Dell has extended to the Precision product line. Actually, I was torn between the Dell Precision workstation and the Oryx Pro 8 as a replacement for my aging Oryx Pro 4. It was a shame that COVID-19 and the chip shortage pushed back System76's laptop production for so long that I could not wait and had to pull the trigger on the Dell.

Back to the issue, I think it would be great if System76 could hold off a little longer on rolling out new kernels to the Pop!_OS LTS releases. On the interim releases such as 20.10, 21.04 or 21.10, it makes sense to ship the latest major upgrades and test the waters before adopting them in the next LTS release. But those of us who use Pop!_OS in a professional setting prefer stability over cutting edge. That is why we chose the LTS instead of an interim release in the first place.

jackpot51 transferred this issue from pop-os/pop Dec 7, 2021