lemp10/darp7/galp5 not powering off after suspend (5.11) #41

Closed
jacobgkau opened this issue Mar 31, 2021 · 13 comments
@jacobgkau
Member

The lemp10 and darp7 often fail to power off after shutting down (roughly 4 out of 5 times, but not every time). When this happens, journalctl indicates that the shutdown process is finishing:

Mar 30 14:31:53 pop-os systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Mar 30 14:31:53 pop-os systemd[1]: Reached target Shutdown.
Mar 30 14:31:53 pop-os systemd[1]: Reached target Final Step.
Mar 30 14:31:53 pop-os systemd[1]: systemd-poweroff.service: Succeeded.
Mar 30 14:31:53 pop-os systemd[1]: Finished Power-Off.
Mar 30 14:31:53 pop-os systemd[1]: Reached target Power-Off.
Mar 30 14:31:53 pop-os systemd[1]: Shutting down.
Mar 30 14:31:53 pop-os systemd-shutdown[1]: Syncing filesystems and block devices.
Mar 30 14:31:53 pop-os systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Mar 30 14:31:53 pop-os systemd-journald[513]: Journal stopped
-- Reboot --
Mar 30 14:41:50 pop-os kernel: Linux version 5.11.0-7612-generic (buildd@lgw01-amd64-052) (gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0, GNU ld (GNU Binutils for Ubuntu) 2.35.1) #13~1616168001~20.10~cf74746-Ubuntu SMP Mon Mar 29 17:37:00 UTC  (Ubuntu 5.11.0-7612.13~1616168001~20.10~cf74746-generic 5.11.7)
Mar 30 14:41:50 pop-os kernel: Command line: initrd=\EFI\Pop_OS-47ff7a17-79d7-421f-91ee-67f6bd9b0973\initrd.img root=UUID=47ff7a17-79d7-421f-91ee-67f6bd9b0973 ro quiet loglevel=0 systemd.show_status=false splash
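
The excerpt above shows the shutdown reaching "Reached target Power-Off." and "Journal stopped" before the next boot begins, which is why the hang appears to happen after userspace has finished. As a rough illustration of that check, here is a minimal Python sketch that scans short-format journal lines for those markers and reports the time until the next kernel line. The function names and the choice of markers are assumptions made for this example, not part of any tool mentioned in this issue, and the timestamp parsing assumes English month abbreviations.

```python
from datetime import datetime

# Messages that the journal excerpt above shows at the end of a clean shutdown.
SHUTDOWN_MARKERS = ("Reached target Power-Off.", "Journal stopped")
# First kernel message of the following boot, as seen in the excerpt.
BOOT_MARKER = "kernel: Linux version"


def parse_timestamp(line, year=2021):
    """Parse the 'Mar 30 14:31:53' prefix of a short-format journal line."""
    # The short journal format omits the year, so it is passed in explicitly.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def shutdown_report(lines, year=2021):
    """Return (completed, gap) for one shutdown followed by one boot.

    completed: True if every shutdown marker appears before the next boot.
    gap: time between the last pre-boot line and the first kernel line,
         or None if no boot line is present.
    """
    seen = set()
    last_before_boot = None
    for line in lines:
        if BOOT_MARKER in line:
            booted = parse_timestamp(line, year)
            gap = booted - last_before_boot if last_before_boot else None
            return len(seen) == len(SHUTDOWN_MARKERS), gap
        if line.startswith("-- "):
            continue  # separators such as "-- Reboot --" carry no timestamp
        last_before_boot = parse_timestamp(line, year)
        seen.update(m for m in SHUTDOWN_MARKERS if m in line)
    return len(seen) == len(SHUTDOWN_MARKERS), None


sample = [
    "Mar 30 14:31:53 pop-os systemd[1]: Reached target Power-Off.",
    "Mar 30 14:31:53 pop-os systemd-journald[513]: Journal stopped",
    "-- Reboot --",
    "Mar 30 14:41:50 pop-os kernel: Linux version 5.11.0-7612-generic",
]
completed, gap = shutdown_report(sample)
print(completed, gap)
```

The gap on its own does not prove a hang, since the machine may simply have been left off for that long; it is only useful alongside the observation that the power LED stayed on.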

The LCD turns off, but the power LED remains solid green, and pushing the power button doesn't turn the machine back on. I'm able to recover from this by holding down the power button until the power LED turns off.

It seems like an EC reset is needed to power on again when this state is reached; I have to wait a second after powering off before it will power on again. I had one case where I had the system plugged in, and it did power off eventually, turning the power LED to orange; but the system wouldn't power on again until I unplugged for a few seconds. If I try to power on without an EC reset, the power LED turns green for a second, then goes back to its previous state.

This seems like it might involve the EC, but I'm starting the issue here because if I downgrade to our previous 5.8 kernel, once I'm booted from that kernel, the system powers off as expected again. Also, the last lemp10 firmware update was nearly two months ago, and I know the issue hasn't been occurring for that long; it just started in the past day or two as we released new kernels (and unfortunately took some time to notice and then narrow down.) This is not happening on the galp5.

@jacobgkau
Member Author

jacobgkau commented Apr 2, 2021

I installed TLP on lemp10 to test something unrelated yesterday, and the issue didn't seem to be happening this morning until I removed it again.

The issue still occurs with the latest kernel in the Ubuntu mainline PPA, 5.12-rc5.

The issue does not seem to occur on kernel version 5.10.0 from the mainline PPA.

The issue doesn't occur on 5.10.27, but it does occur on version 5.11.0-rc2 (rc1 didn't build in Ubuntu's PPA.)
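
The results above narrow the regression to somewhere between 5.10 and 5.11-rc2 on mainline. The following Python sketch shows one way to derive that window from a table of test results. The helper names and the result mapping are hypothetical and exist only for this illustration; the version strings are the ones reported in this comment.

```python
import re


def version_key(version):
    """Sort key for mainline-style versions such as '5.11.0-rc2' or '5.12-rc5'."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version}")
    major, minor, patch, rc = match.groups()
    # A release candidate sorts before the final release of the same version,
    # so a missing -rc suffix is ranked after every numbered candidate.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch or 0), rc_rank)


def regression_window(results):
    """Return (last_good, first_bad) from a {version: hang_observed} mapping.

    Assumes the results are consistent, i.e. every good version sorts
    before every bad one. Returns None if either side is missing.
    """
    ordered = sorted(results, key=version_key)
    good = [v for v in ordered if not results[v]]
    bad = [v for v in ordered if results[v]]
    if not good or not bad:
        return None
    return good[-1], bad[0]


# True means the power-off hang was observed on that kernel.
mainline_results = {"5.10.0": False, "5.11.0-rc2": True, "5.12-rc5": True}
print(regression_window(mainline_results))
```

5.10.27 is left out of the mapping on purpose: it comes from the stable branch rather than mainline, so it confirms that 5.10.y is unaffected but is not a useful endpoint for bisecting mainline history.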

@jackpot51 self-assigned this on Apr 2, 2021
@jackpot51
Member

I see the same thing on my galp5. I'll look into it

@jacobgkau
Member Author

jacobgkau commented Apr 2, 2021

I built our kernel with the following six commits reverted (the last six commits to reboot.c, between 5.10 and 5.11):

40247e5#diff-d8ab18c7fe4aecdebc357cc3821e8f92cd7a82669f8e1091013e2ab751fe9c42
1a9d079#diff-d8ab18c7fe4aecdebc357cc3821e8f92cd7a82669f8e1091013e2ab751fe9c42
0c5c017#diff-d8ab18c7fe4aecdebc357cc3821e8f92cd7a82669f8e1091013e2ab751fe9c42
2c622ed#diff-d8ab18c7fe4aecdebc357cc3821e8f92cd7a82669f8e1091013e2ab751fe9c42
f9a9050#diff-d8ab18c7fe4aecdebc357cc3821e8f92cd7a82669f8e1091013e2ab751fe9c42
42b4ca0#diff-d8ab18c7fe4aecdebc357cc3821e8f92cd7a82669f8e1091013e2ab751fe9c42

I got two successful power-offs, but then it hung again (although it did finish powering off after no more than a couple of minutes). I'll step back from this since you're looking at it now.

@jacobgkau changed the title from "lemp10/darp7 not powering off when running 5.11" to "lemp10/darp7/galp5 not powering off when running 5.11" on Apr 7, 2021
@jacobgkau
Member Author

Not fixed, still happens after suspend/resume.

@jacobgkau reopened this on Apr 12, 2021
@jacobgkau changed the title from "lemp10/darp7/galp5 not powering off when running 5.11" to "lemp10/darp7/galp5 not powering off after suspend (5.11)" on Apr 12, 2021
@arbitrary-dev

Same issue here. I applied the latest system/firmware updates for my lemp10 just 15 minutes ago, and now shutdown doesn't work for me anymore. There is just a blank screen, and the temperature slowly rises at the bottom of the case.

A hard shutdown by holding the power button helps.

@jackpot51
Member

I am close to finding the kernel commit that introduced this

@jackpot51
Member

I think this patch might have introduced the hang: https://www.spinics.net/lists/intel-gfx/msg249449.html

@jackpot51
Member

I've confirmed that reverting e219ef9 works around this issue.

@arbitrary-dev

arbitrary-dev commented Apr 17, 2021

Also, after a suspend, the system shuts down correctly for me. Without the suspend, the system hangs during the shutdown sequence.

@vindard

vindard commented Apr 19, 2021

I'm getting this problem as well on my lemp10, with the green LED and keyboard backlight staying lit after shutdown.

@arbitrary-dev

arbitrary-dev commented Apr 19, 2021

Also, an interesting observation: when the lid is closed and the lemp10 is hung in the shutdown sequence, a hard shutdown by holding the power button doesn't work either.

@jacobgkau
Member Author

> Also, an interesting observation: when the lid is closed and the lemp10 is hung in the shutdown sequence, a hard shutdown by holding the power button doesn't work either.

This is expected, because the power button is disabled while the lid is closed (unless an AC adapter or USB-C charger is connected).

@arbitrary-dev

The updated 5.11.0-7614 kernel fixes the issue for me.

mmstick pushed a commit that referenced this issue Oct 16, 2024
[ Upstream commit eaf3adb ]

When programming phantom pipe, since cursor_width is explicitly set to 0,
this causes calculation logic to trigger overflow for an unsigned int
triggering the kernel's UBSAN check as below:

[   40.962845] UBSAN: shift-out-of-bounds in /tmp/amd.EfpumTkO/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c:3312:34
[   40.962849] shift exponent 4294967170 is too large for 32-bit type 'unsigned int'
[   40.962852] CPU: 1 PID: 1670 Comm: gnome-shell Tainted: G        W  OE      6.5.0-41-generic #41~22.04.2-Ubuntu
[   40.962854] Hardware name: Gigabyte Technology Co., Ltd. X670E AORUS PRO X/X670E AORUS PRO X, BIOS F21 01/10/2024
[   40.962856] Call Trace:
[   40.962857]  <TASK>
[   40.962860]  dump_stack_lvl+0x48/0x70
[   40.962870]  dump_stack+0x10/0x20
[   40.962872]  __ubsan_handle_shift_out_of_bounds+0x1ac/0x360
[   40.962878]  calculate_cursor_req_attributes.cold+0x1b/0x28 [amdgpu]
[   40.963099]  dml_core_mode_support+0x6b91/0x16bc0 [amdgpu]
[   40.963327]  ? srso_alias_return_thunk+0x5/0x7f
[   40.963331]  ? CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport+0x18b8/0x2790 [amdgpu]
[   40.963534]  ? srso_alias_return_thunk+0x5/0x7f
[   40.963536]  ? dml_core_mode_support+0xb3db/0x16bc0 [amdgpu]
[   40.963730]  dml2_core_calcs_mode_support_ex+0x2c/0x90 [amdgpu]
[   40.963906]  ? srso_alias_return_thunk+0x5/0x7f
[   40.963909]  ? dml2_core_calcs_mode_support_ex+0x2c/0x90 [amdgpu]
[   40.964078]  core_dcn4_mode_support+0x72/0xbf0 [amdgpu]
[   40.964247]  dml2_top_optimization_perform_optimization_phase+0x1d3/0x2a0 [amdgpu]
[   40.964420]  dml2_build_mode_programming+0x23d/0x750 [amdgpu]
[   40.964587]  dml21_validate+0x274/0x770 [amdgpu]
[   40.964761]  ? srso_alias_return_thunk+0x5/0x7f
[   40.964763]  ? resource_append_dpp_pipes_for_plane_composition+0x27c/0x3b0 [amdgpu]
[   40.964942]  dml2_validate+0x504/0x750 [amdgpu]
[   40.965117]  ? dml21_copy+0x95/0xb0 [amdgpu]
[   40.965291]  ? srso_alias_return_thunk+0x5/0x7f
[   40.965295]  dcn401_validate_bandwidth+0x4e/0x70 [amdgpu]
[   40.965491]  update_planes_and_stream_state+0x38d/0x5c0 [amdgpu]
[   40.965672]  update_planes_and_stream_v3+0x52/0x1e0 [amdgpu]
[   40.965845]  ? srso_alias_return_thunk+0x5/0x7f
[   40.965849]  dc_update_planes_and_stream+0x71/0xb0 [amdgpu]

Fix this by adding a guard for checking cursor width before triggering
the size calculation.

Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>