lemp10/darp7/galp5 not powering off after suspend (5.11) #41
Comments
I installed TLP on lemp10 to test something unrelated yesterday, and the issue didn't seem to be happening this morning until I removed it again. The issue still occurs with the latest kernel in the Ubuntu mainline PPA, 5.12-rc5. The issue does not seem to occur on kernel version 5.10.0 from the mainline PPA. The issue doesn't occur on 5.10.27, but it does occur on version 5.11.0-rc2 (rc1 didn't build in Ubuntu's PPA).
I see the same thing on my galp5. I'll look into it.
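As a rough illustration of how the version results above narrow down the regression, here is a minimal Python sketch. The `results` mapping only restates the kernels reported in this comment; `version_key` and `regression_window` are hypothetical helper names, and the sketch assumes every kernel older than the first bad one is good.

```python
# Results reported above: True = powers off correctly, False = hangs.
results = {
    "5.10.0": True,
    "5.10.27": True,
    "5.11.0-rc2": False,
    "5.12-rc5": False,
}

def version_key(version):
    """Sort key for strings like '5.10.27' or '5.11.0-rc2'.

    A release candidate sorts before the final release of the same version.
    """
    base, _, rc = version.partition("-rc")
    parts = [int(p) for p in base.split(".")]
    parts += [0] * (3 - len(parts))  # pad '5.12' to (5, 12, 0)
    rc_num = int(rc) if rc else float("inf")
    return (*parts, rc_num)

def regression_window(results):
    """Return (newest good version, oldest bad version)."""
    ordered = sorted(results, key=version_key)
    good = [v for v in ordered if results[v]]
    bad = [v for v in ordered if not results[v]]
    return (good[-1] if good else None, bad[0] if bad else None)

print(regression_window(results))
```

Note that 5.10.27 is a stable-branch release, so when bisecting mainline history the good endpoint would more likely be 5.10 itself, with 5.11-rc2 as the bad endpoint.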
I built our kernel with the following six commits reverted (the last six commits to 40247e5#diff-d8ab18c7fe4aecdebc357cc3821e8f92cd7a82669f8e1091013e2ab751fe9c42). I got two successful power-offs, but then it hung again (though it did finish powering off after no more than a couple of minutes). I'll step back from this since you're looking at it now.
Not fixed, still happens after suspend/resume.
Same issue here. I applied the latest system/firmware updates for my lemp10 just 15 minutes ago, and now shutdown doesn't work for me anymore. There is just a blank screen, and the temperature slowly rises at the bottom of the case. A hard shutdown by holding the power button helps.
I am close to finding the kernel commit that introduced this.
I think this patch might have introduced the hang: https://www.spinics.net/lists/intel-gfx/msg249449.html
I've confirmed that reverting e219ef9 works around this issue.
Also, after doing a suspend, the system shuts down correctly for me. Without the suspend, the system hangs during the shutdown sequence.
I'm getting this problem as well on my lemp10, with the green LED and keyboard backlight staying lit after shutdown.
Also, an interesting observation: when the lid is closed and the lemp10 is hung during the shutdown sequence, a hard shutdown by holding the power button also doesn't work.
This is expected, because the power button is disabled while the lid is closed (unless an AC adapter or USB-C charger is connected).
The updated 5.11.0-7614 kernel fixes the issue for me.
The lemp10/darp7 are sometimes not powering off after shutting down (this happens ~4/5 times, but not every time.) When this happens, journalctl indicates the shutdown process is finishing:
The LCD turns off, but the power LED remains solid green, and pushing the power button doesn't turn the machine back on. I'm able to recover from this by holding down the power button until the power LED turns off.
It seems like an EC reset is needed to power on again when this state is reached; I have to wait a second after powering off before it will power on again. I had one case where I had the system plugged in, and it did power off eventually, turning the power LED to orange; but the system wouldn't power on again until I unplugged for a few seconds. If I try to power on without an EC reset, the power LED turns green for a second, then goes back to its previous state.
This seems like it might involve the EC, but I'm starting the issue here because if I downgrade to our previous 5.8 kernel, once I'm booted from that kernel, the system powers off as expected again. Also, the last lemp10 firmware update was nearly two months ago, and I know the issue hasn't been occurring for that long; it just started in the past day or two as we released new kernels (and unfortunately took some time to notice and then narrow down.) This is not happening on the galp5.