
Can't resume from suspend if a specific disk is added to DISK_DEVICES with AHCI_RUNTIME_PM_ON_BAT enabled #606

Closed
luckv opened this issue Dec 9, 2021 · 20 comments

Comments

@luckv

luckv commented Dec 9, 2021

[x] I've read and accepted the Bug Reporting Howto
[x] I've provided all required tlp-stat outputs via Gist

Describe the bug
If DISK_DEVICES explicitly includes one of my disks (sdb) and AHCI_RUNTIME_PM_ON_BAT is set to auto, the PC does not fully resume after a suspend (systemctl suspend). The screen turns on but stays blank (lit but black), and the Wi-Fi doesn't reconnect (I can't ping the PC after resume).

I don't know much about the disk. It shipped with the laptop and reports this name: Micron 1100 MTFD

I excluded all other possible causes:

  • The issue doesn't happen without tlp, or with tlp disabled
  • The issue doesn't happen with tlp default configuration
  • The issue doesn't happen if that disk is excluded from DISK_DEVICES or AHCI_RUNTIME_PM_ON_BAT is set to on
  • The issue doesn't happen with all the other settings you see in my tlp-stat enabled. I enabled them one by one and rebooted each time to find where the problem was.
  • I went through the troubleshooting guide and disabled both RUNTIME_PM_ON_BAT and USB_AUTOSUSPEND, but that didn't help
  • Searched the web for similar issues
  • Searched this repository for similar issues

Expected behavior
The PC resumes from suspend normally.

To Reproduce
Steps to reproduce the unexpected behavior:

  1. Add sdb to the list of devices in DISK_DEVICES
  2. The problem occurs only in battery mode
  3. Full output of tlp-stat: link. All customizations are in luckv.conf

Additional context
OS: Fedora 35 (Workstation Edition)
Kernel version (uname -r): 5.15.6-200.fc35.x86_64

I also have Pop!_OS 21.04 (same kernel version, 5.15) installed on the same PC (same boot disk too), but with TLP 1.3.1. The issue doesn't occur there.

@linrunner
Owner

linrunner commented Dec 9, 2021

This is probably related to #587 (comment)

The AHCI autosuspend sysfs files associated with sdb are:

/sys/block/sdb/power/control = on, autosuspend_delay_ms = -1
/sys/bus/pci/devices/0000:00:17.0/ata2/power/control     = auto  -- sdb

The disk seems to refuse the value auto for /sys/block/sdb/power/control.

Does the problem disappear after writing on to /sys/bus/pci/devices/0000:00:17.0/ata2/power/control?
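To find which ATA port a disk hangs off (ata2 for sdb here) without reading it out of tlp-stat, one can resolve the /sys/block symlink into /sys/devices and pick out the ataN path component. This is a minimal sketch, assuming the usual libata sysfs layout; the function names and the SYSFS_ROOT override (default /sys, only there so the helpers can be tried on a copy of the tree) are hypothetical:

```shell
# Hypothetical helpers; SYSFS_ROOT defaults to /sys and exists only for testing.

# Print the ATA port directory (.../0000:00:17.0/ata2 for sdb in this issue).
ata_port_dir() {
    # /sys/block/<disk> is a symlink into /sys/devices/...; resolve it.
    real=$(readlink -f "${SYSFS_ROOT:-/sys}/block/$1") || return 1
    # Take the first path component named ataN.
    port=$(printf '%s\n' "$real" | tr '/' '\n' | grep -E '^ata[0-9]+$' | head -n 1)
    [ -n "$port" ] || return 1
    # Keep everything up to and including the ataN component.
    printf '%s/%s\n' "${real%%/"$port"/*}" "$port"
}

# Print the port's runtime PM setting ("on" = runtime suspend disabled).
show_ata_port_control() {
    dir=$(ata_port_dir "$1") || { echo "no ATA port found for $1" >&2; return 1; }
    printf '%s/power/control = %s\n' "$dir" "$(cat "$dir/power/control")"
}
```

For example, show_ata_port_control sdb should print the full path of the port's power/control file followed by its current value.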

@luckv
Author

luckv commented Dec 9, 2021

@linrunner, are you suggesting that I should write directly to /sys/bus/pci/devices/0000:00:17.0/ata2/power/control?
How can I do that? echo 'abc' >> /sys/bus/pci/devices/0000:00:17.0/ata2/power/control?

@linrunner
Owner

linrunner commented Dec 9, 2021

Almost. You need a root shell:

echo on > /sys/bus/pci/devices/0000:00:17.0/ata2/power/control

@luckv
Author

luckv commented Dec 9, 2021

The situation has changed. Before trying your suggestion, I commented out AHCI_RUNTIME_PM_ON_BAT in my configuration, so its value was auto.
Without rebooting, I ran sudo tlp start, then checked tlp-stat and saw:

/sys/block/sdb/power/control = auto, autosuspend_delay_ms = 15000
/sys/bus/pci/devices/0000:00:17.0/ata2/power/control     = auto  -- sdb

but the problem persists, even after a reboot. I took tlp-stat after the reboot. See here

I'm waiting for your response because this is so unexpected; I don't think doing what you suggested is the right thing to do now.

@linrunner
Owner

You tested the opposite of what I need to isolate the cause. So again, properly this time, please:

echo on > /sys/bus/pci/devices/0000:00:17.0/ata2/power/control
echo on > /sys/block/sdb/power/control

Then suspend and check if the problem occurs again.
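Because a value written to a sysfs control file can be rejected, or overwritten later by TLP on resume (as the thread goes on to show), it helps to read the file back right after writing it. A sketch of a hypothetical helper that does both; it is not part of TLP, and on a real system it has to run as root:

```shell
# Hypothetical helper: write a runtime PM value, then read it back.
set_pm_control() {
    file=$1
    value=$2
    # A failed redirection (missing file, no permission) makes printf fail.
    printf '%s\n' "$value" > "$file" || return 1
    now=$(cat "$file")
    if [ "$now" != "$value" ]; then
        echo "$file: wrote '$value' but reads back '$now'" >&2
        return 1
    fi
    echo "$file = $now"
}

# Usage in a root shell, same files as the two commands above:
#   set_pm_control /sys/bus/pci/devices/0000:00:17.0/ata2/power/control on
#   set_pm_control /sys/block/sdb/power/control on
```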

@luckv
Author

luckv commented Dec 12, 2021

Sorry @linrunner for answering only now.
I did what you suggested and it worked! My laptop resumes from suspend correctly.
In tlp-stat the output remained the same as in my last message (see here).
But if I read those values with cat, I get these results:

[root@fedora luckv]# cat /sys/block/sdb/power/control
on
[root@fedora luckv]# cat /sys/bus/pci/devices/0000:00:17.0/ata2/power/control
auto

Hope this helps to resolve the problem.

@linrunner
Owner

> In tlp-stat the output remained the same as in my last message (see here).
> But if I read those values with cat, I get these results:

tlp-stat always displays the same values as cat; it even uses cat internally.

The deviation in your test case is due to the fact that tlp is called again after the resume and restores the state before your manual write operations. I guess the next suspend will produce the problem again.

What I don't understand is why /sys/block/sdb/power/control stays on after boot and resume but is auto after tlp start.

Let's test this a little more thoroughly:

  1. Enable trace mode by adding the following line to your config file:

    TLP_DEBUG="disk pm ps run"

  2. Reboot and show the output of

    sudo tlp-stat -d

  3. Make your manual changes (in a root shell)

    echo on > /sys/bus/pci/devices/0000:00:17.0/ata2/power/control
    echo on > /sys/block/sdb/power/control

     and show the output of

    sudo tlp-stat -d

     again.

  4. Suspend and resume your laptop. Show the output of

    sudo tlp-stat -d

     once again. Finally, show the trace output for the whole business:

    sudo tlp-stat -T
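Since each output has to be matched to a step above, a small helper can save every run under its step number. A sketch only: the snapshot function, the stepN.txt file naming and the OUTDIR variable are all hypothetical, not part of TLP:

```shell
# Hypothetical helper: run a command and save its output as stepN.txt.
snapshot() {
    step=$1
    shift
    file="${OUTDIR:-.}/step$step.txt"
    # Timestamp first, then the command's stdout and stderr.
    { date; "$@"; } > "$file" 2>&1
    rc=$?
    echo "$file"
    return "$rc"
}

# Usage, following the numbered steps:
#   snapshot 2 sudo tlp-stat -d        # after the reboot
#   snapshot 3 sudo tlp-stat -d        # after the manual writes
#   snapshot 4 sudo tlp-stat -d        # after suspend/resume
#   snapshot 4-trace sudo tlp-stat -T  # trace output for the whole run
```

The function returns the exit status of the wrapped command, so a failing tlp-stat call is still noticeable even though its output ends up in the file.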

@luckv
Author

luckv commented Dec 12, 2021

> The deviation in your test case is due to the fact that tlp is called again after the resume and restores the state before your manual write operations. I guess the next suspend will produce the problem again.

You are right: only the first suspend/resume works correctly.

@luckv
Author

luckv commented Dec 12, 2021

@linrunner I did as you wrote.
I created a Gist with all the output you requested here.
The number at the start of each file name corresponds to the step number in your numbered list. I also included a tlp-stat -c taken at the end to double-check my configuration.

@linrunner
Owner

Your outputs show strange deviations.

  1. The result of writing manually

    echo on > /sys/bus/pci/devices/0000:00:17.0/ata2/power/control
    echo on > /sys/block/sdb/power/control

     is

    /sys/block/sdb/power/control = auto, autosuspend_delay_ms = 15000
    /sys/bus/pci/devices/0000:00:17.0/ata2/power/control = on -- sdb

  2. In your start post the initial state is

    /sys/block/sdb/power/control = on, autosuspend_delay_ms = -1
    /sys/bus/pci/devices/0000:00:17.0/ata2/power/control = auto -- sdb

     On the other hand, the result of the recent boot process is

    /sys/block/sdb/power/control = auto, autosuspend_delay_ms = 15000
    /sys/bus/pci/devices/0000:00:17.0/ata2/power/control = auto -- sdb

Is this still the same kernel version?

As a workaround, I suggest turning off AHCI runtime PM completely by configuring

AHCI_RUNTIME_PM_ON_AC= 
AHCI_RUNTIME_PM_ON_BAT=
  1. Show your effective configuration (difference to TLP's defaults only)

    tlp-stat -s --cdiff

  2. Reboot and show

    sudo tlp-stat -s -d

  3. Suspend/resume and show

    sudo tlp-stat -s -d

  4. Another suspend/resume and show

    sudo tlp-stat -s -d

@luckv
Author

luckv commented Dec 13, 2021

> Is this still the same kernel version?

Absolutely yes. Here is the output of my uname --all:

Linux xxx-fedora 5.15.6-200.fc35.x86_64 #1 SMP Wed Dec 1 13:41:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

> In your start post the initial state is
>
> /sys/block/sdb/power/control = on, autosuspend_delay_ms = -1
> /sys/bus/pci/devices/0000:00:17.0/ata2/power/control = auto -- sdb

I haven't been able to get that output ever again. Honestly, I don't know why.

> Your outputs show strange deviations.

I know. In fact, at one point I didn't try what you asked because I was confused by the results of my actions.
One very strange thing is that, even though tlp uses cat internally as you said, tlp-stat -d and cat /sys/block/sdb/power/control show two different results: one says /sys/block/sdb/power/control is on, the other says it is auto, even when run one right after the other.
This behaviour isn't deterministic... I really don't know what to think.

@luckv
Author

luckv commented Dec 13, 2021

I followed the steps. Here is the output.

The output after the reboot and after each suspend remained the same, with this output for both disks:

/sys/block/sdb/power/control = on, autosuspend_delay_ms = -1
/sys/bus/pci/devices/0000:00:17.0/ata2/power/control     = on  -- sdb

@linrunner I'm starting to think this is a problem with the disk. Maybe?

@luckv
Author

luckv commented Dec 13, 2021

While on battery, I restored AHCI_RUNTIME_PM_ON_BAT=on, shut down the PC, then booted it again.
I opened a terminal and ran only these commands. The outputs for /sys/block/... differ from tlp-stat.

Gist here

linrunner added a commit that referenced this issue Dec 14, 2021
@linrunner
Owner

Oops. In fact there is an error in the output. TLP internally reads and writes the correct path /sys/block/DISK/device/power/control, but tlp-stat -d displays it incorrectly as /sys/block/DISK/power/control.
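The discrepancy can be seen directly by reading both files for each disk. A minimal sketch, assuming SATA disks named sd* and using a hypothetical SYSFS_ROOT override (default /sys) for testing: it prints the block-level file that tlp-stat -d displayed next to the device-level file that, per the comment above, TLP actually reads and writes:

```shell
# Hypothetical helper; SYSFS_ROOT defaults to /sys and exists only for testing.
compare_disk_pm() {
    root=${SYSFS_ROOT:-/sys}
    for blk in "$root"/block/sd*; do
        [ -e "$blk" ] || continue    # the glob matched nothing
        disk=${blk##*/}
        # What tlp-stat -d displayed (the block device itself) ...
        shown=$(cat "$blk/power/control" 2>/dev/null || echo n/a)
        # ... versus what TLP reads and writes (the disk's SCSI device).
        used=$(cat "$blk/device/power/control" 2>/dev/null || echo n/a)
        printf '%s: power/control=%s device/power/control=%s\n' "$disk" "$shown" "$used"
    done
}

# Usage: compare_disk_pm
```

This would explain why cat /sys/block/sdb/power/control and tlp-stat -d seemed to disagree with TLP's actual behaviour: they were reading a different file than the one TLP writes.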

I can't provide a Fedora package right away, so please edit line 446 of the file /usr/share/tlp/func.d/tlp-func-stat by hand according to the commit diff.

@linrunner
Owner

Workaround added to the FAQ.

@luckv
Copy link
Author

luckv commented Dec 14, 2021

I edited my files. Do you want me to test anything else for you, @linrunner?

@luckv
Copy link
Author

luckv commented Dec 14, 2021

Maybe it's really a bug in the Linux kernel?

linrunner added this to the 1.5 Release milestone Dec 15, 2021
@linrunner
Owner

Just for the record: with AHCI_RUNTIME_PM_ON_BAT=on suspend/resume works without freeze now?

> Maybe it's really a bug in the Linux kernel?

Of course. But we won't solve it here; you will have to file a kernel bug report.

> I edited my files. Do you want me to test anything else for you, @linrunner?

I will let you know when the 1.5 beta is ready.

@luckv
Author

luckv commented Dec 17, 2021

> Just for the record: with AHCI_RUNTIME_PM_ON_BAT=on suspend/resume works without freeze now?

Yes

> Of course. But we won't solve it here; you will have to file a kernel bug report.

Ok

> I will let you know when the 1.5 beta is ready.

Perfect

@linrunner
Owner

@luckv: Fedora 35 packages are available now. Enjoy.

linrunner removed this from the 1.5 Release milestone Jun 10, 2022