Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work around flakiness by periodically issuing redundant write commands to fan #35

Open
Krellan opened this issue Apr 12, 2023 · 2 comments

Comments

@Krellan
Copy link
Member

Krellan commented Apr 12, 2023

To avoid flooding the D-Bus interface with traffic, there is an optimization within phosphor-pid-control that avoids issuing a write command to the fan, if the commanded PWM is the same value as it was from earlier. In other words, it tries to avoid writing duplicate PWM that would have no effect.

This is a good optimization, as without it D-Bus can be slowed down so much as to be unusable. However, it needs to be overridden in certain situations, such as when returning to automatic PID control from manual, and thus needing all the fans to receive a new PWM command to be sure they return to what the PID algorithm believes they should be at. This was implemented a while ago: https://gerrit.openbmc.org/c/openbmc/phosphor-pid-control/+/36695

However, it appears this optimization might be too aggressive. It would be worth adding another situation in which it should be overridden. Sometimes, the write path is not 100% reliable, and a fan write command can become lost. When this happens, the fan PWM will not receive the command, so it will remain "stuck" at an old PWM, without changing to the new PWM as it should have been commanded to do.

Various OEM have came up with their own patches to allow for this possibility, to avoid the problem with a fan getting "stuck" at the wrong PWM, by allowing fan writes to happen, regardless if they are redundant or not. Here's an example: https://gerrit.openbmc.org/c/openbmc/phosphor-pid-control/+/37357

I am thinking that a reasonable compromise would be to allow a redundant PWM write only once every X cycles through the main loop, or every Y seconds of elapsed time, whatever would be the most reasonable to implement. This would provide enough opportunities to get the PWM command through, without bogging down the rest of the system by spamming D-Bus needlessly.

@Krellan
Copy link
Member Author

Krellan commented Sep 22, 2023

This is increasingly important, because I've heard of some fan vendors now featuring hardware watchdogs within their fan controllers!

If software fails to send a command every so often, to pet the watchdog, the fan goes into its own version of failsafe mode, a preset hardwired speed. So, we need to do a redundant write at some fixed interval of time, and make this a standard feature of phosphor-pid-control going forward, with a sensible default (I'm open to suggestions).

@Krellan
Copy link
Member Author

Krellan commented Oct 13, 2023

It will also be necessary to audit dbus-sensors and make a similar change there. Reason is, the sensors try to optimize themselves by discarding redundant writes. In this case, however, we intentionally want to write the same value to the hardware, especially if we are trying to pet a watchdog timer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant