I've been observing that psusensor is crashing and restarting on our system when the chassis first powers on, and I've narrowed it down to the service trying to `open()` the sensors' sysfs files when they're already open.

Here's how the issue manifests, as I understand it:

1. When the psusensor service first initializes, PSUSensorMain.cpp calls `createSensors()`, which calls `createSensorsCallback()`, which creates a sensor object for the sensor map. That object `open()`s the associated sysfs file as part of its constructor.
2. Then, when chassis_state_manager.cpp receives the signal that power-on is complete (in `Chassis::sysStateChange()`), it sets `xyz.openbmc_project.State.ChassisCurrentPowerStatus` to `ChassisOn`, which then sends an `org.freedesktop.DBus.Properties.PropertiesChanged` signal to the `chassisMatch` callback function in dbus-sensors Utils.cpp.
3. The callback checks whether `ChassisCurrentPowerStatus` equals `ChassisOn` (which I think it always would, if this signal is tied to changing the `ChassisCurrentPowerStatus` property), and if so, it sets the `on` variable to true (https://github.com/openbmc/dbus-sensors/blob/master/src/Utils.cpp#L526).
4. This invokes the `powerStateChanged()` function in PSUSensorMain.cpp (which was assigned to `hostStatusCallback` in https://github.com/openbmc/dbus-sensors/blob/master/src/PSUSensorMain.cpp#L1221), with the `newState` argument passed as true (based on the `on` variable). Because `newState` is true, `createSensors()` is called with `activateOnly` set to true; and because we have a corresponding sensor object for each sensor in the map (and not a nullptr), it calls `sensor->activate()` without checking whether the sensor already `isActive()` (https://github.com/openbmc/dbus-sensors/blob/master/src/PSUSensorMain.cpp#L928).
5. This `open()`s a file that's already open, which causes PSUSensor to terminate itself. It can then recover and run normally, because the chassis is now on.

I was wondering whether anyone else has observed this, or whether it's isolated to our configuration. The only changes we have to entity-manager are adding per-sensor power states to schemas/legacy.json, but the issue still occurs even without these changes. One workaround we have been using is to call `sensor->activate()` only if `sensor->isActive()` returns false, but I didn't know if there was a more underlying problem at hand.

I encountered this problem in hwmontempsensor as well:

- If the power-off signal is sent after the power-on timer callback and before `open()` in `createSensors()`, the `timer.cancel()` will fail, and the sysfs file will be closed and then opened.
- If the chassis-off signal is sent after the chassis-on callback has run `createSensors()`, the sysfs file will be opened again, leading to a crash.
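
For illustration, the `isActive()` guard described in the report can be sketched as below. This is a minimal model under stated assumptions, not the dbus-sensors code: `MockSensor`, `SensorMap`, `activateSensors()`, `simulatePowerOn()`, and the file path are hypothetical stand-ins, and the "already open" failure is simulated with an exception rather than a real sysfs `open()`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a sensor object; NOT the real PSUSensor class.
class MockSensor
{
  public:
    explicit MockSensor(std::string sysfsPath) : path(std::move(sysfsPath))
    {
        // Mirrors the report: the constructor opens the sysfs file.
        activate();
    }

    // Simulated open(). Assumption: opening an already-open file is fatal,
    // modeled here as an exception.
    void activate()
    {
        if (fileOpen)
        {
            throw std::logic_error("already open: " + path);
        }
        fileOpen = true;
        ++openCount;
    }

    void deactivate() { fileOpen = false; }
    bool isActive() const { return fileOpen; }
    int timesOpened() const { return openCount; }

  private:
    std::string path;
    bool fileOpen = false;
    int openCount = 0;
};

using SensorMap = std::map<std::string, std::unique_ptr<MockSensor>>;

// Sketch of the activateOnly path with the guard from the report:
// only activate sensors that exist and are not already active.
void activateSensors(SensorMap& sensors)
{
    for (auto& entry : sensors)
    {
        auto& sensor = entry.second;
        if (sensor != nullptr && !sensor->isActive())
        {
            sensor->activate();
        }
    }
}

// Replays the reported sequence: sensors are constructed at service start,
// then the chassis-on callback asks for activation again.
int simulatePowerOn()
{
    SensorMap sensors;
    sensors["psu0_vin"] =
        std::make_unique<MockSensor>("hypothetical/hwmon/in1_input");
    activateSensors(sensors); // with the guard this is a no-op, not a crash
    assert(sensors["psu0_vin"]->isActive());
    return sensors["psu0_vin"]->timesOpened();
}
```

With the guard in place, the second activation request leaves the sensor untouched, so the file is opened exactly once. The same check would also skip the redundant reopen in the hwmontempsensor cases, although it does not by itself resolve the timer race described there.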