The sensor value override feature is obscure and needs visibility #32

Krellan · 2024-05-16T22:46:42Z

Currently, the dbus-sensors sensor daemons support the idea of sensor value override, that is, the ability to receive a written value from a D-Bus incoming command, instead of reading the actual value from the underlying sensor hardware. This feature is also known as sensor mocking.

This feature is very useful for debugging, but it is obscure. It has the following limitations:

Enabling or disabling sensor mocking is done at compile time. At runtime, there is no way to see if the sensor mocking feature is available or not, short of trying it to see if you get an error message or not.
The override is placed into effect when the first D-Bus write command is received, and then, it will remain in effect until the sensor daemon itself is forcefully restarted. There is no way to gracefully cancel the sensor override feature from being in effect.
There is no way, at runtime, to see if the current sensor value was placed there by use of sensor mocking, or by use of actually reading from the underlying hardware. This would be good to have, for troubleshooting.

So, to summarize, let's make an interface with three bools, and add it to each sensor.

Is the sensor mocking (sensor value override) feature allowed to be enabled at all for this sensor, or not? This decision was made at compile time. This is read-only at runtime.
Is the sensor mocking feature currently in effect? If allowed, it could be turned on or off, by writing to this boolean.
Is the current value of this sensor something that came from mocking, instead of being a value that came from the underlying sensor hardware? This would cover cases in which the mocking feature was enabled but the user has not written a value yet, or in which the mocking feature was disabled but the sensor hardware has not provided us with a new reading yet.

This is closely related to the Mutability feature we talked about a while ago. The difference is that of layering: sensors that are mutable are designed to have their values typically externally changeable by end users, such as fan speed settings in manual mode, and such. The sensor mocking feature is more of a development and test feature, however, and typically will not be allowed during production.

So, any thoughts on this?

The text was updated successfully, but these errors were encountered:

amboar · 2024-05-16T23:22:36Z

Three bools gives 8 states. Are they all valid?

I expect it would be better to use a "choice" style string-type property that behaves as an enumeration (and given its OpenBMC-related, needs to be defined in PDI and use the sdbusplus enum support).

edtanous · 2024-05-16T23:25:35Z

The override is placed into effect when the first D-Bus write command is received, and then, it will remain in effect until the sensor daemon itself is forcefully restarted. There is no way to gracefully cancel the sensor override feature from being in effect.

This was an intentional design decision, the expectation is that when completed running mocks/testing, the user by policy should reboot the bmc when the test completes. Doing anything else is a half-measure that very likely could lead to things not being reset properly, and us debugging code that isn't useful to a product. With that said, if you wanted to add support for something like this and document that it's best effort, feel free.

Enabling or disabling sensor mocking is done at compile time. At runtime, there is no way to see if the sensor mocking feature is available or not, short of trying it to see if you get an error message or not.

The compile time flag is to allow opting out of the feature for platforms that don't want it, or might consider it a potential DOS issue if an attacker were to get access to dbus. Why is returning an error message that a feature is not supported a problem in practice?

So, to summarize, let's make an interface with three bools, and add it to each sensor.

A dbus interface somewhat of defeats the purpose of having a mock, because now other services can code to "is this being mocked" and the behavior of the code could be different when in test mode versus not. I could certainly see the argument that we should log these things to the journal when they're mocked and when they're not, but a formal interface implies that we expect software to use it, which by design, we don't want to have happen.

and typically will not be allowed during production.

What does this mean? There are absolutely test harnesses that run this test during a production build. In fact, that's a lot of the reason for having it, being able to troubleshoot software on a production signed build.

For the logging portion, I think good old journal logs would get us where we need to be in short order, logging when sensors enter and exit mocking mode.

Krellan · 2024-05-22T06:21:29Z

Three bools gives 8 states. Are they all valid?

I expect it would be better to use a "choice" style string-type property that behaves as an enumeration (and given its OpenBMC-related, needs to be defined in PDI and use the sdbusplus enum support).

Good point. There are less than 8 valid states, so an enum might be preferable. Here's the states that would be valid:

Sensor mocking is currently not active, and is not allowed, by compile-time decision
Sensor mocking is currently not active (but is allowed to become active), and the last value came from real hardware
Sensor mocking is currently not active, and the last value came from sensor mocking (the real hardware has not yet provided a real value to replace the mocked value)
Sensor mocking is currently active, and the last value is a mocked value
Sensor mocking is currently active, and the last value came from real hardware (in other words, mocking was just recently enabled, and the user has not written a mocked value yet)

I still think using bools would be clearer than having to come up with strings to describe each of these.

Another reason to use bools, instead of an enum, is to make it impossible to have to deal with the case of an invalid string, and also, to make it self-documenting what choices are available (instead of forcing the user to guess, or read the source, to learn which strings are valid).

Krellan · 2024-05-22T06:31:10Z

This was an intentional design decision, the expectation is that when completed running mocks/testing, the user by policy should reboot the bmc when the test completes. Doing anything else is a half-measure that very likely could lead to things not being reset properly, and us debugging code that isn't useful to a product. With that said, if you wanted to add support for something like this and document that it's best effort, feel free.

Interesting. Although inconvenient, it does reduce the attack surface for bugs.

The compile time flag is to allow opting out of the feature for platforms that don't want it, or might consider it a potential DOS issue if an attacker were to get access to dbus. Why is returning an error message that a feature is not supported a problem in practice?

Because it's a one-way thing, as you mentioned above, no way to undo it without restarting the sensor services (or rebooting the BMC). If I wanted to probe for the existence of sensor mocking, there's no way to do this safely, as if I try to write a value, even the same value that the sensor is already set to, it will either generate the error message or activate the sensor mocking feature, even if I don't want to activate it (I just wanted to see if it was enabled or not).

A dbus interface somewhat of defeats the purpose of having a mock, because now other services can code to "is this being mocked" and the behavior of the code could be different when in test mode versus not. I could certainly see the argument that we should log these things to the journal when they're mocked and when they're not, but a formal interface implies that we expect software to use it, which by design, we don't want to have happen.

What would be the use case of a "stealth" mocking feature that would be enabled but not having any way for the user to know if it is enabled or not?

As for the interface, we could do the same compromise that was done for phosphor-pid-control informative diagnostic information that is now appearing in the D-Bus objects (and is a great help when trying to tune a running system): use the Debug namespace for the name of the interface. This avoids having to formalize and standardize it.

What does this mean? There are absolutely test harnesses that run this test during a production build. In fact, that's a lot of the reason for having it, being able to troubleshoot software on a production signed build.

Agreed, and one of the tests I wanted to add in our product was a test to properly verify that the sensor mocking feature was disabled and inactive during a production build. Given the above, I can't do this test without inadvertently activating the sensor mocking feature (which is a test failure, but causes me to have to reboot the BMC, because of this). So, this test has to be classified as a disruptive test instead of a passive test, which, among other things, means I can't run it in parallel with other tests (in our particular test framework).

For the logging portion, I think good old journal logs would get us where we need to be in short order, logging when sensors enter and exit mocking mode.

This would be good to add, as a low-hanging fruit. As for entering mocking mode, this would take place upon the first D-Bus write received (unless the sensor is a special one with its own write handlers, such as the fan PWM sensors, which are designed to push sensor writes down to the underlying hardware, to set the fan speed during normal running conditions). As for leaving mocking mode, there is currently no way to do this without a restart, as you mentioned, which already has fairly loud logging.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The sensor value override feature is obscure and needs visibility #32

The sensor value override feature is obscure and needs visibility #32

Krellan commented May 16, 2024

amboar commented May 16, 2024

edtanous commented May 16, 2024

Krellan commented May 22, 2024

Krellan commented May 22, 2024

The sensor value override feature is obscure and needs visibility #32

The sensor value override feature is obscure and needs visibility #32

Comments

Krellan commented May 16, 2024

amboar commented May 16, 2024

edtanous commented May 16, 2024

Krellan commented May 22, 2024

Krellan commented May 22, 2024