system power states #24228
Comments
Where are these ACPI states documented? Looking at the 6.3eB specification section 2.2 I see G0 is Working, G1 is Sleeping. Figure 16-74 in section 16.1 shows four (S1-S4) substates within G1, though S5 is also mentioned. It seems to break down to:
These don't match to what you describe. Which is fine, I just want to see a design that's clearly informed by an existing specification/architecture that we can go to to resolve questions. |
I've interpreted the sleep states for sam0 as "idle" versus "standby" modes documented in the samd21 datasheet. "Idle" might translate to "Runtime idle" and "standby" might translate to "Suspend to idle" |
I referenced the ACPI spec you mentioned and also the Linux documentation. |
OK, I see some of it in the Linux documentation. Specifically I see that on ACPI systems these Linux system-wide states might map to ACPI states:
But you're mixing terms from two domains. In Linux Working state is a power management scheme for non-sleep states: the states should be:
The other power management scheme, System-wide, provides the suspend-to-idle, standby, suspend-to-RAM, and hibernation states. So the Linux architecture splits the working and sleep states into two different schemes. This seems like a good idea; but how does it map to Zephyr? I don't see the terms you provided used in the ACPI reference, and they are not one-to-one with Linux terms. We can say that the terms in this RFC are related to terms used by Linux, but not ACPI. And because they're only related and from different Linux management schemes they are not part of a clearly defined existing power management architecture. Which (again) doesn't mean we can't use them, but it does mean this is nowhere near enough description of a power management architecture to be used as a basis for implementation. Why aren't we starting with the existing architecture, terminology, and API from Linux, and describing what Zephyr will do in terms of how it's different? |
|
Please change "Working state" in the list of states to "Runtime active state" to be consistent with "Runtime idle state" and the way the terms are used in Linux. This also helps make clear that there are two general states: runtime, and sleep. The remaining five states are sleep states. It would be nice if all the power-related state names matched their Linux inspiration. |
PM state
PM policy structure and API
|
@wentongwu Could you put that into a draft PR that has the API in a header (with documentation) so we can review it? I have comments, but can't provide them here. |
sure, will do. |
PM core
For device suspend in some power states, there will be two steps. The first is the device prepare stage, which is executed with the scheduler locked (k_sched_lock) so that devices can sync with a connected slave/master if needed. Well-prepared devices are linked into dev_prepared_list, which is the foundation of the next step. A device can't reject the power state switch at this stage, because the pm policy layer already provides a constraint API. Wake-up interrupts may still occur during this stage, so a global wake-up count will record whether a wake-up happened; it is checked at the beginning of the next step, and if a wake-up happened the ongoing suspend is stopped. The second step clock/power gates the devices on dev_prepared_list with IRQs locked. After that, platform suspend happens through the defined APIs.
Hold on, while typing this comment I had another idea: run-time device pm is always on (maybe with no Kconfig option) to take care of the devices' pm, and it provides an API that reports device states, which the pm policy layer can use to help decide the next system state, because, as the state definitions above indicate, some of them need devices suspended. The system then performs the necessary operations for the decided pm state, following the definitions above and the platform implementation. That gives each device more flexibility to control its own pm state, but may save less power than the method above.
@pabigot what are your thoughts about this? |
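To make the two-step flow above easier to discuss, here is a minimal single-threaded sketch in plain C. It is not the proposed implementation: `struct sketch_dev`, `sketch_wakeup_isr` and `sketch_suspend_devices` are hypothetical names, the scheduler and IRQ locking are only noted in comments, and the `irq_during_prepare` callback is an assumption used to simulate a wake-up interrupt arriving during the prepare stage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device bookkeeping for this sketch. */
struct sketch_dev {
    bool prepared;   /* stands in for membership of dev_prepared_list */
    bool suspended;  /* stands in for "clock/power gated" */
};

/* Global wake-up count, bumped by wake-up interrupt handlers. */
static volatile unsigned int wakeup_count;

static void sketch_wakeup_isr(void)
{
    wakeup_count++;
}

/*
 * Two-step suspend. irq_during_prepare simulates an interrupt that fires
 * during step 1 (pass NULL for none). Returns true if the devices were
 * suspended, false if a wake-up stopped the ongoing suspend.
 */
static bool sketch_suspend_devices(struct sketch_dev *devs, size_t n,
                                   void (*irq_during_prepare)(void))
{
    unsigned int seen = wakeup_count;
    size_t i;

    assert(devs != NULL || n == 0);

    /* Step 1: prepare (scheduler locked in the proposal). */
    for (i = 0; i < n; i++) {
        devs[i].prepared = true;
    }
    if (irq_during_prepare != NULL) {
        irq_during_prepare();
    }

    /* Step 2 begins by checking whether a wake-up happened. */
    if (wakeup_count != seen) {
        for (i = 0; i < n; i++) {
            devs[i].prepared = false;
        }
        return false;
    }

    /* Step 2: gate the prepared devices (IRQs locked in the proposal). */
    for (i = 0; i < n; i++) {
        if (devs[i].prepared) {
            devs[i].suspended = true;
        }
    }
    return true;
}
```

Calling `sketch_suspend_devices(devs, n, NULL)` suspends every device, while passing `sketch_wakeup_isr` as the callback leaves all devices unsuspended and returns false.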
We cannot assume that devices can transition power levels synchronously from within an interrupt. Rather, the allowed system power state must be affected by the states the devices are in. This does suggest that device power management must always be on and devices should automatically transition to the lowest power state consistent with application needs. I think the idea that system power management should control device power management is workable only with a well-defined model of application needs that can constrain system power management. It is not in general acceptable for the system to say "I'm going to sleep, everybody shut down" if the application is waiting for a response from an external device that will be lost if the system sleeps. We don't have such a model. So I still feel we're going too deep without agreeing on and documenting the general design principles and goals. That includes an architectural vision, core concepts, and (abstract) data structures including dependencies and constraints: what they are, how they're represented, and how they affect transitions between system power states. So far the only thing described in any detail is the static power states. For example the concept of an interrupt occurring during a power level transition and so blocking completion of that transition must have been addressed before. How is it handled in the TI and other power management architectures? |
See the pm policy API: it defines a constraint API to constrain system power management, and the state the pm core follows comes from the pm policy layer. OK, I will provide more documentation about that.
Sure, the power transition will not happen in an interrupt. I mean the sync between the device driver and the device firmware will be the first step: the device driver starts the sync message, which runs in idle thread context, and the response (ack) is the interrupt itself; when the interrupt is received, the device is put on dev_prepared_list. The second step does the actual power transition in idle thread context based on dev_prepared_list. There may be limitations for the sync if the response needs a read, which is why I suggested the other idea above.
But we should carefully consider the case where a device rejects the suggested power state; if that happens, we will only go into run-time idle. That has the same effect as letting device states affect the ongoing system pm state.
Yes, so we have to discuss, and I will document more. |
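As a concrete talking point for the constraint idea in the reply above, the sketch below counts constraints per state and falls back to run-time idle when the suggested state is constrained. The actual pm policy API is not shown in this thread, so every name here (`sketch_state`, `sketch_constraint_set`, `sketch_constraint_release`, `sketch_next_state`) is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of states the policy may suggest. */
enum sketch_state {
    SKETCH_RUNTIME_IDLE,
    SKETCH_SUSPEND_TO_IDLE,
    SKETCH_STANDBY,
    SKETCH_SUSPEND_TO_RAM,
    SKETCH_STATE_COUNT
};

/* Number of outstanding constraints against each state. */
static unsigned int constraints[SKETCH_STATE_COUNT];

/* A driver or the application forbids entering state s for now. */
static void sketch_constraint_set(enum sketch_state s)
{
    assert(s > SKETCH_RUNTIME_IDLE && s < SKETCH_STATE_COUNT);
    constraints[s]++;
}

/* Drop a constraint that was set earlier. */
static void sketch_constraint_release(enum sketch_state s)
{
    assert(s < SKETCH_STATE_COUNT && constraints[s] > 0);
    constraints[s]--;
}

/*
 * Return the state to enter: the suggested one if nothing constrains it,
 * otherwise run-time idle, as described in the comment above.
 */
static enum sketch_state sketch_next_state(enum sketch_state suggested)
{
    assert(suggested < SKETCH_STATE_COUNT);
    return constraints[suggested] == 0 ? suggested : SKETCH_RUNTIME_IDLE;
}
```

A counter rather than a flag is used so that several independent users can constrain the same state; whether the real API works that way is not stated here.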
We need the API in a (draft?) PR as requested so we can see the whole thing. Please document its behavior as part of the initial PR. If that PR exists please link to it here (and link back here from it). |
In order to reduce the overall system power consumption, we should suspend devices that are idle or not being used while the system is active or running. Currently there is a device idle power management framework that intends to do that. But the implementation can only do get/put one after another and can't handle concurrency: for example, if multiple threads request DEVICE_PM_ACTIVE_STATE concurrently, the signal may be polled without being reset and there may be signal contention among the threads. The disable function also doesn't consider an ongoing transition, and the framework doesn't consider device dependencies. So this reworks the implementation and renames it device runtime power management, following the definition in zephyrproject-rtos#24228.
The API rt_dpm_claim resumes the device and protects any hardware transfer after the call by increasing the usage count. When it runs concurrently with another claim or release, it pends the current thread on the wait_q until the previous transition has finished. The API rt_dpm_release releases a previous claim and forbids unexpected releases. No hardware operation depends on release, so release has an asynchronous version. After the release, the parents of that device are also considered automatically. Each device can decide whether to support device runtime power management through rt_dpm_enable/rt_dpm_disable. It is also the device driver, not this framework, that defines when a device is not in use: the driver decides where to put rt_dpm_claim/rt_dpm_release. For example, they can go around the transfer function for I2C, but for a net device they may only fit around open/close or similar places.
Signed-off-by: Wentong Wu <wentong.wu@intel.com>
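The usage-count behaviour described in the commit message can be illustrated with a small standalone sketch. This is not the code from the commit: `struct rt_dpm_dev`, `sketch_claim` and `sketch_release` are hypothetical, the real rt_dpm_claim/rt_dpm_release signatures are not shown in this thread, claims are assumed to propagate to the parent, and the wait_q handling for concurrent callers is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; the real structure may differ. */
struct rt_dpm_dev {
    struct rt_dpm_dev *parent; /* NULL if the device has no parent */
    unsigned int usage;        /* outstanding claims, including children's */
    bool active;               /* true while the device is resumed */
};

/* Resume the device (parents first) and take a usage reference. */
static void sketch_claim(struct rt_dpm_dev *dev)
{
    assert(dev != NULL);
    if (dev->parent != NULL) {
        sketch_claim(dev->parent);
    }
    if (dev->usage++ == 0) {
        dev->active = true;
    }
}

/* Drop a usage reference; returns false for an unexpected release. */
static bool sketch_release(struct rt_dpm_dev *dev)
{
    assert(dev != NULL);
    if (dev->usage == 0) {
        return false; /* no matching claim */
    }
    if (--dev->usage == 0) {
        dev->active = false;
    }
    /* The parent is considered automatically after the release. */
    if (dev->parent != NULL) {
        (void)sketch_release(dev->parent);
    }
    return true;
}
```

In this model a driver would bracket a transfer with `sketch_claim(dev)` and `sketch_release(dev)`, and a bus device stays active for as long as any child still holds a claim.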
@pabigot @nashif @vanti I attach the API we have so far here: wentongwu@598eab9. Please share your comments so that we don't diverge from what anyone has in mind. The platform pm API and device pm API are still in progress; after that we will settle the implementation of the pm core (or pm manager). |
@wentongwu Can we relax the definition of "Suspend to ram state" a bit to include states such as the standby mode on TI CC1352, where almost everything is power-gated, with the exception of some minimal CPU logic that is required for waking up from the same point without restarting? I think the word 'everything' is a bit strong here. |
Just capturing some information about existing states. The ones at the top are Zephyr; I think the names are good, and the other references can show what we mean by those states. The draft API pointed to above should incorporate documentation that explains more clearly what's meant by those states (the Linux System State link has the most detail).
System power management states:
Device States
|
@vanti updated. Thanks |
Another question: once all of the code is ready, which platform is best for testing? |
Maybe not clock gate: some devices can work on different clocks, so maybe the defined API for device pm should pass them down as a parameter. |
I think active/suspend/off sound fine to me. On devices that only have active/off state, would suspend map to off?
The infrastructure should be tested on multiple platforms in my opinion, to make sure it is flexible enough. |
I don't like "power gate" and "clock gate" since those are (AIUI) technologies that produce a savings in power, not low-power states. I've also not seen any non-MCU/CPU devices (e.g. I2C or SPI ICs) that document their low power modes in those terms. Remember there are four states to capture. Would it be active/suspend1/suspend2/off? That's getting vague. So I'm leaning towards D0, D1, D2, D3. Then, for consistency, I've gotta change my position on system and go for S0, S0ix, S1, S2, S3, S4, S5. My motivation for the existing Linux-based names was that it's more clear what those states mean, but that could be addressed by clear documentation (which might even be better, as we can go into detail without getting wrapped up in what's implied by "suspend to RAM"). (Though S0ix might become S0i to indicate "idling in S0" rather than something to do with Intel-specific stuff.) Given where we are, I believe a one-to-one correspondence to an existing architecture like ACPI has the best chance for meeting cross-platform needs. I would prefer a different functional-based architecture for the device power states but I don't think that's going anywhere. |
this is now done. |
Currently Zephyr classifies power states into two categories, sleep state and deep sleep state, based on whether the CPU loses execution context during the power state transition. Each category has finer-grained sub-states (SLEEP_1, SLEEP_2, SLEEP_3, DEEP_SLEEP_1, DEEP_SLEEP_2, DEEP_SLEEP_3) which are classified only by residency duration. But it's not enough to define power states based only on whether the CPU loses execution context, and there are no technical rules for classifying the sub-states. This was intended to give vendors and users more flexibility in power management, but it has caused a lot of confusion for a long time because the definitions are unclear.
The ACPI specification defines clear power states that have already been adopted by other OSes (AFAIK Linux and Windows), so I suggest Zephyr also base its power state definitions on the ACPI spec. The power states that can be supported by Zephyr are listed below.
Runtime active state
During runtime active state, the system is awake and running. In simple terms, the system is in a full running state.
Runtime idle state
Runtime idle is a system sleep state in which all of the cores enter the deepest possible idle state and wait for interrupts. There are no requirements for the devices, which are left in whatever state they are in.
Suspend to idle state
The system goes through a normal platform suspend where it puts all of the cores in the deepest possible idle state and puts peripherals into low-power states (possibly lower-power than available in runtime active state). No operating state is lost (the CPU core retains power and does not lose execution context), so the system can easily go back to where it left off.
Standby state
In addition to putting peripherals into low-power states, which is done for suspend to idle too, all non-boot CPUs are powered off. It should allow more energy to be saved relative to suspend to idle, but the resume latency will generally be greater than for that state. On a uniprocessor system it should be the same state as suspend to idle.
Suspend to ram state
This state offers significant energy savings by powering off as much of the system as possible, with memory placed into self-refresh mode to retain its contents. The state of devices and CPUs is saved and held in memory, and it may require some boot-strapping code in ROM to resume the system from it.
Suspend to disk state
This state gets the greatest power savings by powering off as much of the system as possible, including the memory. The contents of memory are written to disk/flash, and on resume they are read back into memory with the help of boot-strapping code, which restores the system to the same point of execution where it went to suspend to disk.
Soft off state
This state consumes a minimal amount of power and requires a large latency in order to return to runtime active state. The contents of the system (CPU and memory) will not be preserved, so the system will be restarted if woken by any wakeup source.
And the implementation of this RFC will be the start of the PM overhaul.
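One possible way to capture the state list above in a header is sketched below. The identifiers are hypothetical (the RFC names the states but not the C symbols), and the two helpers only encode properties taken from this thread: the runtime/sleep split suggested in review, and the point at which RAM contents stop being retained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers, ordered from shallowest to deepest state. */
enum pm_state {
    PM_STATE_RUNTIME_ACTIVE,
    PM_STATE_RUNTIME_IDLE,
    PM_STATE_SUSPEND_TO_IDLE,
    PM_STATE_STANDBY,
    PM_STATE_SUSPEND_TO_RAM,
    PM_STATE_SUSPEND_TO_DISK,
    PM_STATE_SOFT_OFF
};

/* Runtime states per the review comment; the other five are sleep states. */
static bool pm_state_is_runtime(enum pm_state s)
{
    assert(s >= PM_STATE_RUNTIME_ACTIVE && s <= PM_STATE_SOFT_OFF);
    return s <= PM_STATE_RUNTIME_IDLE;
}

/*
 * RAM keeps its contents up to and including suspend to RAM (self-refresh).
 * Suspend to disk powers the memory off and soft off preserves nothing.
 */
static bool pm_state_ram_retained(enum pm_state s)
{
    assert(s >= PM_STATE_RUNTIME_ACTIVE && s <= PM_STATE_SOFT_OFF);
    return s <= PM_STATE_SUSPEND_TO_RAM;
}
```

Ordering the enumerators by depth keeps such helpers to a single comparison, at the cost of making the numeric order part of the contract.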