
Commit 49f6c56

rafaeljw authored and gregkh committed
cpuidle: governors: menu: Avoid selecting states with too much latency
commit 779b1a1 upstream.

Occasionally, the exit latency of the idle state selected by the menu governor may exceed the PM QoS CPU wakeup latency limit. Namely, if the scheduler tick has been stopped already and predicted_ns is greater than the tick period length, the governor may return an idle state whose exit latency exceeds latency_req because that decision is made before checking the current idle state's exit latency.

For instance, say that there are 3 idle states, 0, 1, and 2. For idle states 0 and 1, the exit latency is equal to the target residency and the values are 0 and 5 us, respectively. State 2 is deeper and has the exit latency and target residency of 200 us and 2 ms (which is greater than the tick period length), respectively.

Say that predicted_ns is equal to TICK_NSEC and the PM QoS latency limit is 20 us. After the first two iterations of the main loop in menu_select(), idx becomes 1 and in the third iteration of it the target residency of the current state (state 2) is greater than predicted_ns. State 2 is not a polling one and predicted_ns is not less than TICK_NSEC, so the check on whether or not the tick has been stopped is done. Say that the tick has been stopped already and there are no imminent timers (that is, delta_tick is greater than the target residency of state 2). In that case, idx becomes 2 and it is returned immediately, but the exit latency of state 2 exceeds the latency limit.

Address this issue by modifying the code to compare the exit latency of the current idle state (idle state i) with the latency limit before comparing its target residency with predicted_ns, which allows one more exit_latency_ns check that becomes redundant to be dropped.

However, after the above change, latency_req cannot take the predicted_ns value any more, which takes place after commit 38f8309 ("cpuidle: menu: Remove iowait influence"), because it may cause a polling state to be returned prematurely.

In the context of the previous example say that predicted_ns is 3000 and the PM QoS latency limit is still 20 us. Additionally, say that idle state 0 is a polling one. Moving the exit_latency_ns check before the target_residency_ns one causes the loop to terminate in the second iteration, before the target_residency_ns check, so idle state 0 will be returned even though previously state 1 would be returned if there were no imminent timers.

For this reason, remove the assignment of the predicted_ns value to latency_req from the code.

Fixes: 5ef499c ("cpuidle: menu: Handle stopped tick more aggressively")
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/5043159.31r3eYUQgx@rafael.j.wysocki
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1 parent e58d219 commit 49f6c56
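
The walkthrough in the changelog is easier to follow in code. Below is a minimal user-space model of the menu_select() loop for this exact scenario, not the kernel code: select_state(), USEC() and tick_stopped are made-up names, the polling-state escape and disabled-state handling are omitted because they play no role here, and a 1 ms tick period (HZ=1000) is assumed. Run, it picks state 2 (200 us exit latency, over the 20 us limit) with the old check ordering and state 1 with the new one.

#include <stdbool.h>
#include <stdio.h>

#define USEC(x)		((long long)(x) * 1000LL)
#define TICK_NSEC	USEC(1000)	/* assumed 1 ms tick period */

struct state {
	long long exit_latency_ns;
	long long target_residency_ns;
};

/* the three states from the changelog example */
static const struct state states[] = {
	{ USEC(0),   USEC(0)    },	/* state 0 */
	{ USEC(5),   USEC(5)    },	/* state 1 */
	{ USEC(200), USEC(2000) },	/* state 2: residency > tick period */
};

static int select_state(bool latency_check_first)
{
	long long predicted_ns = TICK_NSEC;	/* as in the example */
	long long latency_req = USEC(20);	/* PM QoS wakeup latency limit */
	long long delta_tick = USEC(10000);	/* no imminent timers */
	bool tick_stopped = true;
	int idx = -1;

	for (int i = 0; i < 3; i++) {
		const struct state *s = &states[i];

		if (idx == -1)
			idx = i;	/* first enabled state */

		/* the fix: reject over-latency states before anything else */
		if (latency_check_first && s->exit_latency_ns > latency_req)
			break;

		if (s->target_residency_ns > predicted_ns) {
			if (predicted_ns < TICK_NSEC || !tick_stopped)
				break;
			/*
			 * Tick stopped, shallow state selected so far, no
			 * imminent timer: take the deep state so the CPU is
			 * not stuck shallow, and return it immediately.
			 * Before the fix, this path never saw a latency check.
			 */
			if (states[idx].target_residency_ns < TICK_NSEC &&
			    s->target_residency_ns <= delta_tick)
				idx = i;
			return idx;
		}

		/* pre-fix placement of the latency check */
		if (!latency_check_first && s->exit_latency_ns > latency_req)
			break;

		idx = i;
	}
	return idx;
}

int main(void)
{
	/* old ordering: state 2, whose 200 us exit latency breaks the 20 us limit */
	printf("before: state %d\n", select_state(false));
	/* new ordering: state 1, which honors the limit */
	printf("after:  state %d\n", select_state(true));
	return 0;
}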

File tree

  • drivers/cpuidle/governors/menu.c

1 file changed: 12 additions, 17 deletions

drivers/cpuidle/governors/menu.c

Lines changed: 12 additions & 17 deletions
@@ -287,20 +287,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		return 0;
 	}
 
-	if (tick_nohz_tick_stopped()) {
-		/*
-		 * If the tick is already stopped, the cost of possible short
-		 * idle duration misprediction is much higher, because the CPU
-		 * may be stuck in a shallow idle state for a long time as a
-		 * result of it. In that case say we might mispredict and use
-		 * the known time till the closest timer event for the idle
-		 * state selection.
-		 */
-		if (predicted_ns < TICK_NSEC)
-			predicted_ns = data->next_timer_ns;
-	} else if (latency_req > predicted_ns) {
-		latency_req = predicted_ns;
-	}
+	/*
+	 * If the tick is already stopped, the cost of possible short idle
+	 * duration misprediction is much higher, because the CPU may be stuck
+	 * in a shallow idle state for a long time as a result of it. In that
+	 * case, say we might mispredict and use the known time till the closest
+	 * timer event for the idle state selection.
+	 */
+	if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
+		predicted_ns = data->next_timer_ns;
 
 	/*
 	 * Find the idle state with the lowest power while satisfying
@@ -316,13 +311,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		if (idx == -1)
 			idx = i; /* first enabled state */
 
+		if (s->exit_latency_ns > latency_req)
+			break;
+
 		if (s->target_residency_ns > predicted_ns) {
 			/*
 			 * Use a physical idle state, not busy polling, unless
 			 * a timer is going to trigger soon enough.
 			 */
 			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
-			    s->exit_latency_ns <= latency_req &&
 			    s->target_residency_ns <= data->next_timer_ns) {
 				predicted_ns = s->target_residency_ns;
 				idx = i;
@@ -354,8 +351,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 
 			return idx;
 		}
-		if (s->exit_latency_ns > latency_req)
-			break;
 
 		idx = i;
 	}
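
A side effect the changelog mentions is that the s->exit_latency_ns <= latency_req term dropped from the polling check above becomes redundant: once the latency test runs at the top of the loop body, any state reaching the residency test has already passed it. A tiny user-space sketch of that invariant (illustrative values only, not kernel code):

#include <assert.h>
#include <stdio.h>

struct state {
	long long exit_latency_ns;
	long long target_residency_ns;
};

int main(void)
{
	/* same three states as the changelog example, in nanoseconds */
	const struct state states[] = {
		{ 0, 0 },		/* state 0 */
		{ 5000, 5000 },		/* state 1 */
		{ 200000, 2000000 },	/* state 2 */
	};
	long long latency_req = 20000;	/* 20 us PM QoS limit */
	long long predicted_ns = 3000;	/* arbitrary prediction */

	for (int i = 0; i < 3; i++) {
		const struct state *s = &states[i];

		/* the hoisted check: over-latency states never get past here */
		if (s->exit_latency_ns > latency_req)
			break;

		/* so this holds for every state reaching the residency test,
		 * which is why the extra condition could be dropped */
		assert(s->exit_latency_ns <= latency_req);

		if (s->target_residency_ns > predicted_ns)
			printf("state %d reaches the residency branch\n", i);
	}
	return 0;
}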
