[WIndows] CPU percent is incorrect (perf counters) #2467

Open
Extiward opened this issue Oct 28, 2024 · 7 comments

@Extiward

Extiward commented Oct 28, 2024

Summary

  • OS: Windows 11 Pro 23H2 22631.4317
  • Architecture: 64bit
  • Psutil version: 6.1.0
  • Python version: 3.11.9
  • Type: core

Description

When using cpu_percent with percpu=False to display the CPU load, the value is always much lower than expected: cpu_percent returns a low, single-digit percentage while the CPU is actually loaded to 50-70% according to Task Manager. With percpu=True, only one element of the returned list contains a large number (which element it is seems to change from run to run), and that number roughly corresponds to the utilization of the whole CPU (see the example output below). The CPU has 12 cores and 24 threads.

Code snippet:

import time

import psutil

while True:
    # interval=1 blocks for one second while sampling each logical CPU
    cpu_load = psutil.cpu_percent(interval=1, percpu=True)
    print(f"CPU load: {cpu_load}%")
    time.sleep(1)

Example output:
CPU load: [0.0, 0.0, 1.6, 3.1, 0.0, 3.1, 0.0, 4.7, 0.0, 0.0, 0.0, 1.6, 0.0, 4.7, 1.6, 0.0, 1.6, 3.1, 3.1, 0.0, 0.0, 3.1, 1.6, 42.4]%
CPU load: [3.1, 3.1, 6.2, 1.6, 0.0, 3.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.6, 0.0, 0.0, 1.6, 0.0, 0.0, 3.1, 1.6, 41.5]%
CPU load: [0.0, 1.6, 6.2, 6.2, 0.0, 0.0, 1.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 0.0, 0.0, 70.1]%
CPU load: [4.6, 0.0, 3.1, 4.7, 0.0, 1.6, 1.6, 1.6, 1.6, 1.6, 4.7, 3.1, 0.0, 3.1, 10.9, 3.1, 0.0, 4.7, 3.1, 10.9, 1.6, 0.0, 3.1, 50.0]%
CPU load: [0.0, 0.0, 0.0, 6.3, 0.0, 0.0, 1.6, 3.1, 0.0, 0.0, 3.1, 0.0, 0.0, 3.1, 3.1, 1.6, 1.6, 3.1, 0.0, 3.1, 0.0, 1.6, 0.0, 35.4]%

[Attached screenshot]

That can't be correct behavior. The expected result would be a roughly even load across all cores, as seen in the attached screenshot.
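As a quick sanity check on the numbers above, averaging the 24 per-core values from the first output line gives a figure in the same single-digit range reported for percpu=False. This is only arithmetic on the posted sample. It assumes the mean of the per-core values is a fair proxy for the system-wide figure, which is not necessarily how psutil computes it.

```python
from statistics import fmean

# First line of the example output above (24 logical CPUs).
sample = [0.0, 0.0, 1.6, 3.1, 0.0, 3.1, 0.0, 4.7, 0.0, 0.0, 0.0, 1.6,
          0.0, 4.7, 1.6, 0.0, 1.6, 3.1, 3.1, 0.0, 0.0, 3.1, 1.6, 42.4]

mean_load = fmean(sample)   # average over all logical CPUs
busiest = max(sample)       # the single element carrying most of the load

print(f"mean across cores: {mean_load:.1f}%")
print(f"busiest core: {busiest}%")
```

In other words, a single element near 42% spread over 24 logical CPUs averages out to only a few percent.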

@Extiward Extiward added the bug label Oct 28, 2024
@Extiward Extiward changed the title [WIndows 11 23H2] cpu_percent percpu=True shows realisting CPU load only for the last core [WIndows 11 23H2] cpu_percent percpu=True shows realistic CPU load only for the last core Oct 28, 2024
@Extiward Extiward changed the title [WIndows 11 23H2] cpu_percent percpu=True shows realistic CPU load only for the last core [WIndows 11 23H2] cpu_percent percpu=True shows realistic CPU load only for one core Oct 28, 2024
@dbwiddis
Contributor

Internally the code uses NtQuerySystemInformation

// gets cpu time information
status = NtQuerySystemInformation(
SystemProcessorPerformanceInformation,

Unfortunately that function's documentation says

[NtQuerySystemInformation may be altered or unavailable in future versions of Windows. Applications should use the alternate functions listed in this topic.]

Of course, the suggested alternate function is the wrong one for this purpose, since it only gives system-wide times:

Use GetSystemTimes instead to retrieve this information.

I've seen other functions changing behavior in Windows 11.

This code should probably be switched to use performance counters ("Processor Information").

@giampaolo
Owner

giampaolo commented Nov 20, 2024

When using cpu_percent with percpu=False to display the CPU load, the value is always much lower than expected: cpu_percent returns a low, single-digit percentage while the CPU is actually loaded to 50-70% according to Task Manager. With percpu=True, only one element [...]

According to this description, both cpu_percent(percpu=False) and cpu_percent(percpu=True) return incorrect values (@Extiward am I correct?).

Note: internally, cpu_percent(percpu=False) relies on GetSystemTimes. Unlike NtQuerySystemInformation, the MS docs do not officially discourage or deprecate it. They even say:

On a multiprocessor system, the values returned are the sum of the designated times across all processors.

So are we sure GetSystemTimes is at fault here? It's an old and well established Windows API.
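To make the question concrete, here is a minimal sketch of how a system-wide busy percentage can be derived from two GetSystemTimes-style snapshots. It assumes, per the MS documentation, that the reported kernel time already includes idle time. The `SystemTimes` type, the `busy_percent` helper and the sample numbers are hypothetical; this is not psutil's actual implementation.

```python
from typing import NamedTuple


class SystemTimes(NamedTuple):
    """Cumulative times summed across all processors (any consistent unit)."""
    idle: float
    kernel: float  # assumed to include idle time, as with GetSystemTimes
    user: float


def busy_percent(before: SystemTimes, after: SystemTimes) -> float:
    """Return the busy share of the CPU time elapsed between two snapshots."""
    idle = after.idle - before.idle
    total = (after.kernel - before.kernel) + (after.user - before.user)
    if total <= 0:
        return 0.0  # no time elapsed, or the counters went backwards
    busy = max(total - idle, 0.0)
    return round(100.0 * busy / total, 1)


# Hypothetical snapshots taken 1 second apart on a 24-CPU machine:
# 24 CPU-seconds elapsed in total, 9.6 of them idle.
t0 = SystemTimes(idle=1000.0, kernel=1200.0, user=300.0)
t1 = SystemTimes(idle=1009.6, kernel=1214.4, user=309.6)
print(busy_percent(t0, t1))
```

Because the inputs are sums over all processors, a result computed this way is only as accurate as the underlying idle/kernel/user counters.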

For reference, here are the links to the psutil implementation

@giampaolo
Owner

Related #2384 (comment).

@giampaolo
Owner

giampaolo commented Nov 20, 2024

ChatGPT seems to confirm GetSystemTimes is basically deprecated on modern systems:

Q: is it true that GetSystemTimes no longer returns accurate results on recent windows versions, and instead I should use performance counters

Yes, this is accurate to an extent. On recent versions of Windows, starting with Windows 8 and Windows Server 2012, the behavior of the GetSystemTimes function changed due to improvements in the way the operating system tracks CPU usage, particularly on modern hardware with dynamic clock speeds (e.g., Turbo Boost, power-saving features).

Modern CPUs adjust their clock speeds dynamically based on workload and power management policies. GetSystemTimes relies on tick-based counters, which can become inconsistent when the clock speed changes.

The precision of the timers used internally by GetSystemTimes may not account for all variations in CPU usage, especially on systems with energy-saving features enabled.

Scaling Issues: On systems with multiple cores or hyper-threading, the reported CPU times may not fully align with actual performance or workload distribution.

It's unfortunate I have to learn this from AI instead of the MS docs. :-\

If this is true, it may indeed make sense to calculate system CPU times by using perf counters. I remember you Daniel (@dbwiddis) did something similar: you replaced a native Windows API with performance counters for swap_memory() in #2160. Perhaps that suggests perf counters should also be used elsewhere, not only in swap and CPU functions (sigh!).

There seems to be one problem: according to the code (e.g. see here and here), some performance counters may be disabled and fail. As such, we should probably ship a dual implementation: try perf counters first, and fall back to the native Windows API otherwise.
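A minimal sketch of the dual-implementation idea, in Python for brevity. The reader functions `read_perf_counters` and `read_native_api` are hypothetical placeholders that return made-up values; they are not existing psutil functions, and the choice of OSError as the failure signal is also an assumption.

```python
from typing import Callable, Sequence


def first_working(readers: Sequence[Callable[[], float]]) -> float:
    """Call each reader in order and return the first successful result."""
    errors = []
    for reader in readers:
        try:
            return reader()
        except OSError as exc:  # e.g. counters disabled or corrupted
            errors.append(f"{reader.__name__}: {exc}")
    raise OSError("all CPU readers failed: " + "; ".join(errors))


def read_perf_counters() -> float:
    # Hypothetical: pretend the performance counters are disabled.
    raise OSError("performance counters unavailable")


def read_native_api() -> float:
    # Hypothetical: stand-in for a calculation based on the native API.
    return 12.5


print(first_working([read_perf_counters, read_native_api]))
```

Collecting the individual error messages keeps the failure reason visible when every reader fails.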

And still unsolved, since we're discussing 2 problems here: it's not clear how to replace NtQuerySystemInformation to collect per-CPU metrics.

@giampaolo giampaolo changed the title [WIndows 11 23H2] cpu_percent percpu=True shows realistic CPU load only for one core [WIndows] CPU percent is incorrect (perf counters) Nov 20, 2024
@dbwiddis
Contributor

If this is true, it may indeed make sense to calculate system CPU times by using perf counters. I remember you Daniel (@dbwiddis) did something similar: you replaced a native Windows API with performance counters for swap_memory() in #2160. Perhaps that suggests perf counters should also be used elsewhere, not only in swap and CPU functions (sigh!).

Yes, that's generally what I've done over on the Java/JNA side.

There seems to be one problem: according to the code (e.g. see here and here), some performance counters may be disabled and fail. As such, we should probably ship a dual implementation: try perf counters first, and fall back to the native Windows API otherwise.

Having navigated through the range of associated problems over the years and implemented multiple fallbacks, yes, "it's complicated". Here are some of the obstacles:

  1. Performance counters can get corrupted, which breaks them. There are MS docs on repairing them, and I've found that an error message pointing the user to those docs is the best option here.
  2. Performance counters are tricky with internationalization settings. In particular, if you start with a default English configuration, switch to another language, and then switch back to English, the English counter name data gets deleted. The remedy is the same as for item 1: print an error message.

In both of the above cases, it may be possible to use a WMI table to fetch the counters from the same source without using the PDH functions. It can be slower (COM overhead) but typically works as a backup.

  3. Performance counters can be manually disabled by users. This is a common hack in the online gaming community, where speed matters and players both overclock and disable as many background processes as possible.

When they're disabled, you can't do anything; WMI doesn't even work as a backup. Just say so in an error message; however, allow for configuration to minimize log messages in that case. :)

  4. In some containers, special configuration is required to expose the counters to the container. I know this is true for Windows containers; I'm not sure about others. This is (like other container issues) difficult to detect at runtime.

And still unsolved, since we're discussing 2 problems here: it's not clear how to replace NtQuerySystemInformation to collect per-CPU metrics.

That's the "Processor Information" performance counters. Here's the corresponding WMI table (it's the 'formatted' one that gives the usage metrics you'd expect; the 'raw' data gives "ticks").

Note that "Processor Information" is processor-group aware but requires Windows 7+. There is a similar "Processor" performance counter that can be used pre-Win7, but it is not processor-group aware.

Also note "Processor Information" can give you "real" tick counts, but then your users will complain that you don't match the Task Manager output, so you'll need a configuration option to choose whether to use the "Utility" counters rather than the "Percent" counters.
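The two points above (processor-group-aware instances, and a configuration option for "Utility" versus "Percent" counters) could be captured by a small helper that builds counter paths. This is a sketch under assumptions: the counter names `% Processor Utility` and `% Processor Time` and the `group,cpu` instance naming (e.g. `0,3`, `0,_Total`, `_Total`) reflect how the "Processor Information" counter set is commonly presented, and `counter_path` and `parse_instance` are hypothetical helpers, not part of psutil or of any Windows API.

```python
from typing import Optional, Tuple


def counter_path(instance: str = "_Total", match_task_manager: bool = True) -> str:
    """Build a counter path; the flag picks "Utility" over "Percent" (time)."""
    counter = "% Processor Utility" if match_task_manager else "% Processor Time"
    return "\\Processor Information({})\\{}".format(instance, counter)


def parse_instance(name: str) -> Tuple[Optional[int], Optional[int]]:
    """Split an instance name into (group, cpu); None stands for a total."""
    if name == "_Total":
        return (None, None)  # total over all processor groups
    group, _, cpu = name.partition(",")
    return (int(group), None if cpu == "_Total" else int(cpu))


print(counter_path())
print(counter_path("0,3", match_task_manager=False))
print(parse_instance("1,5"), parse_instance("0,_Total"))
```

Keeping the group number alongside the CPU index matters on machines with more than one processor group, where a bare CPU index would be ambiguous.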

Have fun storming the castle!

@giampaolo
Owner

That's a lot to chew on. Let's see what I can do. In the meantime... thanks as always. =) The above info is very useful.

@Extiward
Author

Extiward commented Nov 25, 2024

When using cpu_percent with percpu=False to display the CPU load, the value is always much lower than expected: cpu_percent returns a low, single-digit percentage while the CPU is actually loaded to 50-70% according to Task Manager. With percpu=True, only one element [...]

According to this description, both cpu_percent(percpu=False) and cpu_percent(percpu=True) return incorrect values (@Extiward am I correct?).

Thank you for taking the time to address this issue. To answer your question: yes, both versions (percpu=True and percpu=False) produce incorrect values.
