iowait consuming too much cpu resources #30
Comments
It seems to be related to sdbusplus: the problem occurs whenever sdbusplus is used. PS: The OpenBMC commit I am using is 6fddef299932b1270a799e78566e25daa911f742. I have opened a new issue at openbmc/sdbusplus#92. |
What platform was this tested on? |
meta-g220a. I also tried a newer commit, and the issue is still there: |
First off, this file shouldn't be in the meta layer. Issues with this file would have been caught earlier by CI if it had been put in the right place.

I see a large number of sensors that are very "expensive" to read. Considering this is an ast2500, it seems very likely that the IO load you're seeing is real, and a result of too much IO being done on that platform with that configuration. I also see a number of config stanzas that are just unsupported by upstream (like pmem). How certain are you that you tested this on an upstream build?

To triage, I would start by removing the various config types until you find the one that's causing the most contention, then look at what you can do to increase the performance of those sensor types. It's very likely that you just need to optimize your platform's read rates to account for the bandwidth of your i2c lanes, especially for pmbus devices, which are non-trivial to read.

Note that a high iowait percentage is not a bug in itself. It is likely that in the past this platform was just blocking in userspace, and sensors were scanning slower than specified in the config file. Now that we have moved to uring, that same contention shows up as iowait instead of silently happening in userspace. This doesn't mean that the actual sensor scan rates are any worse than they were before. In fact, they're likely better because of uring, but it does make this problem more apparent.

Good luck with your debug. Let us know what your findings are, and if we can transfer this bug to be g220 specific. |
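Since the discussion turns on how iowait is accounted, here is a minimal Python sketch of how an iowait percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The function names (`parse_cpu_line`, `iowait_percent`, `sample_iowait`) are illustrative and not part of any OpenBMC tool, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(line):
    """Return the integer counters from the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time accounted as iowait between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    # Sum only the first eight fields (user .. steal); the guest counters
    # that may follow are already included in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


def sample_iowait(interval=5.0):
    """Take two samples 'interval' seconds apart and return the iowait percentage."""
    first = read_cpu_line()
    time.sleep(interval)
    second = read_cpu_line()
    return iowait_percent(first, second)
```

Calling `sample_iowait()` on the target gives a figure comparable to the `wa`/`io` column that tools like `top` report. The kernel counts iowait as time a CPU was idle while a task was waiting on IO, so a change in how a wait is accounted can move time from the idle column to the iowait column without changing how much work is actually done.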
iowait drops when I revert this kernel commit: "io_uring: Use io_schedule* in cqring" |
The linked patch is a change to accounting more than anything else. I don't think it's particularly concerning? https://lore.kernel.org/lkml/538065ee-4130-6a00-dcc8-f69fbc7d7ba0@kernel.dk/ |
After a normal BMC startup, check the CPU usage:
Then stop all sensor services using the following commands:
systemctl stop xyz.openbmc_project.hwmontempsensor.service
systemctl stop xyz.openbmc_project.fansensor.service
systemctl stop xyz.openbmc_project........service
......
Check the CPU usage again:
Even if I start only one sensor service (xyz.openbmc_project.hwmontempsensor.service), with no sensors configured, the issue is still present.
The following shows the situation before and after stopping the hwmon service:
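The stop-and-measure procedure described above could be scripted roughly as follows. This is only a sketch: `triage`, `biggest_drop`, and `stop_service` are hypothetical helper names, and the `measure` callable is assumed to return an iowait percentage (for example, one derived from `/proc/stat`). The only real command used is `systemctl stop`, as in the steps above.

```python
import subprocess


def stop_service(name):
    """Stop one systemd unit; raises CalledProcessError if systemctl fails."""
    subprocess.run(["systemctl", "stop", name], check=True)


def triage(services, stop, measure):
    """Stop services one at a time, recording iowait before and after each stop.

    'stop' is a callable taking a service name; 'measure' is a callable
    returning the current iowait percentage. Returns a list of
    (service, iowait_before, iowait_after) tuples in the order given.
    """
    results = []
    for name in services:
        before = measure()
        stop(name)
        after = measure()
        results.append((name, before, after))
    return results


def biggest_drop(results):
    """Return the entry whose stop produced the largest iowait decrease."""
    return max(results, key=lambda r: r[1] - r[2])
```

On a BMC this might be called as `triage(["xyz.openbmc_project.hwmontempsensor.service", "xyz.openbmc_project.fansensor.service"], stop_service, measure)`, with `measure` supplied by the caller. Passing `stop` and `measure` in as arguments keeps the loop testable off-target.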