Memoize find_kmod_module_from_sysfs_node #408
This variant of hashmap_get() returns whether the item exists, which allows distinguishing a NULL item from a nonexistent one.
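To illustrate why such a variant is useful, here is a minimal sketch of a lookup that reports presence separately from the stored value. It is not the project's hashmap: `toy_map`, `toy_map_put` and `toy_map_get_exists` are hypothetical names, and a small linear-scan array stands in for a real hash table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size map, used only for illustration. */
#define TOY_MAP_CAPACITY 16

struct toy_map {
    const char *keys[TOY_MAP_CAPACITY];   /* borrowed key pointers */
    void *values[TOY_MAP_CAPACITY];       /* a stored value may be NULL */
    size_t count;
};

/* Insert or overwrite a key. Returns false only when the map is full. */
static bool toy_map_put(struct toy_map *m, const char *key, void *value)
{
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->keys[i], key) == 0) {
            m->values[i] = value;
            return true;
        }
    }
    if (m->count == TOY_MAP_CAPACITY)
        return false;
    m->keys[m->count] = key;
    m->values[m->count] = value;
    m->count++;
    return true;
}

/*
 * Returns true if the key is present and writes its value (possibly NULL)
 * to *value_out. Returns false and writes NULL if the key is absent.
 */
static bool toy_map_get_exists(const struct toy_map *m, const char *key,
                               void **value_out)
{
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->keys[i], key) == 0) {
            *value_out = m->values[i];
            return true;
        }
    }
    *value_out = NULL;
    return false;
}
```

A plain getter that only returns the value cannot tell "stored NULL" apart from "never stored", so a cache built on it would repeat the lookup for every key whose result is NULL.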
Addendum: That was testing on top of the Fedora downstream fork plus the mentioned commit. On main here, #328 (I think) improved generation time to 26 seconds, and this PR on top brings it down to 17 seconds, almost as fast as before the regression.
find_kmod_module_from_sysfs_node() is called for every platform device in the system via find_suppliers(). In turn, this calls kmod_module_new_from_lookup() for every device modalias. This is an expensive call that reads the modalias files from scratch every single time. On many platforms, there are many identical platform devices (e.g. multiple serial ports, or dozens or hundreds of power domain devices). Therefore, it's worth memoizing this so we only perform the expensive lookup once per unique modalias. This cuts down dracut generation time on an Apple M1 Pro MacBook Pro from 26 seconds to 17 seconds, give or take, which is close to the performance prior to 3de4c73. That commit introduced a major regression, which prior commits have already incrementally improved.
I tested this change on a Raspberry Pi Zero 2W. Ubuntu 24.04 (noble) with dracut-install 060+5-1ubuntu3.1 (with linux 6.8.0-1006.6 on 2024-07-01):
With those two commits applied:
So with this hardware and this setup there is no measurable performance improvement.
This is a win for machines with many duplicate devices. On Apple machines the biggest offender here is the power domains, each of which is one device and there may be hundreds of them. Presumably on the rPi that is not the case. I think you need to do what I did and
A quick test:
It looks like most time is spent in traversing
Syscall count is not a useful proxy for time spent. The point of
The log output seems to be evenly spread over the timeline. I see access to
find_kmod_module_from_sysfs_node() is called for every platform device in the system via find_suppliers(). In turn, this calls kmod_module_new_from_lookup() for every device modalias. This is an expensive call that reads the modalias files every single time from scratch.
On many platforms, there are many identical platform devices (e.g. multiple serial ports, or dozens or hundreds of power domain devices). Therefore, it's worth memoizing this so we only perform the expensive lookup once per unique modalias.
This cuts down dracut generation time on an Apple M1 Pro MacBook Pro from 63 seconds to 24 seconds, give or take, after 80f2caf (in fact, this new code/behavior in dracut-ng was the root cause of the major perf regression that was improved in that commit).
Changes
Memoize find_kmod_module_from_sysfs_node() using a hashmap.
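The idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `expensive_lookup()` is a hypothetical stand-in for the costly kmod_module_new_from_lookup() call, the modalias strings are made up, and a small linear-scan cache replaces the real hashmap to keep the example self-contained. Note that "no module found" (NULL) is cached as well, so repeated misses are also served from the cache.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_CAPACITY 64

struct cache_entry {
    char *modalias;      /* owned copy of the key */
    const char *module;  /* cached result; NULL means "no module found" */
};

static struct cache_entry cache[CACHE_CAPACITY];
static size_t cache_count;
static unsigned expensive_calls;  /* counts how often the slow path runs */

/* Hypothetical stand-in for the expensive modalias lookup. */
static const char *expensive_lookup(const char *modalias)
{
    expensive_calls++;
    if (strcmp(modalias, "of:Nserial") == 0)
        return "serial_drv";
    return NULL;
}

/* Perform the expensive lookup at most once per unique modalias. */
static const char *memoized_lookup(const char *modalias)
{
    for (size_t i = 0; i < cache_count; i++) {
        if (strcmp(cache[i].modalias, modalias) == 0)
            return cache[i].module;  /* cache hit, even when the result is NULL */
    }

    const char *module = expensive_lookup(modalias);

    /* Remember the result; if the cache is full or allocation fails, skip caching. */
    if (cache_count < CACHE_CAPACITY) {
        size_t len = strlen(modalias) + 1;
        char *key = malloc(len);
        if (key != NULL) {
            memcpy(key, modalias, len);
            cache[cache_count].modalias = key;
            cache[cache_count].module = module;
            cache_count++;
        }
    }
    return module;
}
```

With hundreds of devices sharing a handful of modaliases, the slow path runs once per distinct modalias instead of once per device, which is where the savings described above come from.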