4.x: speed up node startup by avoid loading all modules during boot #10989

Merged (1 commit) on Apr 15, 2024

Conversation

the-mikedavis
Member

A fairly large chunk of boot time is spent looking up modules that have certain attributes via `Module:module_info(attributes)`. The built-in `module_info/1` function is very fast, but only once the module has been loaded. For an unloaded module, calling the function loads the module first. Code loading can be fairly slow, with some modules taking around a millisecond each, and all code loading is currently done serially by the code server.
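The load-on-first-call behaviour described above can be observed with a small sketch. The module name `load_on_attr_lookup` and the function `demo/1` are made up for illustration, and the sketch assumes the node runs in interactive code-loading mode (the default for `erl`), where calling an unloaded module triggers loading:

```erlang
-module(load_on_attr_lookup).
-export([demo/1]).

%% Hypothetical helper, not part of RabbitMQ.
%% Returns {LoadedBefore, LoadedAfter, Attributes} for Mod.
demo(Mod) ->
    %% code:is_loaded/1 returns false for a module that is not loaded yet.
    Before = code:is_loaded(Mod) =/= false,
    %% In interactive mode, calling module_info/1 on an unloaded module
    %% makes the code server locate and load its .beam file first.
    Attrs = Mod:module_info(attributes),
    After = code:is_loaded(Mod) =/= false,
    {Before, After, Attrs}.
```

Calling `load_on_attr_lookup:demo(Mod)` with a module that has not been used yet in a fresh shell should show it as not loaded before the lookup and loaded afterwards. Repeating that for every module on the code path is the cost this change tries to avoid.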

We use this for the `rabbit_boot_step` and `rabbit_feature_flag` attributes, for example, and we can't avoid scanning many modules without much larger breaking changes. When we read those attributes, though, we only look up modules from applications that depend on the `rabbit` app. This saves quite a lot of work because we skip most dependencies and built-in Erlang/OTP modules that we would never load anyway, for example the `wx` modules.
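A minimal sketch of that filtering idea follows. It is not the function RabbitMQ uses: the module `attr_scan_sketch` and all function names are hypothetical, and it simplifies by considering only loaded applications that are the root application itself or list it directly in their `applications` key:

```erlang
-module(attr_scan_sketch).
-export([modules_with_attribute/2]).

%% Hypothetical helper: returns {Module, Values} for every module carrying
%% the attribute Name, scanning only RootApp and the loaded applications
%% that directly depend on it.
modules_with_attribute(Name, RootApp) ->
    Apps = [App || {App, _Desc, _Vsn} <- application:loaded_applications(),
                   depends_on(App, RootApp)],
    Mods = lists:usort(lists:append([app_modules(App) || App <- Apps])),
    %% module_info(attributes) still loads each module, but only those
    %% that belong to the selected applications.
    [{Mod, Values} || Mod <- Mods,
                      {Attr, Values} <- Mod:module_info(attributes),
                      Attr =:= Name].

%% An application is in scope if it is RootApp or lists RootApp as a dependency.
depends_on(App, App) -> true;
depends_on(App, RootApp) ->
    case application:get_key(App, applications) of
        {ok, Deps} -> lists:member(RootApp, Deps);
        undefined  -> false
    end.

%% Modules declared in the application's .app file.
app_modules(App) ->
    case application:get_key(App, modules) of
        {ok, Mods} -> Mods;
        undefined  -> []
    end.
```

With this shape, a call such as `attr_scan_sketch:modules_with_attribute(rabbit_boot_step, rabbit)` never touches applications like `wx`, because they do not declare `rabbit` as a dependency.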

We can reuse that function in the management plugin to avoid scanning most available modules for the `rabbit_mgmt_extension` behaviour. We also need to stop the `prometheus` dependency from scanning for its interceptor and collector behaviours on boot. We can do this by setting explicit empty or default values for the application environment variables that `prometheus` otherwise falls back to scanning for. This is a safe change because we don't use interceptors and we register all collectors explicitly.

**There is a functional change to the management plugin to be aware of**: any plugin that uses the `rabbit_mgmt_extension` behaviour must declare a dependency on the `rabbit` application. This is already true for all tier-1 plugins but should be kept in mind for community plugins.
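For a community plugin, declaring that dependency could look like the application resource fragment below. The plugin name `my_mgmt_plugin`, its description and its version are placeholders, and a plugin built with a tool that generates the `.app` file would declare the dependency in that tool's own configuration instead:

```erlang
%% my_mgmt_plugin.app.src (hypothetical plugin, for illustration only)
{application, my_mgmt_plugin, [
    {description, "Example management extension"},
    {vsn, "0.1.0"},
    {modules, []},
    {registered, []},
    %% `rabbit` must be listed so that this plugin's modules are included
    %% when the management plugin looks for rabbit_mgmt_extension modules.
    {applications, [kernel, stdlib, rabbit, rabbitmq_management]}
]}.
```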

Locally, this reduces single-node boot time (`bazel run broker`) from ~6100 ms to ~4300 ms, a reduction of about 30%.

@the-mikedavis self-assigned this on Apr 12, 2024
@the-mikedavis marked this pull request as draft on April 12, 2024 16:51
@the-mikedavis force-pushed the md/boot-avoid-module-scan branch from 85e2a81 to 692b6f6 on April 12, 2024 17:12
@the-mikedavis marked this pull request as ready for review on April 12, 2024 18:46
@michaelklishin
Member

My basic tests on a 10-core aarch64 machine with fast SSDs suggest that this offers a ≈ 26% speedup with six plugins enabled (management, Prometheus, federation with management, shovel with management).

@michaelklishin added this to the 4.0.0 milestone on Apr 15, 2024
@michaelklishin
Member

We have decided to keep this 4.0-specific. Note that most of the benefit is arguably for integration test suites, where many nodes are started across all suites.

@michaelklishin merged commit 22e7f02 into main on Apr 15, 2024
23 checks passed
@michaelklishin deleted the md/boot-avoid-module-scan branch on April 15, 2024 16:15
@michaelklishin changed the title from "Avoid loading all modules during boot" to "4.x: speed up node startup by avoid loading all modules during boot" on Apr 15, 2024