-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Make inflight background metrics more efficient. #7597
Make inflight background metrics more efficient. #7597
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand this correctly, the code:
- Creates a set which contains the background procs since the last metrics updates since the last scrape
- As background processes are started (either for the first time or after a while of being dormant), they're added to this set
- When a scrape is performed (with
collect()
), metrics are pulled from each process before the set is emptied again - Another metric is created which tracks the amount of background processes that are currently running at any given time.
Is that accurate? I have a few clarifying questions below as well.
|
||
self._proc = _BackgroundProcess(name, self) | ||
|
||
def start(self, rusage: "Optional[resource._RUsage]"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does this run on every batch, or just when the process stops (runs out of things to process) and then starts up again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming is a bit confusing here. Basically LoggingContext.{__enter__,__exit__}
are called first time we enter the context and last time we leave the context, while {start,stop}
get called whenever the logging context starts/stops being the active/current logging context. (The names resume
and pause
might have been better here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, and "entering the context" occurs each time a background processes starts processing.
So _background_processes_active_since_last_scrape
contains just those processes that have become active since the last scrape, versus those that are continuously active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a quick comment to where _background_process_in_flight_count
is instantiated mentioning this distinction please?
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Have you considered writing the metrics straight into a Gauge on each stop() call, rather than having to iterate them when we get a collect() call? |
I'm a little nervous of the performance impact on repeatedly calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to do what it says on the tin, and is definitely a step in the right direction.
|
||
self._proc = _BackgroundProcess(name, self) | ||
|
||
def start(self, rusage: "Optional[resource._RUsage]"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a quick comment to where _background_process_in_flight_count
is instantiated mentioning this distinction please?
…rg-hotfixes Synapse 1.15.0rc1 (2020-06-09) ============================== Features -------- - Advertise support for Client-Server API r0.6.0 and remove related unstable feature flags. ([\#6585](#6585)) - Add an option to disable autojoining rooms for guest accounts. ([\#6637](#6637)) - For SAML authentication, add the ability to pass email addresses to be added to new users' accounts via SAML attributes. Contributed by Christopher Cooper. ([\#7385](#7385)) - Add admin APIs to allow server admins to manage users' devices. Contributed by @dklimpel. ([\#7481](#7481)) - Add support for generating thumbnails for WebP images. Previously, users would see an empty box instead of preview image. ([\#7586](#7586)) - Support the standardized `m.login.sso` user-interactive authentication flow. ([\#7630](#7630)) Bugfixes -------- - Allow new users to be registered via the admin API even if the monthly active user limit has been reached. Contributed by @dkimpel. ([\#7263](#7263)) - Fix email notifications not being enabled for new users when created via the Admin API. ([\#7267](#7267)) - Fix str placeholders in an instance of `PrepareDatabaseException`. Introduced in Synapse v1.8.0. ([\#7575](#7575)) - Fix a bug in automatic user creation during first time login with `m.login.jwt`. Regression in v1.6.0. Contributed by @olof. ([\#7585](#7585)) - Fix a bug causing the cross-signing keys to be ignored when resyncing a device list. ([\#7594](#7594)) - Fix metrics failing when there is a large number of active background processes. ([\#7597](#7597)) - Fix bug where returning rooms for a group would fail if it included a room that the server was not in. ([\#7599](#7599)) - Fix duplicate key violation when persisting read markers. ([\#7607](#7607)) - Prevent an entire iteration of the device list resync loop from failing if one server responds with a malformed result. ([\#7609](#7609)) - Fix exceptions when fetching events from a remote host fails. ([\#7622](#7622)) - Make `synctl restart` start synapse if it wasn't running. ([\#7624](#7624)) - Pass device information through to the login endpoint when using the login fallback. ([\#7629](#7629)) - Advertise the `m.login.token` login flow when OpenID Connect is enabled. ([\#7631](#7631)) - Fix bug in account data replication stream. ([\#7656](#7656)) Improved Documentation ---------------------- - Update the OpenBSD installation instructions. ([\#7587](#7587)) - Advertise Python 3.8 support in `setup.py`. ([\#7602](#7602)) - Add a link to `#synapse:matrix.org` in the troubleshooting section of the README. ([\#7603](#7603)) - Clarifications to the admin api documentation. ([\#7647](#7647)) Internal Changes ---------------- - Convert the identity handler to async/await. ([\#7561](#7561)) - Improve query performance for fetching state from a PostgreSQL database. ([\#7567](#7567)) - Speed up processing of federation stream RDATA rows. ([\#7584](#7584)) - Add comment to systemd example to show postgresql dependency. ([\#7591](#7591)) - Refactor `Ratelimiter` to limit the amount of expensive config value accesses. ([\#7595](#7595)) - Convert groups handlers to async/await. ([\#7600](#7600)) - Clean up exception handling in `SAML2ResponseResource`. ([\#7614](#7614)) - Check that all asynchronous tasks succeed and general cleanup of `MonthlyActiveUsersTestCase` and `TestMauLimit`. ([\#7619](#7619)) - Convert `get_user_id_by_threepid` to async/await. ([\#7620](#7620)) - Switch to upstream `dh-virtualenv` rather than our fork for Debian package builds. ([\#7621](#7621)) - Update CI scripts to check the number in the newsfile fragment. ([\#7623](#7623)) - Check if the localpart of a Matrix ID is reserved for guest users earlier in the registration flow, as well as when responding to requests to `/register/available`. ([\#7625](#7625)) - Minor cleanups to OpenID Connect integration. ([\#7628](#7628)) - Attempt to fix flaky test: `PhoneHomeStatsTestCase.test_performance_100`. ([\#7634](#7634)) - Fix typos of `m.olm.curve25519-aes-sha2` and `m.megolm.v1.aes-sha2` in comments, test files. ([\#7637](#7637)) - Convert user directory, state deltas, and stats handlers to async/await. ([\#7640](#7640)) - Remove some unused constants. ([\#7644](#7644)) - Fix type information on `assert_*_is_admin` methods. ([\#7645](#7645)) - Convert registration handler to async/await. ([\#7649](#7649))
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Hopefully fixes #7596.