Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(actix): Update Healthcheck Actor [INGEST-1481] #1349

Merged
merged 19 commits into from
Aug 1, 2022

Conversation

tobias-wilfert
Copy link
Member

@tobias-wilfert tobias-wilfert commented Jul 25, 2022

General

As part of the effort to future-proof Relay, this PR updates the
Healthcheck Actor to work with standard
Futures
instead of the former futures crate. It also moves away from
actix internally.

The bulk of the changes are to the healthcheck.rs file with minor
changes to other files to make it work with the reaming system. Some
code that is needed to interface with the current system can be removed
once the remaining system has been updated as well.

Design Choices

  • Moving away from Actix lead us to introduce our own message handle
    loop in healthcheck.rs using tokio. Note that this does currently
    not allow for concurrent execution, which was deemed sufficient for
    the Healthcheck service for the moment.
  • The Healthcheck service does not only need to handle the IsHealthy
    messages but also handle a Shutdown message. The special thing about
    that message is that in the current system the Controller can just
    send out Shutdown messages to all actors without needing to know
    anything about the internals of the actors. This can be achieved
    through a
    watch channel.
    The Controller has the sender and services (subscribiers) have
    clones of the receiver. There is a second message handle loop inside
    the Actors that receives the Shutdown message and forwards it to the
    primary message handle loop. This will be revised in a follow-up PR.
  • For now, the actix SystemRegistry remains in place, however for that
    to work with the Tokio runtime a copy of the System needs to be
    available in each Tokio runtime thread. This is achieved by using the
    on_thread_start,
    and can be removed once actix is gone.
  • To store the address to the Healthcheck service we currently use a
    lazy_static, a minimal viable Registry of sorts.

Future Steps

  • With the switch from Futures 0.1 to Futures 0.3 on the horizon, it
    might be nice to refactor the system by changing: futures -->
    futures01 futures03 --> futures
  • The current design of the Healthcheck service only allows for
    limited parallelization. As such, it might be interesting to use
    channels internally to protect the internal resource and allow for
    greater parallelization.
  • A good choice for the next upgrade is the Upstream actor. This will
    require us to circle back and remove the hardcoded bool in the
    struct Message<T>.

@tobias-wilfert tobias-wilfert self-assigned this Jul 25, 2022
@flub flub changed the title ref[actix]: Update healthCheck Actor ref(actix): Update healthCheck Actor Jul 25, 2022
@jan-auer jan-auer changed the title ref(actix): Update healthCheck Actor ref(actix): Update Healthcheck Actor Jul 25, 2022
relay-server/src/actors/healthcheck.rs Outdated Show resolved Hide resolved

// tx = transceiver, mpcs = multi-producer, single consumer
#[derive(Clone, Debug)]
pub struct Addr<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've already been confused about what kinds of T you could have here: whether it was a Message or an Actor. I wonder if we could make this clearer by adding trait bounds like T: MessageType though that would also involve making a corresponding MessageType trait (which I guess could be just empty if we don't need any methods?).

Copy link
Contributor

@flub flub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is in a good shape so might as well take this out of draft

relay-server/src/actors/healthcheck.rs Show resolved Hide resolved
relay-server/src/actors/healthcheck.rs Outdated Show resolved Hide resolved
relay-server/src/actors/healthcheck.rs Show resolved Hide resolved
relay-server/src/actors/healthcheck.rs Show resolved Hide resolved
relay-server/src/endpoints/healthcheck.rs Outdated Show resolved Hide resolved
relay-system/src/controller.rs Outdated Show resolved Hide resolved
relay-system/src/controller.rs Outdated Show resolved Hide resolved
relay-system/src/controller.rs Outdated Show resolved Hide resolved
relay-system/src/controller.rs Outdated Show resolved Hide resolved
relay-system/src/controller.rs Outdated Show resolved Hide resolved
@tobias-wilfert tobias-wilfert marked this pull request as ready for review July 29, 2022 06:29
@tobias-wilfert tobias-wilfert requested a review from a team July 29, 2022 06:29
fn default() -> Self {
unimplemented!("register with the SystemRegistry instead")
Aggregator::from_registry()
.send(AcceptsMetrics)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to IsAuthenticated and AcceptsMetrics used to be parallelized. Especially once we have parallelized message handling, that won't be worth the added code complexity, so it's fine to keep this as is.

relay-server/src/endpoints/healthcheck.rs Outdated Show resolved Hide resolved
relay-server/src/service.rs Show resolved Hide resolved
relay-system/Cargo.toml Outdated Show resolved Hide resolved
relay-system/src/controller.rs Outdated Show resolved Hide resolved
relay-system/src/controller.rs Outdated Show resolved Hide resolved
@@ -12,6 +12,7 @@
- Refactor profile processing into its own crate. ([#1340](https://github.com/getsentry/relay/pull/1340))
- Treat "unknown" transaction source as low cardinality for safe SDKs. ([#1352](https://github.com/getsentry/relay/pull/1352), [#1356](https://github.com/getsentry/relay/pull/1356))
- Conditionally write a default transaction source to the transaction payload. ([#1354](https://github.com/getsentry/relay/pull/1354))
- Change to the internals of the healthcheck endpoint. ([#1349](https://github.com/getsentry/relay/pull/1349))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really a great message, if you read it in 6 months time you'll just shrug and wonder why anything was written at all.

Maybe something like "Migrated healthcheck actor off actix and onto tokio 1."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a public changelog that's too much information, I'm afraid. There's still the PR linked with a lot of detail if you're curious about what the internal changes were. What's important for the changelog is that if someone experiences issues with healthchecks, they can trace it back to this release.

@@ -13,47 +13,155 @@ use relay_system::{Controller, Shutdown};
use crate::actors::upstream::{IsAuthenticated, IsNetworkOutage, UpstreamRelay};
use crate::statsd::RelayGauges;

lazy_static::lazy_static! {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason you switched to lazy_static instead of once_cell? once_cell is the more modern and maintained variant that doesn't rely on macros and is also (eventually, or so we've been promised) going to be merged into the rust stdlib.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We widely use lazy_static across the codebase and it's battle-tested, apart from needing less boilerplate.

relay-server/src/actors/healthcheck.rs Show resolved Hide resolved
///
/// TODO(tobias): The receiver of this message can not yet signal they have completed
/// shutdown.
pub async fn subscribe_v2() -> watch::Receiver<Option<Shutdown>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is a weird custom wrapper around sending a message. Let's leave this for this PR but maybe follow up with a small refactor that simply removes this wrapper and lets other actors directly send the message.
/cc @jan-auer

Copy link
Member

@jan-auer jan-auer Jul 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a strange wrapper, though the SubscribeV2 message can soon go away as we will also make the Controller a non-Actor. You can then simply call a method to get the receiver.

In an offline conversation with @tobias-wilfert, we discussed two potential options:

  • Evolve this into a non-async method that returns the receiver
  • Make it an async function that resolves exactly once when the signal is sent. The watch then becomes a full implementation detail and all other services simply have to await this method.

I'm not entirely sure if the second version would work. We can follow up with that in a separate PR.

relay-system/src/controller.rs Outdated Show resolved Hide resolved
@tobias-wilfert tobias-wilfert changed the title ref(actix): Update Healthcheck Actor ref(actix): Update Healthcheck Actor [INGEST-1481] Aug 1, 2022
@tobias-wilfert tobias-wilfert merged commit 4808e3a into master Aug 1, 2022
@tobias-wilfert tobias-wilfert deleted the tobias-wilfert/future-proofing-relay branch August 1, 2022 09:09
tobias-wilfert added a commit that referenced this pull request Aug 1, 2022
tobias-wilfert added a commit that referenced this pull request Aug 1, 2022
@tobias-wilfert tobias-wilfert restored the tobias-wilfert/future-proofing-relay branch August 2, 2022 07:45
@tobias-wilfert tobias-wilfert mentioned this pull request Aug 18, 2022
30 tasks
@HazAT HazAT added this to the Upgrade Tokio in Relay milestone Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants