ref(actix): Update Healthcheck Actor [INGEST-1481] #1349
// tx = transceiver, mpcs = multi-producer, single consumer
#[derive(Clone, Debug)]
pub struct Addr<T> {
We've already been confused about what kinds of T you could have here: whether it was a Message or an Actor. I wonder if we could make this clearer by adding trait bounds like T: MessageType, though that would also involve making a corresponding MessageType trait (which I guess could be just empty if we don't need any methods?).
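A minimal sketch of what such a bound could look like, assuming an empty marker trait. The struct here is a simplified stand-in for the Addr<T> in the diff (the real one wraps a channel sender), and MessageType, NotAMessage and the unit-struct IsHealthy are hypothetical names used only for illustration.

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait: only types that may be sent to a service implement it.
pub trait MessageType {}

/// Simplified stand-in for the `Addr<T>` in the diff; the real struct holds a sender.
#[derive(Clone, Debug)]
pub struct Addr<T: MessageType> {
    _marker: PhantomData<T>,
}

impl<T: MessageType> Addr<T> {
    pub fn new() -> Self {
        Addr { _marker: PhantomData }
    }
}

/// A message type opts in by implementing the (empty) marker trait.
#[derive(Clone, Debug)]
pub struct IsHealthy;
impl MessageType for IsHealthy {}

/// A type without the marker cannot be used as `T`.
pub struct NotAMessage;

fn main() {
    let _addr: Addr<IsHealthy> = Addr::new();
    // The next line would be rejected by the compiler because
    // `NotAMessage` does not implement `MessageType`:
    // let _bad: Addr<NotAMessage> = Addr::new();
}
```

With the bound in place, the question of whether T is a message or an actor is answered by the compiler rather than by convention.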
I think this is in a good shape so might as well take this out of draft
fn default() -> Self {
unimplemented!("register with the SystemRegistry instead")
Aggregator::from_registry()
.send(AcceptsMetrics)
The call to IsAuthenticated and AcceptsMetrics used to be parallelized. Especially once we have parallelized message handling, that won't be worth the added code complexity, so it's fine to keep this as is.
@@ -12,6 +12,7 @@
- Refactor profile processing into its own crate. ([#1340](https://github.com/getsentry/relay/pull/1340))
- Treat "unknown" transaction source as low cardinality for safe SDKs. ([#1352](https://github.com/getsentry/relay/pull/1352), [#1356](https://github.com/getsentry/relay/pull/1356))
- Conditionally write a default transaction source to the transaction payload. ([#1354](https://github.com/getsentry/relay/pull/1354))
- Change to the internals of the healthcheck endpoint. ([#1349](https://github.com/getsentry/relay/pull/1349)) |
This is not really a great message, if you read it in 6 months time you'll just shrug and wonder why anything was written at all.
Maybe something like "Migrated healthcheck actor off actix and onto tokio 1."
For a public changelog that's too much information, I'm afraid. There's still the PR linked with a lot of detail if you're curious about what the internal changes were. What's important for the changelog is that if someone experiences issues with healthchecks, they can trace it back to this release.
@@ -13,47 +13,155 @@ use relay_system::{Controller, Shutdown};
use crate::actors::upstream::{IsAuthenticated, IsNetworkOutage, UpstreamRelay};
use crate::statsd::RelayGauges;
lazy_static::lazy_static! { |
any reason you switched to lazy_static instead of once_cell? once_cell is the more modern and maintained variant that doesn't rely on macros and is also (eventually, or so we've been promised) going to be merged into the rust stdlib.
We widely use lazy_static across the codebase and it's battle-tested, apart from needing less boilerplate.
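For reference, the macro-free pattern the question alludes to can be sketched with std::sync::OnceLock, the standard-library counterpart to once_cell that was stabilized in Rust 1.70, after this thread. This is only an illustration of the pattern, not code from the PR; Registry and registry() are hypothetical names.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for whatever handle is stored in the global.
#[derive(Debug)]
pub struct Registry {
    pub name: &'static str,
}

/// Returns the process-wide registry, creating it on first access.
pub fn registry() -> &'static Registry {
    // The static lives inside the function, so no macro is needed.
    static REGISTRY: OnceLock<Registry> = OnceLock::new();
    REGISTRY.get_or_init(|| Registry { name: "healthcheck" })
}

fn main() {
    let first = registry();
    let second = registry();
    // Both calls hand out the same instance.
    assert!(std::ptr::eq(first, second));
    println!("{}", first.name);
}
```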
///
/// TODO(tobias): The receiver of this message can not yet signal they have completed
/// shutdown.
pub async fn subscribe_v2() -> watch::Receiver<Option<Shutdown>> { |
I feel like this is a weird custom wrapper around sending a message. Let's leave this for this PR but maybe follow up with a small refactor that simply removes this wrapper and lets other actors directly send the message.
/cc @jan-auer
It is a strange wrapper, though the SubscribeV2 message can soon go away as we will also make the Controller a non-Actor. You can then simply call a method to get the receiver.
In an offline conversation with @tobias-wilfert, we discussed two potential options:
- Evolve this into a non-async method that returns the receiver
- Make it an async function that resolves exactly once when the signal is sent. The watch then becomes a full implementation detail and all other services simply have to await this method.
I'm not entirely sure if the second version would work. We can follow up with that in a separate PR.
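To make the two options concrete, here is a rough sketch that swaps the tokio watch channel for a plain std Mutex and Condvar, so it blocks a thread instead of awaiting. It is not the PR's implementation: Controller::subscribe, Controller::shutdown, ShutdownReceiver and its wait method are hypothetical names, and the Shutdown struct is a simplified stand-in with an assumed timeout field. subscribe() corresponds to the non-async method that returns the receiver, and wait() is the blocking analogue of a function that resolves exactly once when the signal is sent.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Simplified stand-in for the shutdown signal (the field is an assumption).
#[derive(Clone, Debug, PartialEq)]
pub struct Shutdown {
    pub timeout: Option<Duration>,
}

type Shared = Arc<(Mutex<Option<Shutdown>>, Condvar)>;

/// Hypothetical controller that owns the sending side.
pub struct Controller {
    shared: Shared,
}

/// Hypothetical receiver handed to each service.
#[derive(Clone)]
pub struct ShutdownReceiver {
    shared: Shared,
}

impl Controller {
    pub fn new() -> Self {
        Controller { shared: Arc::new((Mutex::new(None), Condvar::new())) }
    }

    /// Non-async: hands out a clone of the receiving side.
    pub fn subscribe(&self) -> ShutdownReceiver {
        ShutdownReceiver { shared: Arc::clone(&self.shared) }
    }

    /// Publishes the signal to every subscriber at once.
    pub fn shutdown(&self, timeout: Option<Duration>) {
        let (lock, cvar) = &*self.shared;
        *lock.lock().unwrap() = Some(Shutdown { timeout });
        cvar.notify_all();
    }
}

impl ShutdownReceiver {
    /// Blocks until the signal has been sent, then returns a copy of it.
    pub fn wait(&self) -> Shutdown {
        let (lock, cvar) = &*self.shared;
        let guard = cvar
            .wait_while(lock.lock().unwrap(), |signal| signal.is_none())
            .unwrap();
        (*guard).clone().expect("signal is set once wait_while returns")
    }
}

fn main() {
    let controller = Controller::new();
    // Each service only needs a receiver; the controller knows nothing else about it.
    let workers: Vec<_> = (0..2)
        .map(|_| {
            let rx = controller.subscribe();
            thread::spawn(move || rx.wait())
        })
        .collect();
    controller.shutdown(Some(Duration::from_secs(1)));
    for worker in workers {
        let signal = worker.join().unwrap();
        assert_eq!(signal.timeout, Some(Duration::from_secs(1)));
    }
}
```

In this shape the shared state is an implementation detail of the receiver, which is the property the second option is after.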
This reverts commit 4808e3a.
General
As part of the effort to future-proof Relay, this PR updates the Healthcheck Actor to work with standard Futures instead of the former futures crate. It also moves away from actix internally.
The bulk of the changes are to the healthcheck.rs file, with minor changes to other files to make it work with the remaining system. Some code that is needed to interface with the current system can be removed once the remaining system has been updated as well.
Design Choices
- loop in healthcheck.rs using tokio. Note that this does not currently allow for concurrent execution, which was deemed sufficient for the Healthcheck service for the moment.
- Healthcheck service does not only need to handle the IsHealthy messages but also handle a Shutdown message. The special thing about that message is that in the current system the Controller can just send out Shutdown messages to all actors without needing to know anything about the internals of the actors. This can be achieved through a watch channel. The Controller has the sender and services (subscribers) have clones of the receiver. There is a second message handle loop inside the Actors that receives the Shutdown message and forwards it to the primary message handle loop. This will be revised in a follow-up PR.
- SystemRegistry remains in place, however for that to work with the Tokio runtime a copy of the System needs to be available in each Tokio runtime thread. This is achieved by using on_thread_start, and can be removed once actix is gone.
- Healthcheck service we currently use a lazy_static, a minimal viable Registry of sorts.
Future Steps
- might be nice to refactor the system by changing:
  - futures --> futures01
  - futures03 --> futures
- Healthcheck service only allows for limited parallelization. As such, it might be interesting to use channels internally to protect the internal resource and allow for greater parallelization.
- Upstream actor. This will require us to circle back and remove the hardcoded bool in the struct Message<T>.
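The message handle loop described under Design Choices can be sketched roughly as follows. This is a std-only approximation that uses std::sync::mpsc and a thread in place of tokio channels and tasks, so it is not the code in healthcheck.rs. Envelope, start and the blocking Addr::send are hypothetical, IsHealthy is a simplified stand-in, and the reply type is fixed to bool in the same spirit as the hardcoded bool mentioned above.

```rust
use std::sync::mpsc;
use std::thread;

/// Simplified stand-in for the `IsHealthy` message.
#[derive(Debug)]
pub enum IsHealthy {
    Liveness,
    Readiness,
}

/// A message bundled with the channel on which the reply is sent back.
struct Envelope {
    message: IsHealthy,
    reply: mpsc::Sender<bool>,
}

/// Hypothetical handle used by callers to reach the service.
#[derive(Clone)]
pub struct Addr {
    tx: mpsc::Sender<Envelope>,
}

impl Addr {
    /// Sends a message and blocks until the service replies.
    /// Returns `None` if the service has stopped.
    pub fn send(&self, message: IsHealthy) -> Option<bool> {
        let (reply, response) = mpsc::channel();
        self.tx.send(Envelope { message, reply }).ok()?;
        response.recv().ok()
    }
}

/// Starts the service loop on its own thread and returns its address.
pub fn start(is_ready: bool) -> Addr {
    let (tx, rx) = mpsc::channel::<Envelope>();
    thread::spawn(move || {
        // Messages are handled strictly one at a time: no concurrent execution.
        // The loop ends once every `Addr` clone has been dropped.
        for Envelope { message, reply } in rx {
            let healthy = match message {
                IsHealthy::Liveness => true,
                IsHealthy::Readiness => is_ready,
            };
            // The caller may have gone away; ignore a failed reply.
            let _ = reply.send(healthy);
        }
    });
    Addr { tx }
}

fn main() {
    let addr = start(false);
    assert_eq!(addr.send(IsHealthy::Liveness), Some(true));
    assert_eq!(addr.send(IsHealthy::Readiness), Some(false));
}
```

Because the loop owns its state and processes one envelope at a time, no locking is needed inside the service, at the cost of the limited parallelization noted under Future Steps.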