
ref(actix): Migrate the RelayCache actor #1485

Merged: 11 commits into master from ref/actix-relays-2 on Sep 21, 2022

Conversation

jan-auer
Member

@jan-auer jan-auer commented Sep 20, 2022

Updates the Relay cache to run as a Tokio service. The upstream query is moved to a background task with its own mpsc channel. All other operations, such as updating the internal map and retrieving relay information, run sequentially in the service.

The service operates exactly as before, including a pre-existing race condition when a relay is queried while a fetch is running.
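
For illustration, the overall shape of such a service loop might look like the following sketch, assuming plain tokio primitives; all names here are illustrative, not Relay's actual service API:

```rust
use tokio::sync::{mpsc, oneshot};

// Illustrative message and state types.
type RelayId = String;
type RelayInfo = String;
struct GetRelay(RelayId, oneshot::Sender<Option<RelayInfo>>);
struct FetchResult; // produced by the background fetch task

struct RelayCacheService {
    fetch_rx: mpsc::Receiver<FetchResult>,
}

impl RelayCacheService {
    fn handle_fetch_result(&mut self, _result: FetchResult) {
        // Update the internal map with the fetched relay info.
    }

    fn get_or_fetch(&mut self, _id: RelayId, _tx: oneshot::Sender<Option<RelayInfo>>) {
        // Answer from the cache or register the sender for a fetch.
    }

    /// The service loop: every arm runs sequentially on this one task,
    /// so the internal map needs no locking.
    async fn run(mut self, mut rx: mpsc::UnboundedReceiver<GetRelay>) {
        loop {
            tokio::select! {
                // Results reported back by the background fetch task.
                Some(result) = self.fetch_rx.recv() => self.handle_fetch_result(result),
                // Incoming public messages, handled sequentially.
                Some(GetRelay(id, tx)) = rx.recv() => self.get_or_fetch(id, tx),
                else => break,
            }
        }
    }
}
```

The actual loop also polls a delay future to trigger scheduled refetches; the quoted `select!` arms further down in this review show all three arms.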

#skip-changelog

@jan-auer jan-auer self-assigned this Sep 20, 2022
relays: HashMap::new(),
relay_channels: HashMap::new(),
senders: HashMap::new(),
fetch_channel: mpsc::channel(1),
Member Author

@jan-auer jan-auer Sep 20, 2022


We need to spawn a long(er) running background task to fetch relay info and insert it into our map. This task cannot hold mutable access to the map during that time, since get requests can arrive in between.

My solution is to have a dedicated mpsc channel to update the internal state when the fetch result is ready. Since there can only be a single fetch at any given time, this channel can have a bounded capacity of 1.

An alternative solution would be an optional future similar to SleepHandle. I have an experimental implementation of that here, although I don't think it is any better.
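
A minimal sketch of this setup, assuming plain tokio primitives; the type names and the `fetch_from_upstream` helper are illustrative stand-ins, not the actual Relay code:

```rust
use std::collections::HashMap;

use tokio::sync::mpsc;

// Illustrative stand-ins for the real relay types.
type RelayId = String;
type RelayInfo = String;

struct FetchResult {
    id: RelayId,
    info: RelayInfo,
}

struct RelayCacheService {
    relays: HashMap<RelayId, RelayInfo>,
    // Bounded to 1: there is at most one fetch in flight at any time.
    fetch_channel: (mpsc::Sender<FetchResult>, mpsc::Receiver<FetchResult>),
}

impl RelayCacheService {
    /// Spawns the background fetch without borrowing the map.
    fn fetch_relays(&self, id: RelayId) {
        let tx = self.fetch_channel.0.clone();
        tokio::spawn(async move {
            let info = fetch_from_upstream(&id).await;
            // Report back through the dedicated channel; the service
            // loop applies the result with mutable access to the map.
            tx.send(FetchResult { id, info }).await.ok();
        });
    }

    /// Runs sequentially in the service loop, so mutation is safe.
    fn handle_fetch_result(&mut self, result: FetchResult) {
        self.relays.insert(result.id, result.info);
    }
}

// Placeholder for the actual upstream query.
async fn fetch_from_upstream(_id: &RelayId) -> RelayInfo {
    "relay info".to_owned()
}
```

Whenever `fetch_channel.1.recv()` yields in the service loop, `handle_fetch_result` runs with full mutable access, because it executes sequentially with all other message handling.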

pub struct RelayCacheService {
static_relays: HashMap<RelayId, RelayInfo>,
relays: HashMap<RelayId, RelayState>,
senders: HashMap<RelayId, Vec<Sender<GetRelayResult>>>,
Member Author

@jan-auer jan-auer Sep 20, 2022


This is a substantial change: instead of spawning a long-running future for every request, we simply queue up the incoming message's Sender. If the relay info is already cached, this is skipped, of course.

This requires an allocation for each incoming (pending) request, although it should still cause less overhead than maintaining a Tokio task for each. While queued, these senders do not consume CPU.
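
A rough sketch of the queuing logic under these assumptions, with `tokio::sync::oneshot` senders and an illustrative `GetRelayResult` alias:

```rust
use std::collections::HashMap;

use tokio::sync::oneshot;

type RelayId = String;
type RelayInfo = String;
// Illustrative: `None` means the relay is unknown upstream.
type GetRelayResult = Option<RelayInfo>;

#[derive(Default)]
struct RelayCacheService {
    relays: HashMap<RelayId, RelayInfo>,
    // One queued sender per pending caller; no task per request.
    senders: HashMap<RelayId, Vec<oneshot::Sender<GetRelayResult>>>,
}

impl RelayCacheService {
    fn get_or_fetch(&mut self, id: RelayId, sender: oneshot::Sender<GetRelayResult>) {
        // Fast path: answer directly from the cache.
        if let Some(info) = self.relays.get(&id) {
            sender.send(Some(info.clone())).ok();
            return;
        }

        // Slow path: queue the sender. It costs one allocation but no
        // CPU while waiting; the fetch handler drains the queue later.
        self.senders.entry(id).or_default().push(sender);
    }

    /// Called when a fetch result arrives: cache it and wake waiters.
    fn resolve(&mut self, id: RelayId, info: RelayInfo) {
        self.relays.insert(id.clone(), info.clone());
        for sender in self.senders.remove(&id).unwrap_or_default() {
            sender.send(Some(info.clone())).ok();
        }
    }
}
```

The only per-request cost is the queued `Sender`; draining the queue on resolution wakes all pending callers at once.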

}
}

impl Handler<GetRelays> for RelayCache {
Member Author

@jan-auer jan-auer Sep 20, 2022


This has been moved to the call site in the public_keys endpoint handler, allowing us to avoid spawning yet another task over here.

@jan-auer jan-auer mentioned this pull request Sep 20, 2022
Comment on lines +44 to 46
if poll.is_ready() {
self.reset();
}
Member Author


At some point during the implementation, I forgot to reset the sleep handle after polling it successfully, which sent the service into an endless loop. That is a dangerous API: the reasonable expectation is that once the sleep has been polled successfully, it counts as handled and is reset.

Contributor


Please update the doc comment to explain that the reset happens automatically.
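
For illustration, a self-resetting handle with such a doc comment might look like the following sketch (an illustration of the pattern, not Relay's actual `SleepHandle`):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Duration;

/// An optional, re-armable sleep timer.
///
/// Once the inner sleep completes, the handle resets itself
/// automatically, so a completed timer is never polled `Ready` twice.
struct SleepHandle(Option<Pin<Box<tokio::time::Sleep>>>);

impl SleepHandle {
    fn idle() -> Self {
        Self(None)
    }

    /// Arms the timer to fire once after `duration`.
    fn set(&mut self, duration: Duration) {
        self.0 = Some(Box::pin(tokio::time::sleep(duration)));
    }

    /// Disarms the timer; an idle handle stays `Pending` forever.
    fn reset(&mut self) {
        self.0 = None;
    }
}

impl Future for SleepHandle {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut(); // `SleepHandle` is `Unpin`

        let poll = match this.0.as_mut() {
            Some(sleep) => sleep.as_mut().poll(cx),
            None => Poll::Pending,
        };

        // Reset automatically: without this, a finished `Sleep` keeps
        // returning `Ready` on every poll and spins the select loop.
        if poll.is_ready() {
            this.reset();
        }

        poll
    }
}
```

Because the handle is `Unpin`, the service loop can poll it directly as `&mut self.delay` inside `tokio::select!`.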

@jan-auer jan-auer marked this pull request as ready for review September 20, 2022 18:15
@jan-auer jan-auer requested a review from a team September 20, 2022 18:15
Some(result) = self.fetch_channel.1.recv() => self.handle_fetch_result(result),
() = &mut self.delay => self.fetch_relays(),
Some(message) = rx.recv() => self.get_or_fetch(message.0, message.1),
else => break,
Contributor


Should this break log a Sentry error? Relay is now very broken at this point: some things will not happen, and we'll need to debug why. It would be nice to have a Sentry error for this rather than having to look for the info message that says this service has stopped.

Member Author


This break is in fact dead code and cannot be reached, because self.delay will always be pending. We will have to change that for all services in a follow-up when we revise the shutdown strategy.

@flub flub assigned jan-auer and unassigned flub Sep 21, 2022
@jan-auer jan-auer merged commit 0b3eb1b into master Sep 21, 2022
@jan-auer jan-auer deleted the ref/actix-relays-2 branch September 21, 2022 16:08
@HazAT HazAT added this to the Upgrade Tokio in Relay milestone Nov 21, 2022
jan-auer added a commit that referenced this pull request Nov 25, 2022
The `ProjectCache` and `RelayCache` actors both follow a common pattern:
They maintain a map of cached objects that they populate asynchronously.
There is a message in their public interface to retrieve data from this
cache.

Since the `RelayCache` was ported in #1485, it keeps a list of senders,
one for each waiting caller of that message. Once the cache entry has
been resolved, it loops through all senders and passes a clone of the
value on. This has the downside that it requires resources not only on
the requesting side, but also in the cache actor itself.

A better approach is to break this into a two-step process:

1.  The service creates a shared channel that will be populated once
2.  Recipients attach to the shared channel and wait for the response

The recipient therefore first has to await the receive end of the shared
channel, and then await the value from that channel. Of course, if the
value is already in the cache, this can be skipped, and the value can be
sent directly to the recipient.

# Implementation Details

This PR implements the two-step process with two nested `oneshot`
channels. The outer channel resolves an enum, which is either the
result value or a shared inner channel that resolves the value.

We implement this as service response behavior, so that the building
blocks for this (senders, channels, ...) can be reused between multiple
services. This can be used for some of the messages, while other
messages have regular async behavior. The response future type is
opaque, so users of such a service do not notice the two-step behavior;
they simply resolve the final value.
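
A hedged sketch of this two-step protocol, assuming `tokio::sync::oneshot` for both channels and `futures`' `Shared` to make the inner one cloneable (the names `RelayResponse`, `shared_channel`, and `resolve` are illustrative):

```rust
use futures::future::{BoxFuture, FutureExt, Shared};
use tokio::sync::oneshot;

// Illustrative stand-in; `None` means the relay is unknown.
type RelayInfo = String;
type GetRelayResult = Option<RelayInfo>;

/// The shared inner channel: many recipients can await one fetch.
type SharedChannel = Shared<BoxFuture<'static, GetRelayResult>>;

/// What the outer channel resolves to: either the cached value
/// directly, or the shared inner channel for a pending fetch.
enum RelayResponse {
    Value(GetRelayResult),
    Channel(SharedChannel),
}

/// Service side: create the inner channel once per pending fetch.
/// The sender resolves it; the `Shared` handle is cloned to callers.
fn shared_channel() -> (oneshot::Sender<GetRelayResult>, SharedChannel) {
    let (tx, rx) = oneshot::channel();
    (tx, rx.map(|result| result.ok().flatten()).boxed().shared())
}

/// Recipient side: the two-step await, hidden behind one future so
/// callers simply resolve the final value.
async fn resolve(outer: oneshot::Receiver<RelayResponse>) -> GetRelayResult {
    match outer.await.ok()? {
        RelayResponse::Value(value) => value,
        RelayResponse::Channel(channel) => channel.await,
    }
}
```

Since `Shared` handles are cloneable, the service can hand the same inner channel to any number of recipients, while each caller only ever awaits one opaque future.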

This PR updates the `RelayCacheService` to use this new response
behavior. Internally, the service still needs to hold two maps (see
the sketch after this list):

- A map for the actual raw data that it pulls responses from. This data
  can be different or more elaborate than what's exposed via public
  messages.
- A map for open channels. Entries in this map can be removed as soon as
  the cache value is resolved, because after that the service can
  short-circuit and respond with the value directly.
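
Sketched with illustrative types (including a hypothetical `RelayState` and the `SharedChannel` alias from the previous sketch, repeated here for completeness), the service state might look like this:

```rust
use std::collections::HashMap;
use std::time::Instant;

use futures::future::{BoxFuture, Shared};
use tokio::sync::oneshot;

type RelayId = String;
type RelayInfo = String;
type SharedChannel = Shared<BoxFuture<'static, Option<RelayInfo>>>;

/// Hypothetical raw cache entry; can be richer than the public
/// message type (e.g. it also remembers failed lookups).
enum RelayState {
    Exists { relay: RelayInfo, checked_at: Instant },
    DoesNotExist { checked_at: Instant },
}

/// A pending fetch: the sender resolves it once; the shared handle is
/// cloned out to every waiter that attaches in the meantime.
struct PendingChannel {
    sender: oneshot::Sender<Option<RelayInfo>>,
    shared: SharedChannel,
}

struct RelayCacheService {
    /// Raw data the service answers from.
    relays: HashMap<RelayId, RelayState>,
    /// Open channels; an entry is removed as soon as its value
    /// resolves, after which the service short-circuits and responds
    /// from `relays` directly.
    channels: HashMap<RelayId, PendingChannel>,
}
```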