
refactor relay threads #71

Open · jeffthibault wants to merge 1 commit into main
Conversation

@jeffthibault (Owner) commented Feb 17, 2023

Relay class

  • Add thread instantiation to connect method
  • Add is_connected method which checks if the ws.run_forever thread is alive. I think this is more reliable because this thread will be torn down on disconnect.
  • Various renaming

Relay Manager

  • Remove thread instantiation from add_relay (handled in relay.connect now)
  • Add relay_connection_monitor thread, which periodically (connection_monitor_interval_secs) checks if the relays are disconnected and reconnects them if they are.

Note: I felt like the reconnection logic belongs in the relay manager, as its job is to manage the relays.
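For orientation, a rough sketch of the structure this describes (simplified; locking, proxy settings, and error handling are omitted, and attribute names like self.relays are assumed where the PR doesn't state them):

import time
from threading import Thread

class Relay:
    def connect(self, is_reconnect=False):
        # The ws.run_forever thread is now spawned by the relay itself.
        self._connection_thread = Thread(target=self.ws.run_forever, daemon=True)
        self._connection_thread.start()

    def is_connected(self) -> bool:
        # run_forever's thread is torn down on disconnect, so its liveness
        # doubles as the connection signal.
        return self._connection_thread is not None and self._connection_thread.is_alive()

class RelayManager:
    def _relay_connection_monitor(self):
        # Periodically reconnect any relay that has dropped.
        while True:
            for relay in self.relays.values():
                if not relay.is_connected():
                    relay.connect(True)
            time.sleep(self.connection_monitor_interval_secs)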

@jeffthibault (Owner Author)

@callebtc can you give this a review if you get a chance?

@jeffthibault (Owner Author)

Also @kdmukai, can you give this a review if you have a chance?

)
self._connection_thread.start()

if not is_reconnect:


Would be more direct here to save this as self._outgoing_message_thread, and check that directly.

def __init__(self):
    ...
    self._outgoing_message_thread = None

def connect(self):
    ...
    if not self._outgoing_message_thread:
        self._outgoing_message_thread = Thread(...)


Then no is_reconnect needed

@jeffthibault (Owner Author)

Good call. Will change this.

Contributor

Couldn't its instantiation also just be moved into the __post_init__? Doesn't seem like there would ever be a case where it would have to be re-created.

@jeffthibault (Owner Author)

It could be, but I don't think it makes sense to start the outgoing_message_thread until the connection is made.

@jeremywhelchel

Having a separate monitor thread seems much better than triggering reconnect in _on_error(). That approach was hitting a recursion limit for me.

One suggestion:
Instead of a fixed connection_monitor_interval_secs, it would be better to have an exponential backoff strategy. Otherwise this client will hopelessly try to reconnect to a down server every ~5 seconds (default), which is spammy in the logs at best. There are libraries out there to do this... or it should be simple to roll your own. Something like the following, which caps out at 5 mins indefinitely:
5, 10, 30, 60, 300, 300, 300...
The backoff should be per-relay, though.
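For illustration, that schedule could be a tiny helper (purely a sketch, not part of this PR):

def backoff_delays(steps=(5, 10, 30, 60), cap=300):
    # Yields 5, 10, 30, 60, then 300 forever.
    for delay in steps:
        yield delay
    while True:
        yield cap

Each relay would hold its own generator and reset it once a reconnect succeeds.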

if self.connected:
    message = self.queue.get()
if self.is_connected():
    message = self.outgoing_messages.get()
Contributor

Add comment here that Queue.get() blocks and waits by default. Was confusing to read this without that prior knowledge.

Also obv verify that the blocking still allows this thread to terminate when the main thread exits.

Contributor

I don't think it's necessary since that is the normal way any Queue would work.
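For context, a minimal sketch of the pattern under discussion (function and parameter names here are illustrative, not from the PR): Queue.get() blocks until an item arrives, and a timeout is one way to let the sender thread notice a shutdown request.

import time
from queue import Queue, Empty
from threading import Event

def sender_loop(outgoing_messages: Queue, send, is_connected, stop_event: Event):
    # Drain the outgoing queue while connected; exit when stop_event is set.
    while not stop_event.is_set():
        if not is_connected():
            time.sleep(0.1)  # wait for the monitor to reconnect the relay
            continue
        try:
            # Queue.get() blocks by default; the timeout lets the loop
            # re-check stop_event so the thread can terminate cleanly.
            message = outgoing_messages.get(timeout=1.0)
        except Empty:
            continue
        send(message)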

self.queue.put(message)
else:
time.sleep(0.1)
self.outgoing_messages.put(message)
Contributor

Eventually more robust error-handling is probably needed here. If the message is just undeliverable for some reason, it's pointless to return it to the Queue. Maybe for now at least dump the traceback so we're aware of the failure. Something like:

except Exception:
    import traceback
    traceback.print_exc()

(I forget which traceback calls do what w/formatting, presenting, etc)

Also can't recall if prints from within threads always make it out to the console.

Contributor

I think I disagree on the return into the Queue. Relays constantly drop connections and allow reconnect. This ensures that a message in the queue will be delivered when the connection is back up.

Since we keep track of how often an error was encountered (with self.error_counter) we should do something with it. Right now it just stops reconnecting after it reaches self.error_threshold but an exponentially increasing reconnect sleep timer would be more elegant IMO.

def connect(self, is_reconnect=False):
    if not self.is_connected():
        with self.lock:
            self._connection_thread = Thread(
Contributor

Some DEBUG logging is probably going to be necessary to optionally enable monitoring the connect/reconnect cycles of each Relay. Need max visibility into what's going on when in threading hell.
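Something along these lines would do it (logger name and messages are only an example):

import logging

logger = logging.getLogger("nostr.relay")

def log_connect_attempt(url: str, is_reconnect: bool) -> None:
    # Call from Relay.connect() so each (re)connect cycle is visible.
    logger.debug("connecting to %s (reconnect=%s)", url, is_reconnect)

def log_disconnected(url: str) -> None:
    # Call from the RelayManager monitor when it finds a dead connection.
    logger.debug("%s appears disconnected; scheduling reconnect", url)

# Applications opt in with, e.g.:
# logging.basicConfig(level=logging.DEBUG)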

@kdmukai (Contributor) commented Feb 21, 2023

Overall, yes, I think the RelayManager should manage the reconnects.

I haven't run this PR yet, but the biggest need here is just a ton of code comments to explain what's going on, overview of who manages what, what each thread is for, when threads block, what cleanup guarantees we have (e.g. is daemon=True infallible?).


time.sleep(1)
Contributor

Why do we sleep here? Shouldn't be necessary with the queue and the connection state anymore.

    self.connect()

def is_connected(self) -> bool:
    with self.lock:
        if self._connection_thread is None or not self._connection_thread.is_alive():
Contributor

Why do you assume that the WebSocket is connected because the thread is alive? The connection could've been dropped by the relay. Does the thread necessarily come to a halt in case of an error?

If that's the case, I prefer the reconnect inside the thread instead of spawning a new one (as it was done before).

@jeffthibault (Owner Author)

No, I don't think the thread is killed on an error. However, I am explicitly closing the connection after error_threshold is reached which does kill the thread.
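In other words, something like this shape (error_counter and error_threshold are taken from the discussion; the rest is assumed):

def _on_error(self, class_obj, error):
    # Count consecutive errors; once error_threshold is reached, close the
    # socket. Closing makes ws.run_forever() return, the connection thread
    # exits, and is_connected() (i.e. thread.is_alive()) flips to False.
    self.error_counter += 1
    if self.error_counter >= self.error_threshold:
        self.close()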


def _on_message(self, class_obj, message: str):
    self.message_pool.add_message(message, self.url)

def _on_error(self, class_obj, error):
    self.connected = False
Contributor

I don't understand why this is removed. Errors are thrown when the relay disconnects the client. This is not a nostr error but a WebSocket error. I haven't encountered a case where an error was not a disconnect.

).start()

time.sleep(1)
relay.connect()
Contributor

This is much better!

if not relay.is_connected():
    relay.connect(True)

time.sleep(self.connection_monitor_interval_secs)
Contributor

Could be the exponentially increasing sleep counter here.

@callebtc (Contributor)

I think error handling is an issue (see my comment), but apart from that LGTM. I agree with @kdmukai that the way threads are distributed across files could be confusing, but I think this way is better than before, where both threads were launched from the relay manager.

It makes more sense that a Relay has its own Queue thread instead of running side by side with it.

Could not test yet!

@jeffthibault (Owner Author)

One suggestion: Instead of a fixed connection_monitor_interval_secs, it would be better to have an exponential backoff strategy. Otherwise this client will hopelessly try to reconnect to a down server every ~5 seconds (default), which is spammy in the logs at best. There are libraries out there to do this... or it should be simple to roll your own. Something like the following, which caps out at 5 mins indefinitely: 5, 10, 30, 60, 300, 300, 300... The backoff should be per-relay, though.

@jeremywhelchel I like this idea but if the backoff strategy is per relay, how would the reconnection monitor thread on the relay manager be able to handle that? It seems like we would need a reconnection monitor thread for each relay.
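One way to keep a single monitor thread while still backing off per relay is to track the backoff state (next-attempt time and failure count) per relay and have the monitor tick on a short fixed interval. A rough sketch, with names that are illustrative rather than from this PR:

import time

BACKOFF_SCHEDULE = [5, 10, 30, 60, 300]  # seconds; stays at 300 once exhausted

def _relay_connection_monitor(self):
    next_attempt: dict[str, float] = {}   # per-relay earliest reconnect time
    failures: dict[str, int] = {}          # per-relay consecutive failure count
    while True:
        now = time.time()
        for url, relay in self.relays.items():
            if relay.is_connected():
                failures[url] = 0
                next_attempt[url] = now
                continue
            if now >= next_attempt.get(url, 0.0):
                relay.connect(True)
                step = min(failures.get(url, 0), len(BACKOFF_SCHEDULE) - 1)
                failures[url] = failures.get(url, 0) + 1
                next_attempt[url] = now + BACKOFF_SCHEDULE[step]
        time.sleep(1)  # short tick; actual spacing comes from next_attempt

This keeps the one monitor thread while each relay still gets its own 5, 10, 30, 60, 300... schedule.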

"http_proxy_port": self.proxy_config.port if self.proxy_config is not None else None,
"proxy_type": self.proxy_config.type if self.proxy_config is not None else None
},
name=f"{self.url}-connection"
@earonesty commented Feb 27, 2023

daemon=True should probably be set
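For illustration, the flag just needs to be passed where the thread is created (threading.Thread accepts daemon= as a keyword; the helper below is only a sketch):

from threading import Thread

def spawn_connection_thread(run_forever, url: str) -> Thread:
    # daemon=True means a stuck websocket thread can't keep the interpreter
    # alive after the main thread exits.
    thread = Thread(target=run_forever, name=f"{url}-connection", daemon=True)
    thread.start()
    return thread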
