Create base supervisor script #26

nghi01 · 2024-01-24T17:45:20Z

Recovering failed nodes is not implemented yet. I need to talk with Alex about how to recover nodes from the available public and private keys.

zimmermatt · 2024-01-24T20:26:00Z

devops/deploy/supervisor.py

+async def ping_node(host, port):
+    try:
+        reader, writer = await asyncio.open_connection(host, port)
+        print(f"Alive node at {host}:{port}")


Please use logging
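
A minimal sketch of what replacing the `print` calls with `logging` could look like. The logger name `supervisor` and the boolean return value are assumptions for illustration, not part of the PR.

```python
import asyncio
import logging

# Assumed logger name; the real script may configure logging differently.
logger = logging.getLogger("supervisor")


async def ping_node(host, port):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        reader, writer = await asyncio.open_connection(host, port)
    except (OSError, asyncio.TimeoutError):
        # Lazy %-style arguments are only formatted if the record is emitted.
        logger.warning("Node at %s:%s is not responding", host, port)
        return False
    logger.info("Alive node at %s:%s", host, port)
    writer.close()
    await writer.wait_closed()
    return True
```

Calling `logging.basicConfig(level=logging.INFO)` once at script startup would make the `info` messages visible on the console.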

zimmermatt · 2024-01-24T21:07:51Z

devops/deploy/supervisor.py

+        writer.close()
+        await writer.wait_closed()


These should maybe be in a finally block.

Does the reader need to be closed?

According to the asyncio Python docs, it doesn't seem so. I'll look into it further.

Clarification:
Neither closing the reader nor a finally block appears in the example from the Python docs, so I did not include them.

Yeah... the docs don't provide clear guidance on cleanup in the face of exceptions. Would be nice if they did.

I cannot find any instance where they close the reader, so I assume it is not needed.
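
One possible shape for the cleanup discussed in this thread, as a sketch rather than the code that was merged. It relies on `StreamReader` having no `close()` method: closing the writer closes the transport that both streams share. The boolean return value is an assumption for illustration.

```python
import asyncio


async def ping_node(host, port):
    """Return True if host:port accepts a TCP connection."""
    try:
        reader, writer = await asyncio.open_connection(host, port)
    except (OSError, asyncio.TimeoutError):
        # Nothing was opened, so there is nothing to clean up.
        return False
    try:
        # Any further I/O on reader/writer would go here.
        return True
    finally:
        # Runs even if the body above raises. Closing the writer also
        # releases the connection used by the reader.
        writer.close()
        await writer.wait_closed()
```

Keeping `open_connection` outside the `try`/`finally` avoids referencing `writer` before it has been assigned.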

zimmermatt · 2024-01-24T21:11:55Z

devops/deploy/supervisor.py

+async def ping_node(host, port):
+    try:
+        reader, writer = await asyncio.open_connection(host, port)
+        print(f"Alive node at {host}:{port}")


This may be good enough for our MVP, but merely being able to connect on the port at the host is not a guarantee that it's our application listening at that moment in time.

A stronger signal would be to have a "healthcheck" ping.

A little more on this beyond the above and our conversation.

In a deployment environment for something we want to ensure stays up, we'd usually use (in increasing preference and not an exhaustive list) one of:

a monitor script with cron

a supervising tool such as daemontools or supervisord

systemd

Determining "liveness" of a service via a network call is somewhat fragile: the network path from the monitoring app could be partitioned while other apps can still reach the service just fine. The more degenerate case is that the monitoring app can reach the service, but the apps that depend on it can't. A more refined approach is to look at client app calls for signs of health (assuming a reasonable degree of traffic).
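
As a rough illustration of the "healthcheck" ping idea, the sketch below sends a request and only reports the node as healthy if the expected reply comes back in time. The `PING`/`PONG` wire format, the `health_check` name, and the timeout value are all hypothetical; the PR does not define a health-check protocol.

```python
import asyncio

# Hypothetical wire format, invented for this sketch.
PING = b"PING\n"
PONG = b"PONG\n"


async def health_check(host, port, timeout=2.0):
    """Return True only if the peer answers PING with PONG within timeout."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False
    try:
        writer.write(PING)
        await writer.drain()
        reply = await asyncio.wait_for(reader.readline(), timeout)
        # Some other service listening on the port would not reply PONG.
        return reply == PONG
    except (OSError, asyncio.TimeoutError):
        return False
    finally:
        writer.close()
        try:
            await writer.wait_closed()
        except OSError:
            # The peer may already have reset the connection.
            pass
```

This still shares the network-path weakness described above; it only rules out the case where something other than our application is listening on the port.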

zimmermatt · 2024-01-24T21:12:21Z

devops/deploy/supervisor.py

+        writer.close()
+        await writer.wait_closed()
+    except (OSError, asyncio.TimeoutError):
+        print(f"Node at {host}:{port} is not responding")


zimmermatt · 2024-01-24T21:14:53Z

devops/deploy/supervisor.py

+    while True:
+        for i in range(len(peer_list)):
+            await ping_node("127.0.0.1", peer_list[i][1])
+        await asyncio.sleep(5)


Consider making this an input parameter to the supervisor script that's passed through to here.

Do you mean the 127.0.0.1 as an input parameter?

The sleep interval.
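
A sketch of how the sleep interval might be passed in from the command line. The `--interval` flag name, the `parse_args` and `monitor` helpers, and passing `ping_node` as an argument are assumptions made to keep the example self-contained.

```python
import argparse
import asyncio


def parse_args(argv=None):
    """Parse supervisor options; --interval is a hypothetical flag name."""
    parser = argparse.ArgumentParser(description="Peer supervisor")
    parser.add_argument(
        "--interval",
        type=float,
        default=5.0,
        help="seconds to sleep between ping rounds (default: 5)",
    )
    return parser.parse_args(argv)


async def monitor(peer_list, ping_node, interval):
    """Ping every peer, then sleep for `interval` seconds, forever."""
    while True:
        for peer in peer_list:
            # As in the PR, index 1 of each peer entry is its port.
            await ping_node("127.0.0.1", peer[1])
        await asyncio.sleep(interval)
```

The script entry point would then call something like `asyncio.run(monitor(peer_list, ping_node, parse_args().interval))`.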

nghi01 · 2024-01-24T21:47:28Z

The changes make sense. I'm wondering where to put this Python file. Should I keep it in devops/deploy? I'll make the spin-up script call it so that it runs continually once the spin-up is done.

zimmermatt · 2024-01-24T23:12:23Z

The changes make sense. I'm wondering where to put this Python file. Should I keep it in devops/deploy? I'll make the spin-up script call it so that it runs continually once the spin-up is done.

devops/deploy seems a reasonable location

zimmermatt

I'll merge this and the one remaining comment can be addressed optionally/opportunistically.

Nghi Nguyen added 3 commits January 24, 2024 12:43

Create base supervisor script

e057e86

Add recover function

bb5f9a7

Fix small typo error

5d983e5

nghi01 linked an issue Jan 24, 2024 that may be closed by this pull request

Implement a supervisor procedure to monitor the health of the peer list and restore peers that have failed #24

Closed

zimmermatt requested changes Jan 24, 2024

Logging, shebang, finally block added

b1287a9

zimmermatt approved these changes Jan 27, 2024

zimmermatt merged commit 2833d5f into zimmermatt:main Jan 27, 2024
1 check passed
