-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create base supervisor script #26
Conversation
devops/deploy/supervisor.py
Outdated
async def ping_node(host, port): | ||
try: | ||
reader, writer = await asyncio.open_connection(host, port) | ||
print(f"Alive node at {host}:{port}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
devops/deploy/supervisor.py
Outdated
writer.close() | ||
await writer.wait_closed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should maybe be in a finally
block.
Does the reader
need to be closed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the asyncio python doc doesn't seem so. I'd look further into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarification:
Both the reader and finally block were not included in the example from the python docs so I did not include them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... the docs don't provide clear guidance on cleanup in the face of exceptions. Would be nice if they did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot find any instance where they try to close the reader, so I assume it is not worth doing so.
devops/deploy/supervisor.py
Outdated
async def ping_node(host, port): | ||
try: | ||
reader, writer = await asyncio.open_connection(host, port) | ||
print(f"Alive node at {host}:{port}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be good enough for our MVP, but merely being able to connect on the port at the host is not a guarantee that it's our application listening at that moment in time.
A stronger signal would be to have a "healthcheck" ping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little more on this beyond the above and our conversation.
In a deployment environment for something we want to ensure stays up, we'd usually use (in increasing preference and not an exhaustive list) one of:
- a monitor script with
cron
- a supervising tool such as
daemontools
orsupervisord
systemd
Determining "liveness" of a service via a network call is somewhat fragile since the network path from the monitoring app could have a partition, but other apps can reach it just fine. The more degenerate case is that the monitoring app can reach it, but the apps that depend on it can't. A more refined approach is to look at client app calls for signs of health (assuming a reasonable degree of traffic).
devops/deploy/supervisor.py
Outdated
writer.close() | ||
await writer.wait_closed() | ||
except (OSError, asyncio.TimeoutError): | ||
print(f"Node at {host}:{port} is not responding") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
while True: | ||
for i in range(len(peer_list)): | ||
await ping_node("127.0.0.1", peer_list[i][1]) | ||
await asyncio.sleep(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making this an input parameter to the supervisor script that's passed through to here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
U mean the 127.0.0.1 as an input parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sleep interval.
The changes make sense. I'm wondering about where to put this python file, should I keep it in devops/deploy? I'll make the spin-up-script call it to run continually when the spin up is done |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll merge this and the one remaining comment can be addressed optionally/opportunistically.
Currently still need to implement recovering nodes, need to talk with Alex about implementing recovering nodes from the available public keys and private keys.