-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial HealthCheck Code #245
Conversation
Adds a TCP Socket class called CHealthCHeckSocket. This allows services like AWS to connect and verify the service is still running. Added Health Check to CHighPriorityThread, only used when bClient is false. Created global value to control the max number of Health Check connections. Changed CSocket to be Non Blocking. Updated the Socket loop to handle both sockets, and use select to block until there is either a UDP datagram or a TCP Connection request. Created socket error handling. Added Socket helper functions to the Socket implementation. These keep the implementation code cleaner than having the windows specific ifdefs inline.
What is a AWS Health Check used for? I have not heard oft such a service. |
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html Basically it let's the AWS infrastructure check that your services are healthy, and if they're not then create new ones and route to them. This is common practice for automating service availability. |
Google Cloud also supports the same thing. https://cloud.google.com/load-balancing/docs/health-checks |
What is the relation to Jamulus? What benefit has the Jamulus Server by this? |
So that you can run Jamulus Server on cloud infrastructure and get the benefits of high availability and integrated monitoring. |
What information about the Jamulus Server do you want to query with this interface? |
The Load Balancer needs to be able to make a TCP connection to the Jamulus Server as a way to monitor general health of the software, instance, and network. It does not query for anything, there is no request or response, it just establishes the connection and then closes it and tries again at a configured interval. It cannot do that using the existing UDP socket because UDP is connectionless. Thus adding a TCP socket to Jamulus and putting it in the same loop as the UDP socket. If something goes wrong on the thread and the socket cannot handle connections the Load Balancer will fail to connect and take the appropriate action. |
Do you run Jamulus in AWS in a normal Linux shell? Could you start other programs or scripts in parallel to the Jamulus Server? |
Yes, it is running under a normal Linux shell, so there is nothing preventing another program/script. This is often referred to as a "sidecar" and has some downsides. The largest being that you can't know that the health of the sidecar actually reflects the health of the service. |
What do you mean by that? What is the "service" in that case? |
Jamulus is the service. |
You seem reluctant about this, is this also the reason that the Html status is written to a file instead of responding to an Http request? I totally understand being hesitant about opening extra ports. That was part of the reason for making it not accept requests, only TCP connections. |
Ok, so is your health check then to check if Jamulus has crashed? |
Crashed, or if the control thread for the service is in a bad state. If a deadlock or some other bad state happened on that thread the program may not exit, but it will no longer be able to accept connections and the health check will fail. |
This is where future improvements could be made, where there is more monitoring code into the state of the service and if a bad state is entered the Health Check socket would get closed. |
From my experience, the Jamulus server runs very stable. E.g. the Central Server which is occupied by a lot of people, ran for months without a restart. So I do not expect it to crash or having a bad state. If it is just about making the AWS health checking server happy, a simple python script could do the job. Do you agree? |
A script could do the job, but with reduced reliability. It's good that it's stable, but again the problem with a script or separate program is that they may fail independent of Jamulus. Checking that the process is still running is a poor solution. So then we move to polling Jamulus using one of the protocol calls, like the Ping. Now the script has to be synchronized to the rate of the health check, so it can be sure it's reporting the correct health. This leads to the realization that to be accurate you need to poll at a higher rate than the Health Check requests. And now we're adding unnecessary load to a real time service, by making constant requests to Jamulus. Though I admit this only becomes a problem when you have many connections and are reaching the limits of the hardware it is being run on. |
First, my assumption is that Jamulus does not crash :-).
Why is that supposed to be a poor solution?
Well, if you really do not trust the stability of the Jamulus server and you think checking the process is poor, sending ping times does not influence the stability of the Jamulus audio streams. Each client constantly sends ping messages. The Central Server even receives a lot of ping messages from all around the world if anyone opens the server list. I do not see an issue with that. |
So, just checking for the process is considered a poor solution because of things like deadlocks, high cpu and memory pressure that can all cause programs to be unresponsive without crashing. Just checking those things is also considered bad, because there is no way to know if 100% cpu means a service is designed to run that way, or if for example it's in an infinite loop. Sending pings from another process, running on the same or even a different server, brings back the problem of not knowing if there is a problem with that process or with Jamulus. Under heavy load, or running in a constrained environment, the processing of ping times would influence the stability and performance of the service, because the system recourses are finite. But I agree that this is the least important, and is an extreme edge case. If this is a feature you don't see a use for, I'll just maintain it on my fork. Is there a developer forum for contributing, so i can ask in the future or get assigned bug fixes/feature request? I'd like to help out now that I've familiarized myself with the code base. |
Actually, yes. If that is ok for you it would be good to maintain it on your fork for now.
Sure. The bug and feature requests are here: https://github.com/corrados/jamulus/issues If you plan to implement something for Jamulus, it is always advantageous to inform me prior to starting coding and first discuss the specifications of the new feature. Preferable by creating a feature request Issue. Here is a list of possible Issues you could work on if you like: |
I had looked on the forums, but didn't see a developer forum, do you just use General? To work on an issue should I just volunteer in the issue conversation and wait to be assigned? Thanks for the list, I'll go through these and choose one. Going to close this PR now. |
Good idea. I just created one: https://sourceforge.net/p/llcon/discussion/developerforum
Just write in the Issue that you are interested in taking care of it. Then we can discuss the details in that Issue. |
Many services, like AWS, do basic TCP Health Checks by connecting to the servers socket. UDP is not supported, so to make this work for Jamulus I created a HealthCheckSocket that handles connections on the same port as the UDP Service. To make that more meaningful it is on the same thread as the UDP socket. This could be expanded with more service checks in the future, but at the very least it will fail the HealthCheck if something is wrong on the thread.
To get this working I had to make both socket NonBlocking and use a select call to make sure the thread doesn't use CPU when there are no incoming messages or data.