Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial HealthCheck Code #245

Closed
wants to merge 1 commit into from

Conversation

AronVietti
Copy link
Contributor

Many services, like AWS, do basic TCP Health Checks by connecting to the servers socket. UDP is not supported, so to make this work for Jamulus I created a HealthCheckSocket that handles connections on the same port as the UDP Service. To make that more meaningful it is on the same thread as the UDP socket. This could be expanded with more service checks in the future, but at the very least it will fail the HealthCheck if something is wrong on the thread.

To get this working I had to make both socket NonBlocking and use a select call to make sure the thread doesn't use CPU when there are no incoming messages or data.

Adds a TCP Socket class called CHealthCHeckSocket.
This allows services like AWS to connect
and verify the service is still running.

Added Health Check to CHighPriorityThread, only used when
bClient is false.

Created global value to control the max number of Health Check
connections.

Changed CSocket to be Non Blocking. Updated the Socket loop to
handle both sockets, and use select to block until there is
either a UDP datagram or a TCP Connection request.

Created socket error handling.

Added Socket helper functions to the Socket implementation.
These keep the implementation code cleaner than having the
windows specific ifdefs inline.
@corrados
Copy link
Contributor

What is a AWS Health Check used for? I have not heard oft such a service.

@AronVietti
Copy link
Contributor Author

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html

Basically it let's the AWS infrastructure check that your services are healthy, and if they're not then create new ones and route to them. This is common practice for automating service availability.

@AronVietti
Copy link
Contributor Author

Google Cloud also supports the same thing. https://cloud.google.com/load-balancing/docs/health-checks

@corrados
Copy link
Contributor

What is the relation to Jamulus? What benefit has the Jamulus Server by this?

@AronVietti
Copy link
Contributor Author

So that you can run Jamulus Server on cloud infrastructure and get the benefits of high availability and integrated monitoring.

@corrados
Copy link
Contributor

What information about the Jamulus Server do you want to query with this interface?

@AronVietti
Copy link
Contributor Author

The Load Balancer needs to be able to make a TCP connection to the Jamulus Server as a way to monitor general health of the software, instance, and network. It does not query for anything, there is no request or response, it just establishes the connection and then closes it and tries again at a configured interval. It cannot do that using the existing UDP socket because UDP is connectionless. Thus adding a TCP socket to Jamulus and putting it in the same loop as the UDP socket. If something goes wrong on the thread and the socket cannot handle connections the Load Balancer will fail to connect and take the appropriate action.

@corrados
Copy link
Contributor

Do you run Jamulus in AWS in a normal Linux shell? Could you start other programs or scripts in parallel to the Jamulus Server?

@AronVietti
Copy link
Contributor Author

Yes, it is running under a normal Linux shell, so there is nothing preventing another program/script. This is often referred to as a "sidecar" and has some downsides. The largest being that you can't know that the health of the sidecar actually reflects the health of the service.

@corrados
Copy link
Contributor

The largest being that you can't know that the health of the sidecar actually reflects the health of the service.

What do you mean by that? What is the "service" in that case?

@AronVietti
Copy link
Contributor Author

Jamulus is the service.

@AronVietti
Copy link
Contributor Author

You seem reluctant about this, is this also the reason that the Html status is written to a file instead of responding to an Http request? I totally understand being hesitant about opening extra ports. That was part of the reason for making it not accept requests, only TCP connections.

@corrados
Copy link
Contributor

Ok, so is your health check then to check if Jamulus has crashed?

@AronVietti
Copy link
Contributor Author

Crashed, or if the control thread for the service is in a bad state. If a deadlock or some other bad state happened on that thread the program may not exit, but it will no longer be able to accept connections and the health check will fail.

@AronVietti
Copy link
Contributor Author

This is where future improvements could be made, where there is more monitoring code into the state of the service and if a bad state is entered the Health Check socket would get closed.

@corrados
Copy link
Contributor

From my experience, the Jamulus server runs very stable. E.g. the Central Server which is occupied by a lot of people, ran for months without a restart. So I do not expect it to crash or having a bad state. If it is just about making the AWS health checking server happy, a simple python script could do the job. Do you agree?

@AronVietti
Copy link
Contributor Author

A script could do the job, but with reduced reliability. It's good that it's stable, but again the problem with a script or separate program is that they may fail independent of Jamulus. Checking that the process is still running is a poor solution. So then we move to polling Jamulus using one of the protocol calls, like the Ping. Now the script has to be synchronized to the rate of the health check, so it can be sure it's reporting the correct health. This leads to the realization that to be accurate you need to poll at a higher rate than the Health Check requests. And now we're adding unnecessary load to a real time service, by making constant requests to Jamulus. Though I admit this only becomes a problem when you have many connections and are reaching the limits of the hardware it is being run on.

@corrados
Copy link
Contributor

Checking that the process is still running is a poor solution.

First, my assumption is that Jamulus does not crash :-).

Checking that the process is still running is a poor solution.

Why is that supposed to be a poor solution?

And now we're adding unnecessary load to a real time service, by making constant requests to Jamulus.

Well, if you really do not trust the stability of the Jamulus server and you think checking the process is poor, sending ping times does not influence the stability of the Jamulus audio streams. Each client constantly sends ping messages. The Central Server even receives a lot of ping messages from all around the world if anyone opens the server list. I do not see an issue with that.

@AronVietti
Copy link
Contributor Author

So, just checking for the process is considered a poor solution because of things like deadlocks, high cpu and memory pressure that can all cause programs to be unresponsive without crashing. Just checking those things is also considered bad, because there is no way to know if 100% cpu means a service is designed to run that way, or if for example it's in an infinite loop.

Sending pings from another process, running on the same or even a different server, brings back the problem of not knowing if there is a problem with that process or with Jamulus.

Under heavy load, or running in a constrained environment, the processing of ping times would influence the stability and performance of the service, because the system recourses are finite. But I agree that this is the least important, and is an extreme edge case.

If this is a feature you don't see a use for, I'll just maintain it on my fork. Is there a developer forum for contributing, so i can ask in the future or get assigned bug fixes/feature request? I'd like to help out now that I've familiarized myself with the code base.

@corrados
Copy link
Contributor

If this is a feature you don't see a use for, I'll just maintain it on my fork.

Actually, yes. If that is ok for you it would be good to maintain it on your fork for now.

Is there a developer forum for contributing, so i can ask in the future or get assigned bug fixes/feature request?

Sure. The bug and feature requests are here: https://github.com/corrados/jamulus/issues
Discussion forums are here: https://sourceforge.net/p/llcon/discussion

If you plan to implement something for Jamulus, it is always advantageous to inform me prior to starting coding and first discuss the specifications of the new feature. Preferable by creating a feature request Issue.

Here is a list of possible Issues you could work on if you like:
#169
#129
#116
#83
#73
#49
#40
#23

@AronVietti
Copy link
Contributor Author

I had looked on the forums, but didn't see a developer forum, do you just use General? To work on an issue should I just volunteer in the issue conversation and wait to be assigned?

Thanks for the list, I'll go through these and choose one. Going to close this PR now.

@AronVietti AronVietti closed this May 18, 2020
@corrados
Copy link
Contributor

I had looked on the forums, but didn't see a developer forum, do you just use General?

Good idea. I just created one: https://sourceforge.net/p/llcon/discussion/developerforum

To work on an issue should I just volunteer in the issue conversation and wait to be assigned?

Just write in the Issue that you are interested in taking care of it. Then we can discuss the details in that Issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants