windows service health check #10637
Comments
Hi @rismoney, It sounds like there's a service with no network listeners that you want to have health checks on. Rather than using a If I've understood correctly (let me know), I have a few follow-up questions:
Yes.
This is what we would use. I am running very latency-sensitive financial transaction applications. Spawning cmd/PowerShell to invoke Get-Service, sc, or a similar operation would get expensive as the service count rises. We are trying to do service inventory on 500 nodes with 25-50 custom services per node, and would not want the health check to be shell/script based with another process invocation. Also, my understanding is that any approach that uses WMI calls isn't going to scale well (the Prometheus exporter kind of taught that lesson).
Yes, unfortunately the only thing we will know is its name and that it's running, so those two attributes are nearly all we have. That itself would be the health check. I know it's kind of primitive. The applications with listeners are also fairly sensitive, and we don't want to interrogate them by port. They have their own built-in health capabilities that essentially will down themselves if unhealthy.
I think it's fairly boolean: running or not. I suppose there are some edge cases around "starting", "stopping", etc., but I'd punt on those and assume anything not running is unhealthy. Those are typically backoff-timer based, and will time out or succeed eventually. While my initial use case is Windows, I would imagine similar capabilities for systemd services would be a feature on parity with this. I am relatively new to exploring Consul, and thought this capability might open up a huge monitoring integration for us, where we can dynamically determine how to monitor things based on what services are active where in the environment. If we had to use PowerShell/sc to get it for each service, it would potentially be super expensive as the service count rises. |
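The "running or not" rule described above can be sketched in Go as a pure mapping from service state to check status. This is only an illustration and not the implementation that was eventually merged. The `ServiceState` type, its constant names, and `HealthFromState` are hypothetical; the numeric values are assumed to mirror the Win32 service states (stopped = 1 through paused = 7), and "passing"/"critical" are Consul's check status strings.

```go
package main

import "fmt"

// ServiceState is a hypothetical enum for this sketch. Its values are assumed
// to mirror the Win32 service states (SERVICE_STOPPED = 1 ... SERVICE_PAUSED = 7).
type ServiceState int

const (
	StateStopped ServiceState = iota + 1
	StateStartPending
	StateStopPending
	StateRunning
	StateContinuePending
	StatePausePending
	StatePaused
)

// HealthFromState applies the boolean rule from the comment above:
// only a running service is healthy; every other state, including the
// transitional "pending" ones, is treated as unhealthy.
func HealthFromState(s ServiceState) string {
	if s == StateRunning {
		return "passing"
	}
	return "critical"
}

func main() {
	states := []ServiceState{StateRunning, StateStartPending, StateStopped}
	for _, s := range states {
		fmt.Printf("state=%d health=%s\n", s, HealthFromState(s))
	}
}
```

Treating the pending states as critical keeps the check simple; a service that is still starting would flip to passing on a later check interval once it reaches the running state.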
Are all of the 25-50 services per node unique services? Or do you have multiple instances of some services on a single node? I ask because if service name is the identifying information used for the health check, I imagine that would be problematic with multiple instances of a service on a single node. |
The services are uniquely named, and each requires its own directory containing the associated app/config. They could be the same app, but their config/name is unique. I am not sure Windows even allows duplicate service names on the same node. To clarify 2 services named: The underlying app code might be the same, but it is stored redundantly on the filesystem with different associated conf files. At no point would the service names not be unique. Also, I don't think you can have 2 Windows services that point to the same path\to\filename.exe ... So there is that too. So duplication of services wouldn't occur on the same node, but could occur across nodes (parallel stuff). So myco-svctype-svcname-001 could be running on 3 nodes. I hope that clarifies. |
Hello Consul community members, We would welcome a PR contributed by the community for this enhancement! If you're interested, please comment here so anyone interested can stay informed. We also recommend sharing a written design approach here before proceeding with implementation to ensure we're aligned from the start. The approach should ensure the following:
Implementing automated tests for this may not be straightforward, as (to the best of my knowledge) we don't have unit tests that run specifically on Windows within Circle CI right now. The HashiCorp team can discuss that with whomever takes on this enhancement. |
Hi there, I bumped into this issue and I would like to contribute. That said, I wanted to try something "odd", considering that this issue requires sharing a design beforehand. https://gist.github.com/deblasis/b6dd645a8bd5e2b3c9b95bb30bbb238c Any feedback is more than welcome. Cheers |
Hi @deblasis, Thank you so much for writing this up - and for taking the initiative to find and apply our RFC template! The engineering team will take a look soon and provide feedback directly on the Gist (likely in the next 1-2 weeks). In the meantime, I might provide some feedback on UX considerations (also on the Gist). |
Hi @jkirschner-hashicorp, Thank you for the update. It sounds fantastic! Take your time. |
The team has blocked off some time next week to go through your RFC! We appreciate this initiative and will get back to you with valuable feedback |
any update or blockers on moving this forward? |
Hi @rismoney, it is completely my fault I guess. The team provided feedback, we are aligned with the design and there are no blockers. |
Hi folks, I have also written some code in #13388 and I would like some feedback and some direction regarding your preferred way of mocking syscalls. I could attempt doing it "my own way", but I'd rather stay consistent if you guys have a pattern that's already used elsewhere and would like to be replicated here. |
amazing work, from original documentation to merging!!! kudos and @deblasis would be an asset to any org! |
Thank you @rismoney ! In hindsight, it took me way too long to stay on top of this given work commitments and I hate unfinished business but I am very happy to see my code finally merged upstream again. Always nice to be working with you guys, I always learn something new🙏 Farewell, until next time :) |
Agents should have a native health check to determine if a service is running. Many services don't have network listeners and handle either outbound or local service capability. It would be great to handle this in native Go, avoiding the process creation/kill involved in calling sc from cmd or a PowerShell cmdlet to check the existence of a service.
I can't imagine this to be a big ask, as other health checks are seemingly simplistic functions.