Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does Azure restart the host when auto-healing is switched off? #1292

Open
zabeen opened this issue Apr 23, 2024 · 4 comments
Open

Why does Azure restart the host when auto-healing is switched off? #1292

zabeen opened this issue Apr 23, 2024 · 4 comments
Labels
bug Something isn't working investigation performance-search Covers changes to improve performance specifically *of search requests*. wmda-live Issues identified from LIVE-WMDA-ATLAS

Comments

@zabeen
Copy link
Contributor

zabeen commented Apr 23, 2024

This is a question for azure support.

Auto-heal is only switched off for matching functions app.

It's possible that if an active atlas-functions app is running on the same host as the active matching-functions app (if by "host", app insights means VM instance) - then auto-heal may still be engaged?

Solution could be then to switch off auto-heal on atlas-functions - at least then an OOM exception may be thrown would make the point of failure clearer!

@zabeen zabeen added bug Something isn't working performance-search Covers changes to improve performance specifically *of search requests*. wmda-live Issues identified from LIVE-WMDA-ATLAS labels Apr 23, 2024
@zabeen zabeen moved this to Project Backlog in Atlas Development Apr 23, 2024
@zabeen
Copy link
Contributor Author

zabeen commented Apr 23, 2024

evidence from prod in just the last 24 hours - platform healing was performed on atlas-functions:

Image


From matching app - no restarts detected in same period:

Image

@zabeen
Copy link
Contributor Author

zabeen commented Apr 23, 2024

@SergeyEzh to switch off atlas-functions auto-heal manually on uat-wmda, and run set of large searches (from #1268 ), and see if instead of matching request replays, we actually see exceptions being thrown that are responsible for the host being shutdown.

@SergeyEzh SergeyEzh moved this from Project Backlog to In Progress in Atlas Development Apr 24, 2024
@SergeyEzh
Copy link
Collaborator

SergeyEzh commented Apr 24, 2024

@zabeen, turning off auto-heal on atlas-functions had no effect. There were messages in matching-request with DeliveryAttempt equals to 3, but there were no exceptions in the logs

@zabeen
Copy link
Contributor Author

zabeen commented Apr 24, 2024

@zabeen, turning off auto-heal on atlas-functions had no effect. There were messages in matching-request with DeliveryAttempt equals to 3, but there were no exceptions in the logs

I couldn't see any web app restarts reported for any of the functions apps in last 24 hours. It is not clear why then searches were replayed due to Host restart as reported in A.I. logs.

I think we need to involve Azure support to ask how we diagnose this problem.

@zabeen zabeen moved this from In Progress to Blocked in Atlas Development Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working investigation performance-search Covers changes to improve performance specifically *of search requests*. wmda-live Issues identified from LIVE-WMDA-ATLAS
Projects
Status: Blocked
Development

No branches or pull requests

2 participants