Unhealthy Compactors Stay in the Ring #142
Comments
If you are seeing this issue and are unable to forget a compactor, click the "Forget" button, wait a full 10 seconds, stand up, stretch, get all your grocery shopping done, come back, and then hit F5. The compactor should be forgotten. If you quickly spam "Forget", old compactors seem to stay in the ring. This is believed to be an issue with memberlist propagation of the ring. |
The forget behaviour may be fixed by cortexproject/cortex#3603; the fix will take effect once we vendor in the latest Cortex version. We should also research a way to not care about a compactor disappearing from the ring. |
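The workaround above amounts to "send one forget request, then wait and re-check instead of retrying". A minimal sketch of that pattern follows. The names `forget_once_and_wait`, `forget`, and `is_in_ring` are hypothetical placeholders, not part of Tempo or Cortex; how the forget request is actually sent and how ring membership is checked is left to the caller.

```python
import time
from typing import Callable


def forget_once_and_wait(
    forget: Callable[[], None],          # hypothetical: sends ONE forget request
    is_in_ring: Callable[[], bool],      # hypothetical: True while the instance is still listed
    wait_seconds: float = 10.0,          # the "full 10 seconds" from the comment above
    max_checks: int = 6,                 # assumption: give up after this many polls
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send a single forget request, then poll until the instance leaves the ring.

    Returns True if the instance is gone, False if it is still present
    after max_checks polls.
    """
    forget()  # exactly one request; no retries, since spamming seems to make things worse
    for _ in range(max_checks):
        sleep(wait_seconds)  # give the ring state time to propagate
        if not is_in_ring():
            return True
    return False
```

The callables are injected so the waiting logic can be tried out without a running cluster; in practice they would wrap whatever you already use to reach the compactor ring page.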
Can this be closed? |
We believe that #442 fixed this issue, but we have not seen it again in our internal cluster to confirm. I'd rather keep this open until we verify. |
This happened again. We could not get the unhealthy compactor to leave the ring, and ended up port-forwarding 4-5 compactors between two people and clicking "Forget" a lot until it eventually went away. |
@pstibrany has reported he feels the issue will be fixed in Cortex 1.7.0. We will keep an eye on it after the upgrade. |
It's the same cortexproject/cortex#3603 fix, but Tempo currently doesn't use a Cortex version that includes that fix. |
Confirmed fixed in our environment by #512 Thanks @pstibrany! |
We've seen this again, but found a way to mitigate it. The changes in Cortex have certainly made it easier to deal with, but it does still happen occasionally. Details have been added to the appropriate runbook entries: #532 |
Further updates on this. We have since switched to using these values in our memberlist config and have not been able to trigger this issue since:
Still keeping an eye on things. |
Possible fixes going into Cortex now: cortexproject/cortex#4420 TODO:
|
I've seen an unhealthy compactor stay in the ring for hours after it was gone. Research this and see if it's a matter of configuration or actually a bug of some kind.
Should we use a simpler discovery mechanism like DNS?