Isolate the operator from misconfiguration of HTTP service #4394
@sebgl suggested routing requests to individual Pods directly, as we are already doing when switching protocols (TLS/non-TLS). Instead of this manual routing being the exception, we could make it the default and sidestep accidental misconfiguration this way.
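For illustration, a minimal sketch of what that per-Pod routing could look like; the helper name, scheme, and port are assumptions for the sketch, not the operator's actual code:

```go
package podrouting

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// buildPodURL targets a single Elasticsearch Pod directly instead of going
// through the user-configurable HTTP Service. The scheme and port (9200) are
// illustrative assumptions; the real values would be derived from the cluster spec.
func buildPodURL(pod corev1.Pod) string {
	return fmt.Sprintf("https://%s:9200", pod.Status.PodIP)
}
```

The same idea would work with the Pod's DNS name behind a headless Service instead of the Pod IP; the point is only that the URL no longer depends on the user-configurable Service.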
Suggestion on which approach to take between these two options:
I'd like to suggest an internal service managed by the operator, for the following reasons:
This still doesn't remove the possibility of someone doing bad things to the internal service, but it seems to alleviate the initial concern of a customer accidentally misconfiguring the operator into a situation it can't get out of. Thoughts?
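As a rough sketch of that option, an operator-owned internal Service could look something like the following; the name suffix, label key, and port are placeholders rather than what the operator actually uses:

```go
package internalservice

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// internalHTTPService builds a Service whose selector is fully controlled by
// the operator, so user edits to the public HTTP service cannot break the
// operator's own connectivity. All names and labels below are illustrative.
func internalHTTPService(esName, namespace string) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      esName + "-es-internal-http", // hypothetical name
			Namespace: namespace,
		},
		Spec: corev1.ServiceSpec{
			Type: corev1.ServiceTypeClusterIP,
			Selector: map[string]string{
				// assumed label key; the operator applies its own labels to the Pods
				"elasticsearch.k8s.elastic.co/cluster-name": esName,
			},
			Ports: []corev1.ServicePort{
				{Name: "https", Port: 9200, TargetPort: intstr.FromInt(9200)},
			},
		},
	}
}
```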
I think that would be a fairly straightforward inspection of the Pods' Status subresource, wouldn't it?
I am not sure I follow that line of argument. Since we are using a cached client and accessing Pods in other places in the operator, we are already incurring the cost of keeping that cache in sync. So the additional listing of the Pods for the purpose of finding one we can talk to would be essentially "free".
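A minimal sketch of that lookup, assuming a controller-runtime cached client and an illustrative label key, listing the cluster's Pods and keeping only the ones whose Ready condition is true:

```go
package podlookup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// readyPods lists the cluster's Pods through the (cached) client and filters
// them down to the ones reporting a Ready condition, i.e. Pods the operator
// could talk to directly. The label key is an assumption for this sketch.
func readyPods(ctx context.Context, c client.Client, namespace, esName string) ([]corev1.Pod, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods,
		client.InNamespace(namespace),
		client.MatchingLabels{"elasticsearch.k8s.elastic.co/cluster-name": esName},
	); err != nil {
		return nil, err
	}
	var ready []corev1.Pod
	for _, p := range pods.Items {
		for _, cond := range p.Status.Conditions {
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				ready = append(ready, p)
				break
			}
		}
	}
	return ready, nil
}
```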
I hadn't examined where we already interact with Pods in the code base. Let me take a look and get back to you on this.
OK, I believe I found what you're referring to. Is it this section: https://github.com/elastic/cloud-on-k8s/blob/master/pkg/controller/elasticsearch/services/services.go#L149-L155? If so, yeah, I see how just leveraging that would make the most sense. I don't see any filtering of ready Pods though. Is that something we should include as part of this change?
Never mind on the filtering piece, I just ran into this: https://github.com/elastic/cloud-on-k8s/blob/master/pkg/controller/elasticsearch/driver/driver.go#L312 👍
Related #3732 and #2572
Currently it is possible for users to misconfigure the default HTTP service in such a way that the operator's ability to roll out further configuration changes is impeded.
Consider: the user customizes the default HTTP service so that its selector targets `coordinating-node`, while the Pods of the existing `client-node` nodeSet carry the `client-node` label, so the service no longer selects any Elasticsearch Pods.
There are two ways for users to fix this misconfiguration: a) change the service selector to `client-node`, or b) change the labels on the `client-node` nodeSet to `coordinating-node`. Approach a) would work, while b) would not.

Once the user has rolled out the change with the incorrect selector in the service, the operator has only limited ability to roll out further changes because it has lost connectivity with Elasticsearch. In such a situation any rolling upgrade of existing nodes is effectively blocked, while new nodes can still be added to the cluster. Also, any updates to the service itself would still be rolled out. This is because these operations don't require API access to Elasticsearch, while a rolling upgrade or a downscale does.
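To make the failure mode concrete, here is a small sketch using apimachinery's label matching; the label key and values are placeholders mirroring the scenario above:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Selector the user put on the HTTP service vs. the labels actually on the Pods.
	serviceSelector := labels.SelectorFromSet(labels.Set{"nodeSet": "coordinating-node"})
	podLabels := labels.Set{"nodeSet": "client-node"}

	// Prints false: the Service matches no Pods, so it has no endpoints and the
	// operator can no longer reach Elasticsearch through it.
	fmt.Println(serviceSelector.Matches(podLabels))
}
```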
We could consider isolating the operator from this class of configuration errors either by not relying on the service at all (talking to Pods directly, as we already do when we switch from HTTP to HTTPS) or by creating an internal service, used by the operator, that users cannot configure.