Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate operator from http service misconfiguration - Use internal service #5211

Merged

Conversation

naemono
Copy link
Contributor

@naemono naemono commented Dec 27, 2021

Since #5005 has been open, with active discussion for quite some time, I thought it would be useful to go back to the beginning and look at what we were trying to accomplish in the original issue #4394, which was ensuring that the user can't misconfigure the operator so it can't manage an Elasticsearch cluster. In the original PR, we chose to go the route of choosing a random pod for communication between the operator, and the ES cluster, which caused its own set of concerns once we both saw the implementation, and tested the consequences of the implementation.

This PR shows the changes that would be applied to the operator if we chose the other route, which was managing an internal service that does not inherit the selector defined within the custom resources's spec. This will, at least, give us more information to make a better decision about which route we believe is best to move either solution forward to resolving the initial issue.

To be clear, only one of #5005, or this issue (or none if we so choose) will be merged. The other will be closed as is.

resolves #4394

… use the internally managed

  service, which does not apply the spec's selector defined within the CRD when creating the service to
  avoid a user misconfiguring the selectors.
Added a unit test to ensure that the service doesn't inherit it's CRD spec's selector.
@naemono naemono added the >enhancement Enhancement of existing functionality label Dec 27, 2021
@naemono
Copy link
Contributor Author

naemono commented Dec 27, 2021

Question for discussion: In this solution, would we want to only target the master nodes, instead of all nodes?

@naemono
Copy link
Contributor Author

naemono commented Jan 10, 2022

From a discussion with members of the Elasticsearch team, they note that these ES calls optimally should not be directed at the masters:

"/_cat/shards?format=json"
"/_flush/synced"
"/_flush"
"/_nodes/_all/no-metrics"
"/_nodes/_all/stats/os"

relevant information from discussion

cat APIs collect data inefficiently, putting extra burden on coordinator.
Also, the stats APIs can have some overhead on the coordinator.
Avoiding the master as coordinator could thus be advisable.
In particular when orchestrating changes, since failures here are typically much worse than elsewhere.
The _flush calls also fan out to all shards and collect responses, could have some overhead in case of many failures.

@naemono
Copy link
Contributor Author

naemono commented Jan 10, 2022

run full pr build

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just one thing ...

pkg/controller/elasticsearch/services/services.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@barkbay barkbay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/controller/elasticsearch/services/services.go Outdated Show resolved Hide resolved
pkg/controller/autoscaling/elasticsearch/driver.go Outdated Show resolved Hide resolved
@naemono naemono merged commit 0b1de89 into elastic:main Jan 20, 2022
@pebrc pebrc added the v2.1.0 label Jan 24, 2022
fantapsody pushed a commit to fantapsody/cloud-on-k8s that referenced this pull request Feb 7, 2023
…rvice (elastic#5211)

* When constructing the Elasticsearch client for internal operator use, use the internally managed
  service, which does not apply the spec's selector defined within the CRD when creating the service to
  avoid a user misconfiguring the selectors.
Added a unit test to ensure that the service doesn't inherit it's CRD spec's selector.

* Set all aspects of internal service explicitly.
Update comments

* remove setting ipfamilies

* Review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality v2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Isolate the operator from misconfiguration of HTTP service
3 participants