Add new operator flag to control Elasticsearch health observation intervals #5861

pebrc · 2022-07-06T09:39:32Z

Introduce a new global flag to control the Elasticsearch health observation interval.

Annotations on individual Elasticsearch resources take precedence, to avoid breaking existing customisations for users.

A non-positive value disables asynchronous observation completely. Instead, a single synchronous observation happens during each reconciliation. This also disables timely automatic pod disruption budget adjustment on health changes. Observations still happen at least every 10 hours, because periodic client-go cache refreshes trigger a reconciliation. This means disabling asynchronous observation has the same effect as setting the observation interval to 10 hours.


thbkrkr · 2022-07-06T12:39:19Z

I understand that we need to do something if the provided interval is negative, but I'm not sure we should disable the observer. What does it mean for ECK to manage an ES cluster without an observer? Isn't it dangerous to reconcile ES without knowing its real health?
How about validating that the provided interval is within a range that makes sense (5s < i < 1h) and crashing the operator if it isn't?

pebrc · 2022-07-06T13:25:28Z

What does it mean to manage an ES for ECK without an observer? Isn't it dangerous to reconcile ES without knowing the real health?

I think you are right in general that disabling the observer is maybe a step too far, at least without compensating for it with a synchronous observation in the reconciliation loop. I forgot that the synchronous observation only happens when the observer is first constructed, which only happens on settings changes (e.g. a changed certificate).

However, I also think that the notion of "real health" is flawed. We are always working with a health observation that is, by default, up to 10 seconds old, and potentially even older if Elasticsearch is slow to respond. So adjusting the observation interval just moves the needle on the staleness scale from at worst 10 seconds to maybe 10 hours.

I am moving this back to draft mode to see if I can come up with a solution and also address the negative value issue for the annotation.


pebrc · 2022-07-07T07:45:32Z

@thbkrkr I have made it so that what I wrote in the OP is now true: when asynchronous observation is disabled, a synchronous observation is made on each reconciliation. This still has some drawbacks, as the operator cannot react to changes in Elasticsearch health, but at least each reconciliation works with non-stale health data when it happens.

But I think I am also open to going back to your idea of simply validating a positive interval.

thbkrkr

The behaviour looks good to me.
I left some minor comments on names and constants.

pkg/controller/elasticsearch/observer/manager_test.go

pkg/controller/elasticsearch/observer/observer.go

pkg/controller/elasticsearch/observer/manager.go

pkg/controller/elasticsearch/observer/observer.go

cmd/manager/main.go

pkg/controller/elasticsearch/observer/observer.go

…ervals (elastic#5861) Annotations on individual Elasticsearch resources take precedence to avoid breaking existing customisations for users. A non-positive value disables asynchronous observation completely. Only one synchronous observation happens during reconciliation. This also disables timely automatic pod disruption budget adjustment on health changes. As a side effect of client-go cache refreshes, observations still happen every 10 hours due to reconciliation. This means disabling the asynchronous observation has the same effect as setting the observation interval to 10 hours.

Add new operator flag to control Elasticsearch health observation intervals

2033eaa

pebrc added >enhancement Enhancement of existing functionality v2.4.0 labels Jul 6, 2022

pebrc marked this pull request as draft July 6, 2022 13:28

Ensure a synchronous observation is made if the async observer is turned off

66d1d12

pebrc marked this pull request as ready for review July 7, 2022 07:43

pebrc added 2 commits July 7, 2022 10:21

remove unused args

34a3fd6

lint

24667cd

thbkrkr reviewed Jul 11, 2022


review feedback

88b4db0

thbkrkr approved these changes Jul 11, 2022


pebrc merged commit a228626 into elastic:main Jul 18, 2022

naemono mentioned this pull request Aug 31, 2022

Expose Elasticsearch Observation Interval in Helm Chart #5988

Closed

pebrc mentioned this pull request Jun 14, 2023

Missing documentation for observation interval #6906

Open
