Elastic APM Server: Set status.ObservedGeneration from metadata.Generation #5470

naemono · 2022-03-14T13:16:01Z

related to #3392
related to #5331

This change allows status.ObservedGeneration to be kept in sync with metadata.Generation according to the rules described in the above issue. This change adds a set of unit tests which show more details into when observedGeneration will, and will not be updated according to any errors that may occur during reconciliation. End to end tests were adjusted to always check for observedGeneration increments during any mutation changes.

Adjust reconciler to always attempt a status update, even on error. Add generation check to e2e APMServer tests.

Update tests for new behavior.

…dgeneration increment. Debugging e2e tests

naemono · 2022-03-14T19:32:11Z

run full pr build

naemono · 2022-03-15T14:26:25Z

Fyi, e2e tests are failing with new 7-8x upgrade test dealing with mutations associated with these changes, and some additional issues associated with missing roles. If things continue to fail, I'll change the mutation test to not upgrade, but just adjust some labels or something to speed this up.

…er has needed permissions before attempting to ingest events.

naemono · 2022-03-22T15:34:41Z

run full pr build

pkg/controller/apmserver/controller.go

pkg/controller/apmserver/controller_test.go

test/e2e/test/apmserver/checks_apm.go

Remove unneeded context from test

naemono · 2022-04-20T15:21:34Z

run/e2e-tests

naemono · 2022-04-20T19:16:06Z

run/e2e-tests

naemono · 2022-04-24T18:30:47Z

jenkins test this please

naemono · 2022-04-25T19:17:04Z

run/e2e-tests

… index.

…ain)

naemono · 2022-04-26T22:03:39Z

run/e2e-tests

naemono · 2022-04-27T13:34:13Z

I can't reproduce e2e test failure for TestMutationAndReversal/* locally, or when running e2e test fully within a test cluster, and this test has nothing to do with this PR, so I'm going to assume this is a transient failure, and just re-run the test with 🤞 ...

naemono · 2022-04-27T13:34:20Z

run/e2e-tests

barkbay · 2022-04-28T06:32:24Z

pkg/apis/apm/v1/apmserver_types.go

+	// It corresponds to the metadata generation, which is updated on mutation by the API Server.
+	// If the generation observed in status diverges from the generation in metadata, the APM Server
+	// controller has not yet processed the changes contained in the APM Server specification.
+	// +kubebuilder:validation:Optional


I don't think I ever noticed it, but there are 2 ways to specify a field as optional: +kubebuilder:validation:Optional and +optional. Only the former seems to be documented.

pkg/controller/apmserver/controller.go

barkbay · 2022-04-28T08:18:08Z

test/e2e/test/apmserver/checks_apm.go

+// non-existing indexes (ES >= 8.x). This fixes a transient issue that happens when upgrading
+// Elasticsearch between major versions where it takes a bit of time to transition between
+// file-based user roles, and permissions errors were being returned by Elasticsearch.
+func (c *apmClusterChecks) CheckIndexCreation(apm apmv1.ApmServer, k *test.K8sClient) test.Step {


This test is still failing when run locally (TestAPMServerVersionUpgradeToLatest8x from 7.17.1 to 8.1.1)
Looking at the APM Server logs I can see 2 security related errors at the beginning, then I think the test is able to create the events, but it still eventually fails.

{"log.level":"error","@timestamp":"2022-04-28T08:03:44.099Z","log.logger":"modelindexer","log.origin":{"file.name":"modelindexer/indexer.go","file.line":395},"message":"failed to index event (security_exception): action [indices:admin/auto_create] is unauthorized for user [e2e-mercury-apmserver-upgrade-49wz-apm-user] with roles [apm_system,eck_apm_user_role_v75,ingest_admin] on indices [logs-apm.error-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]","service.name":"apm-server","ecs.version":"1.6.0"} {"log.level":"error","@timestamp":"2022-04-28T08:03:44.099Z","log.logger":"modelindexer","log.origin":{"file.name":"modelindexer/indexer.go","file.line":395},"message":"failed to index event (security_exception): action [indices:admin/auto_create] is unauthorized for user [e2e-mercury-apmserver-upgrade-49wz-apm-user] with roles [apm_system,eck_apm_user_role_v75,ingest_admin] on indices [metrics-apm.app.1234_service_12a3-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]","service.name":"apm-server","ecs.version":"1.6.0"}

=== RUN TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch#01 Retries (5m0s timeout): . step.go:43: Error Trace: utils.go:88 Error: Received unexpected error: timeout reached after 5m0s Test: TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch#01 {"log.level":"error","@timestamp":"2022-04-28T10:08:40.998+0200","message":"stopping early","service.version":"0.0.0-SNAPSHOT+00000000","service.type":"eck","ecs.version":"1.4.0","error":"test failure","error.stack_trace":"github.com/elastic/cloud-on-k8s/test/e2e/test.StepList.RunSequential\n\t/Users/michael/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/step.go:44\ngithub.com/elastic/cloud-on-k8s/test/e2e/test.RunMutations\n\t/Users/michael/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/run_mutation.go:39\ngithub.com/elastic/cloud-on-k8s/test/e2e/apm.TestAPMServerVersionUpgradeToLatest8x\n\t/Users/michael/go/src/github.com/elastic/cloud-on-k8s/test/e2e/apm/upgrade_test.go:32\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:1439"} --- FAIL: TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch#01 (300.00s)

I'm still trying to understand if I'm doing something wrong...

barkbay · 2022-04-28T09:10:56Z

test/e2e/test/apmserver/checks_apm.go

-				metricIndex = "metrics-apm.app.1234_service_12a3-default"
-				errorIndex = "logs-apm.error-default"
-			}
+	return retry.UntilSuccess(func() error {


I'm not sure to understand why we are retrying during 5 minutes? What external event would unlock this situation? My understanding is that if assertCountIndexEqual fails, it is likely because checkEventsAPI (c.apmClient.IntakeV2Events) has also failed?

Hopefully this update will show what's going on in some detail.

If I move the retry around this block and add some logging, we can see what failures we're working with

return retry.UntilSuccess(func() error { if err := c.checkEventsAPI(apm); err != nil { fmt.Printf("checkEventsApi failure: %s\n", err) return err } if err := c.checkEventsInElasticsearch(apm, k); err != nil { fmt.Printf("checkEventsInElasticsearch failure: %s\n", err) return err } return nil }, 5*time.Minute, 10*time.Second)

With this change, here's the result. It fails before even attempting the version upgrade.

1st attempt: The messages are sent to apmserver successfully, but they haven't made it all the way to ES yet, as the metric index hasn't even been created yet.
2nd attempt: Before we attempt to send another message, we count the messages in the index (metric) and find 1 (which likely is the first message we attempted to send. Messages sent successfully to apmserver again, but we are expecting 2 messages in the index, but only find one (again).
3rd attempt: This process keeps repeating.

Seems like we aren't waiting long enough for the message to get from apm -> ES.

EventsInElasticsearch failure: elasticsearch client failed for https://apmserver-upgrade-rxzf-es-internal-http.e2e-mercury.svc:9200/apm-7.17.1-metric-2017.05.30/_count: 404 Not Found: {Status:404 Error:{CausedBy:{Reason: Type:} Reason:no such index [apm-7.17.1-metric-2017.05.30] Type:index_not_found_exception RootCause:[{Reason:no such index [apm-7.17.1-metric-2017.05.30] Type:index_not_found_exception}]}}\n"} {"Time":"2022-04-28T10:44:56.167184-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch","Output":"checkEventsInElasticsearch failure: 2 documents expected in index apm-7.17.1-metric-2017.05.30, got 1 instead\n"} {"Time":"2022-04-28T10:45:06.47493-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch","Output":"checkEventsInElasticsearch failure: 3 documents expected in index apm-7.17.1-metric-2017.05.30, got 2 instead\n"} {"Time":"2022-04-28T10:45:16.885698-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch","Output":"checkEventsInElasticsearch failure: 4 documents expected in index apm-7.17.1-metric-2017.05.30, got 3 instead\n"} {"Time":"2022-04-28T10:45:27.190003-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch","Output":"checkEventsInElasticsearch failure: 5 documents expected in index apm-7.17.1-metric-2017.05.30, got 4 instead\n"} {"Time":"2022-04-28T10:45:37.492295-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch","Output":"checkEventsInElasticsearch failure: 6 documents expected in index apm-7.17.1-metric-2017.05.30, got 5 instead\n"} {"Time":"2022-04-28T10:45:47.799507-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch","Output":"checkEventsInElasticsearch failure: 7 documents expected in index apm-7.17.1-metric-2017.05.30, got 6 instead\n"} {"Time":"2022-04-28T10:45:58.102207-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch","Output":"checkEventsInElasticsearch failure: 8 documents expected in index apm-7.17.1-metric-2017.05.30, got 7 instead\n"} {"Time":"2022-04-28T10:46:08.396236-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch","Output":"checkEventsInElasticsearch failure: 9 documents expected in index apm-7.17.1-metric-2017.05.30, got 8 instead\n"}

If I change things, and retry the counting of the indexes after dropping messages in apmserver, things succeed again.

if err := c.checkEventsAPI(apm); err != nil { fmt.Printf("checkEventsApi failure: %s\n", err) return err } return retry.UntilSuccess(func() error { if err := c.checkEventsInElasticsearch(apm, k); err != nil { fmt.Printf("checkEventsInElasticsearch failure: %s\n", err) return err } return nil }, 5*time.Minute, 10*time.Second)

Errors during this run

Before version upgrade we are missing the index, but the retry after 10 seconds succeeds.

{"Time":"2022-04-28T11:56:34.904163-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch","Output":"Retries (30m0s timeout): .checkEventsInElasticsearch failure: elasticsearch client failed for https://apmserver-upgrade-hdgx-es-internal-http.e2e-mercury.svc:9200/apm-7.17.1-metric-2017.05.30/_count: 404 Not Found: {Status:404 Error:{CausedBy:{Reason: Type:} Reason:no such index [apm-7.17.1-metric-2017.05.30] Type:index_not_found_exception RootCause:[{Reason:no such index [apm-7.17.1-metric-2017.05.30] Type:index_not_found_exception}]}}\n"}

After the version upgrade, we get some failures, but eventually succeed

1: index initially not found

{"Time":"2022-04-28T12:05:02.499518-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch#01","Output":"Retries (30m0s timeout): .checkEventsInElasticsearch failure: elasticsearch client failed for https://apmserver-upgrade-hdgx-es-internal-http.e2e-mercury.svc:9200/metrics-apm.app.1234_service_12a3-default/_count: 404 Not Found: {Status:404 Error:{CausedBy:{Reason: Type:} Reason:no such index [metrics-apm.app.1234_service_12a3-default] Type:index_not_found_exception RootCause:[{Reason:no such index [metrics-apm.app.1234_service_12a3-default] Type:index_not_found_exception}]}}\n"}

This repeats from 2022-04-28T 12:05:02.499518-05:00 until 2022-04-28T 12:10:06.010821-05:00 (about 5 mins, which is the time for ES to re-read it's file-based users/roles IIRC) when it finally succeeds.

Attempt 3

Sleep for 10 seconds in between inserting message(s), and checking to see if they made it into ES.

return retry.UntilSuccess(func() error { if err := c.checkEventsAPI(apm); err != nil { fmt.Printf("checkEventsApi failure: %s\n", err) return err } time.Sleep(10*time.Second) if err := c.checkEventsInElasticsearch(apm, k); err != nil { fmt.Printf("checkEventsInElasticsearch failure: %s\n", err) return err } return nil }, 5*time.Minute, 10*time.Second)

Before version upgrade, the first attempt to insert + read messages through apmserver succeeds without failure.

After version upgrade, we get 2x index not found failures, and then it succeeds within < 1 minute (did we just get lucky with timing that it didn't take 5 mins?)

{"Time":"2022-04-28T13:37:28.722225-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch#01","Output":"Retries (30m0s timeout): .checkEventsInElasticsearch failure: elasticsearch client failed for https://apmserver-upgrade-nkvk-es-internal-http.e2e-mercury.svc:9200/metrics-apm.app.1234_service_12a3-default/_count: 404 Not Found: {Status:404 Error:{CausedBy:{Reason: Type:} Reason:no such index [metrics-apm.app.1234_service_12a3-default] Type:index_not_found_exception RootCause:[{Reason:no such index [metrics-apm.app.1234_service_12a3-default] Type:index_not_found_exception}]}}\n"} {"Time":"2022-04-28T13:37:49.022055-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch#01","Output":"checkEventsInElasticsearch failure: elasticsearch client failed for https://apmserver-upgrade-nkvk-es-internal-http.e2e-mercury.svc:9200/metrics-apm.app.1234_service_12a3-default/_count: 404 Not Found: {Status:404 Error:{CausedBy:{Reason: Type:} Reason:no such index [metrics-apm.app.1234_service_12a3-default] Type:index_not_found_exception RootCause:[{Reason:no such index [metrics-apm.app.1234_service_12a3-default] Type:index_not_found_exception}]}}\n"} {"Time":"2022-04-28T13:38:09.523725-05:00","Action":"output","Package":"github.com/elastic/cloud-on-k8s/test/e2e/apm","Test":"TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch#01","Output":"\n"}

So from what I see, we have 2 options:

Retry the attempt to find the messages in ES for a bit until we fail/succeed.

Sleep a bit between initial count+insert, and the subsequent count to see if the message(s) arrived successfully.

@pebrc @barkbay Please let me know what you think, and which direction you'd prefer me take this, and I'll get it implemented, and we can finish off this PR.

Thanks!

I also added some debug logs to make it clear what is happening in my case and with your original proposal (516df86):

=== RUN TestAPMServerVersionUpgradeToLatest8x/ApmServer_should_accept_event_and_write_data_to_Elasticsearch#01 Retries (5m0s timeout): 04-29-2022 13:43:54 - [DEBUG] Entering checkEventsAPI(...) 04-29-2022 13:43:56 - [DEBUG] State before injecting events: c.metricIndexCount=0 c.errorIndexCount=0 04-29-2022 13:43:56 - [DEBUG] Injecting event using c.apmClient.IntakeV2Events 04-29-2022 13:43:56 - [DEBUG] Leaving checkEventsAPI without error, c.apmClient.IntakeV2Events seems to be successful 04-29-2022 13:43:56 - [DEBUG] Entering checkEventsInElasticsearch(...) 04-29-2022 13:43:56 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events 04-29-2022 13:43:56 - [ERROR] leaving assertCountIndexEqual with error: 404 Not Found: no such index [metrics-apm.app.1234_service_12a3-default] .... 5 minutes later, still the same error ... 04-29-2022 13:48:48 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events 04-29-2022 13:48:48 - [ERROR] leaving assertCountIndexEqual with error: 404 Not Found: no such index [metrics-apm.app.1234_service_12a3-default] step.go:43: Error Trace: utils.go:88 Error: Received unexpected error: timeout reached after 5m0s

The index is never created, checkEventsAPI is never retried.
I revisited the original retry policy (based on 516df86) to give a chance to checkEventsAPI(...) to be retried:

--- a/test/e2e/test/apmserver/checks_apm.go +++ b/test/e2e/test/apmserver/checks_apm.go @@ -246,7 +246,9 @@ func (c *apmClusterChecks) checkEventsInElasticsearch(apm apmv1.ApmServer, k *te } return assertCountIndexEqual(c.esClient, errorIndex, c.errorIndexCount+1) - }, 5*time.Minute, 10*time.Second) + // We need some retries here as the events might not have been indexed in Elasticsearch. + // timeout must be less than the parent retry policy, but long enough to let events flow to Elasticsearch. + }, 30*time.Second, 2*time.Second) }

With that change applied it works after 3 calls to checkEventsAPI():

# First attempt: KO with `404`, even after the updated timeout of `30*time.Second` index is not created Retries (5m0s timeout): .04-29-2022 12:57:21 - [DEBUG] Entering checkEventsAPI(...) 04-29-2022 12:57:23 - [DEBUG] State before injecting events: c.metricIndexCount=0 c.errorIndexCount=0 04-29-2022 12:57:23 - [DEBUG] Injecting event using c.apmClient.IntakeV2Events 04-29-2022 12:57:23 - [DEBUG] Leaving checkEventsAPI without error, c.apmClient.IntakeV2Events seems to be successful 04-29-2022 12:57:23 - [DEBUG] Entering checkEventsInElasticsearch(...) 04-29-2022 12:57:23 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events 04-29-2022 12:57:23 - [ERROR] leaving assertCountIndexEqual with error: 404 Not Found:no such index [metrics-apm.app.1234_service_12a3-default] 04-29-2022 12:57:25 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events [ ... last error x 12 ... ] 04-29-2022 12:57:52 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events 04-29-2022 12:57:52 - [ERROR] leaving assertCountIndexEqual with error: 404 Not Found:no such index [metrics-apm.app.1234_service_12a3-default] # Second attempt: same problem, KO with `404` .04-29-2022 12:57:56 - [DEBUG] Entering checkEventsAPI(...) 04-29-2022 12:57:56 - [DEBUG] State before injecting events: c.metricIndexCount=0 c.errorIndexCount=0 04-29-2022 12:57:56 - [DEBUG] Injecting event using c.apmClient.IntakeV2Events 04-29-2022 12:57:56 - [DEBUG] Leaving checkEventsAPI without error, c.apmClient.IntakeV2Events seems to be successful 04-29-2022 12:57:56 - [DEBUG] Entering checkEventsInElasticsearch(...) 04-29-2022 12:57:56 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events 04-29-2022 12:57:56 - [ERROR] leaving assertCountIndexEqual with error: 404 Not Found:no such index [metrics-apm.app.1234_service_12a3-default] 04-29-2022 12:57:58 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events [ last error x 12] 04-29-2022 12:58:25 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events 04-29-2022 12:58:25 - [ERROR] leaving assertCountIndexEqual with error: 404 Not Found:no such index [metrics-apm.app.1234_service_12a3-default] # Third attempt: OK after 1 `404` 04-29-2022 12:58:29 - [DEBUG] Entering checkEventsAPI(...) 04-29-2022 12:58:29 - [DEBUG] State before injecting events: c.metricIndexCount=0 c.errorIndexCount=0 04-29-2022 12:58:29 - [DEBUG] Injecting event using c.apmClient.IntakeV2Events 04-29-2022 12:58:30 - [DEBUG] Leaving checkEventsAPI without error, c.apmClient.IntakeV2Events seems to be successful 04-29-2022 12:58:30 - [DEBUG] Entering checkEventsInElasticsearch(...) 04-29-2022 12:58:30 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events 04-29-2022 12:58:30 - [ERROR] leaving assertCountIndexEqual with error: 404 Not Found:no such index [metrics-apm.app.1234_service_12a3-default] 04-29-2022 12:58:32 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events 04-29-2022 12:58:32 - [DEBUG] 1 documents expected in index metrics-apm.app.1234_service_12a3-default, got 0 instead 04-29-2022 12:58:34 - [DEBUG] entering assertCountIndexEqual for index metrics-apm.app.1234_service_12a3-default, expecting 1 events 04-29-2022 12:58:34 - [DEBUG] leaving assertCountIndexEqual without error 04-29-2022 12:58:34 - [DEBUG] entering assertCountIndexEqual for index logs-apm.error-default, expecting 1 events 04-29-2022 12:58:34 - [DEBUG] leaving assertCountIndexEqual without error

So from what I see, we have 2 options:

A third option would be to say that despite our efforts it is very hard to state that an APM upgrade is successful, we could also use an annotation in the podTemplate to trigger a restart and validate that observedGeneration is updated as expected.

about 5 mins, which is the time for ES to re-read it's file-based users/roles IIRC

I would be really surprised if ES only reloads these files every 5 minutes (that would not match my experience).

about 5 mins, which is the time for ES to re-read it's file-based users/roles IIRC

I would be really surprised if ES only reloads these files every 5 minutes (that would not match my experience).

@barkbay https://www.elastic.co/guide/en/elasticsearch/reference/current/file-realm.html#file-realm
bottom of page, every 5 seconds. Well, I was quite the order of magnitude off...

I'm going to update this with these changes

}, 30*time.Second, 2*time.Second)

as you're suggesting, and run some additional tests.

Put back tracing error captures.

naemono · 2022-05-04T03:28:13Z

run/e2e-tests

naemono · 2022-05-04T13:37:23Z

run/e2e-tests

barkbay

LGTM, I tested TestAPMServerVersionUpgradeToLatest8x twice, a first time to upgrade to 8.1.1, second time to upgrade to 8.2.0, and it worked as expected 👍

…ation (elastic#5470) * apm observed generation changes. Adjust reconciler to always attempt a status update, even on error. Add generation check to e2e APMServer tests. * Ensure apm status is updated when associations are broken. Update tests for new behavior. * Capitalization issues * Adjust e2e tests for apmserver to support mutations and check observedgeneration increment. Debugging e2e tests * Debugging e2e test failures * remove todo * nolint validation function * remove unused erroring fake k8s client * remove fakeK8sClient from apm testing * Fix spelling * wip * Avoid mutating the original apmserver's configuration during e2e tests * Add step to Apm Server CheckStackTestSteps to ensure ES APM Server user has needed permissions before attempting to ingest events. * Add description to CheckIndexCreation step. * Lint issues * Cleanup of some unneeded logging messages and comments. * remove duplicative recorder, dynamicwatches, and parameters * wip * Remove confusing defer bits from apmserver reconciliation * Upcase Kubernetes Remove unneeded context from test * move associations check inside doReconcile * Wrap erorrs, don't log in e2e test. Use NewElasticsearchClientWithUser to get client, don't create new one manually. * Move CheckIndexCreation closer to where it's used. * Use implemented deepcopy instead of copying the internal map in apm builder use returned results, instead of newly named results * remove unneeded log import/var * Move index variable closer to where it's used. Change how variables are initialized in e2e test. * Add without integration check piece to original builder. * Allow counting of documents in index during upgrade to function properly. * just return the assert * Adding apm tracing back to particular error. Removing request from doReconcile, and NewState, as it's unused. Using t.Helper Adding additional comments to the common Driver Interface. Renaming e2e check function. Fix misspelling. * remove unused result from state * just leveraging eventually for the apm server upgrade etest. Optimizing apm server upgrade test. * just return func * remove retry.UntilSuccess in CheckEventsInElasticsearch e2e test * Move retry.UntilSuccess around the assert calls only. * Add type check to satisfy linter. * Add WithoutIntegrationCheck to smoke tests for apm server builder * Latest released version, not latest snapshot * If no ES ref exists, just skip the step where you test for ES data in index. * Add WithoutIntegrationCheck to smoke tests for apm server builder (again) * Move retry logic to include checkEventsAPI * Expicitly return empty result. Put back tracing error captures. * minimize the total time, and retry time. * Adjust retry to just retry the check in ES.

naemono added 3 commits March 11, 2022 14:08

apm observed generation changes.

e8a3ba6

Adjust reconciler to always attempt a status update, even on error. Add generation check to e2e APMServer tests.

Ensure apm status is updated when associations are broken.

9de0275

Update tests for new behavior.

Capitalization issues

28e6b74

botelastic bot added the triage label Mar 14, 2022

naemono added >enhancement Enhancement of existing functionality >feature Adds or discusses adding a feature to the product labels Mar 14, 2022

botelastic bot removed the triage label Mar 14, 2022

naemono added 7 commits March 14, 2022 09:56

Adjust e2e tests for apmserver to support mutations and check observe…

04b538d

…dgeneration increment. Debugging e2e tests

Debugging e2e test failures

f47d4d5

remove todo

272bd83

nolint validation function

fcb6eac

remove unused erroring fake k8s client

1300560

remove fakeK8sClient from apm testing

199c02b

Fix spelling

ca0b2ec

naemono added 5 commits March 15, 2022 14:01

wip

1cf201e

Avoid mutating the original apmserver's configuration during e2e tests

822498c

Add step to Apm Server CheckStackTestSteps to ensure ES APM Server us…

d7285b1

…er has needed permissions before attempting to ingest events.

Add description to CheckIndexCreation step.

3588bb0

Lint issues

6e830f2

Cleanup of some unneeded logging messages and comments.

dcb5635

naemono marked this pull request as ready for review March 22, 2022 16:19

remove duplicative recorder, dynamicwatches, and parameters

e152518

pebrc reviewed Mar 24, 2022

View reviewed changes

naemono added 4 commits March 24, 2022 09:48

wip

e11c85c

Remove confusing defer bits from apmserver reconciliation

4322f60

Upcase Kubernetes

06551bb

Remove unneeded context from test

Merge branch 'main' into 3392-apm-observedGeneration

fb6dba4

naemono added 2 commits April 19, 2022 08:31

remove retry.UntilSuccess in CheckEventsInElasticsearch e2e test

51c62c7

Move retry.UntilSuccess around the assert calls only.

2e8791e

Add type check to satisfy linter.

1a511d5

thbkrkr added the v2.3.0 label Apr 21, 2022

Add WithoutIntegrationCheck to smoke tests for apm server builder

a0a727c

naemono added 2 commits April 25, 2022 10:17

Merge branch 'main' into 3392-apm-observedGeneration

fc56794

Latest released version, not latest snapshot

3a82b71

naemono added 2 commits April 26, 2022 07:56

If no ES ref exists, just skip the step where you test for ES data in…

4fbb15f

… index.

Add WithoutIntegrationCheck to smoke tests for apm server builder (ag…

516df86

…ain)

naemono mentioned this pull request Apr 27, 2022

Base ECK docker image on distroless instead of UBI by default #5580

Merged

2 tasks

barkbay reviewed Apr 28, 2022

View reviewed changes

naemono added 3 commits April 28, 2022 10:15

Move retry logic to include checkEventsAPI

895a2cf

Expicitly return empty result.

a0e56e4

Put back tracing error captures.

minimize the total time, and retry time.

ea77435

Adjust retry to just retry the check in ES.

7cc64ac

barkbay approved these changes May 5, 2022

View reviewed changes

naemono merged commit 216e4bf into elastic:main May 5, 2022

naemono deleted the 3392-apm-observedGeneration branch May 5, 2022 13:35

barkbay removed the >feature Adds or discusses adding a feature to the product label Jun 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Elastic APM Server: Set status.ObservedGeneration from metadata.Generation #5470

Elastic APM Server: Set status.ObservedGeneration from metadata.Generation #5470

naemono commented Mar 14, 2022 •

edited

Loading

naemono commented Mar 14, 2022

naemono commented Mar 15, 2022

naemono commented Mar 22, 2022

naemono commented Apr 20, 2022

naemono commented Apr 20, 2022

naemono commented Apr 24, 2022

naemono commented Apr 25, 2022

naemono commented Apr 26, 2022

naemono commented Apr 27, 2022

naemono commented Apr 27, 2022

barkbay Apr 28, 2022

barkbay Apr 28, 2022

barkbay Apr 28, 2022

naemono Apr 28, 2022

barkbay Apr 29, 2022

barkbay Apr 29, 2022 •

edited

Loading

naemono May 4, 2022

naemono commented May 4, 2022

naemono commented May 4, 2022

barkbay left a comment

Elastic APM Server: Set status.ObservedGeneration from metadata.Generation #5470

Elastic APM Server: Set status.ObservedGeneration from metadata.Generation #5470

Conversation

naemono commented Mar 14, 2022 • edited Loading

naemono commented Mar 14, 2022

naemono commented Mar 15, 2022

naemono commented Mar 22, 2022

naemono commented Apr 20, 2022

naemono commented Apr 20, 2022

naemono commented Apr 24, 2022

naemono commented Apr 25, 2022

naemono commented Apr 26, 2022

naemono commented Apr 27, 2022

naemono commented Apr 27, 2022

barkbay Apr 28, 2022

Choose a reason for hiding this comment

barkbay Apr 28, 2022

Choose a reason for hiding this comment

barkbay Apr 28, 2022

Choose a reason for hiding this comment

naemono Apr 28, 2022

Choose a reason for hiding this comment

barkbay Apr 29, 2022

Choose a reason for hiding this comment

barkbay Apr 29, 2022 • edited Loading

Choose a reason for hiding this comment

naemono May 4, 2022

Choose a reason for hiding this comment

naemono commented May 4, 2022

naemono commented May 4, 2022

barkbay left a comment

Choose a reason for hiding this comment

naemono commented Mar 14, 2022 •

edited

Loading

barkbay Apr 29, 2022 •

edited

Loading