[Fleet] Fleet server migrations SavedObjectsClient#delete race-condition #89252

Closed
kobelb opened this issue Jan 26, 2021 · 6 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Fleet Fleet team's agent central management project Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@kobelb
Contributor

kobelb commented Jan 26, 2021

The Fleet server migrations call SavedObjectsClient#delete after creating the .fleet-enrollment-api-keys document:

await soClient.delete(ENROLLMENT_API_KEYS_SAVED_OBJECT_TYPE, key.id);

When multiple Kibana instances run the migrations in parallel, there is a race condition: the second call to SavedObjectsClient#delete causes an error to be thrown. At the moment, this error crashes Kibana until #89251 is resolved.
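One way to make the delete tolerant of this race is to treat "already deleted" as success. The sketch below is only an illustration under stated assumptions: `MinimalSavedObjectsClient`, `isNotFound` and `deleteIfExists` are hypothetical names, and the 404 check assumes the error carries a status code under `output.statusCode` or `statusCode`. In Kibana itself, `SavedObjectsErrorHelpers.isNotFoundError` would likely be the appropriate check instead.

```typescript
// Hypothetical, minimal stand-in for the saved objects client.
// It is NOT the real Kibana interface, only the one method used here.
interface MinimalSavedObjectsClient {
  delete(type: string, id: string): Promise<void>;
}

// Assumption: a "not found" failure exposes an HTTP-style 404 status code
// under `output.statusCode` (Boom-style) or directly under `statusCode`.
function isNotFound(err: unknown): boolean {
  const e = err as { statusCode?: number; output?: { statusCode?: number } } | null;
  return e?.output?.statusCode === 404 || e?.statusCode === 404;
}

// Delete a saved object, treating "already deleted" as success so that
// migrations running on several instances do not fail each other.
async function deleteIfExists(
  soClient: MinimalSavedObjectsClient,
  type: string,
  id: string
): Promise<boolean> {
  try {
    await soClient.delete(type, id);
    return true; // this instance performed the delete
  } catch (err) {
    if (isNotFound(err)) {
      return false; // another instance deleted it first
    }
    throw err; // any other error is a real failure
  }
}
```

With a helper like this, the migration line above could become `await deleteIfExists(soClient, ENROLLMENT_API_KEYS_SAVED_OBJECT_TYPE, key.id);`. Only the 404 case is swallowed, so unrelated failures still surface.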

@kobelb kobelb added bug Fixes for quality problems that affect the customer experience Team:Fleet Team label for Observability Data Collection Fleet team labels Jan 26, 2021
@elasticmachine
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@kobelb kobelb added the Feature:Fleet Fleet team's agent central management project label Jan 26, 2021
@elasticmachine
Contributor

Pinging @elastic/fleet (Feature:Fleet)

@ruflin
Contributor

ruflin commented Jan 26, 2021

Is there a way to write tests for this to catch these things automatically?

@kobelb
Contributor Author

kobelb commented Jan 26, 2021

> Is there a way to write tests for this to catch these things automatically?

The only option we have right now is to use unit tests that mock the error responses. #79743 would potentially have caught these issues in CI, and #79748 would help developers catch them.
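A unit test along those lines could simulate several instances against a mocked client. The following is a rough sketch, not Kibana's test tooling: `FakeSoClient` and `runOnInstances` are hypothetical names, and the fake is assumed to throw an error carrying `statusCode: 404` when the document is already gone.

```typescript
// Hypothetical in-memory stand-in for a saved objects client.
class FakeSoClient {
  private docs: Set<string>;

  constructor(keys: string[]) {
    this.docs = new Set(keys);
  }

  async delete(type: string, id: string): Promise<void> {
    // Yield first so that parallel callers interleave, like real network calls.
    await Promise.resolve();
    if (!this.docs.delete(`${type}:${id}`)) {
      // Assumption: a missing document is reported as an error with statusCode 404.
      throw Object.assign(new Error(`Saved object [${type}/${id}] not found`), {
        statusCode: 404,
      });
    }
  }
}

// Run the same migration step once per simulated Kibana instance, in parallel,
// and collect every error instead of stopping at the first one.
async function runOnInstances(
  count: number,
  step: () => Promise<void>
): Promise<unknown[]> {
  const errors: unknown[] = [];
  const runs: Promise<void>[] = [];
  for (let i = 0; i < count; i++) {
    runs.push(
      step().catch((err: unknown) => {
        errors.push(err);
      })
    );
  }
  await Promise.all(runs);
  return errors;
}
```

A test would then call something like `runOnInstances(2, () => client.delete("some-type", "key-1"))` and assert on the returned errors: an unguarded delete yields one 404 error, while a migration step that tolerates the 404 should yield none.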

@nchaulet
Member

Good catch. There is another race condition here that I missed: it happens when we get the enrollment key to decrypt it. We should probably just ignore the 404 here.
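Ignoring the 404 on the read path could be sketched with a small wrapper. This is an assumption-laden illustration, not the change that was merged: `ignoreNotFound` is a hypothetical helper, and it assumes not-found errors carry a 404 under `statusCode` or `output.statusCode`.

```typescript
// Hypothetical helper: run an operation and map a 404 failure to `undefined`.
// Assumption: not-found errors carry a 404 in `statusCode` or `output.statusCode`.
async function ignoreNotFound<T>(op: () => Promise<T>): Promise<T | undefined> {
  try {
    return await op();
  } catch (err) {
    const e = err as { statusCode?: number; output?: { statusCode?: number } } | null;
    if (e?.statusCode === 404 || e?.output?.statusCode === 404) {
      return undefined; // the document was already migrated and deleted elsewhere
    }
    throw err; // any other error is still a real failure
  }
}
```

The caller would wrap the get that fetches the enrollment key and skip that key when the result is `undefined`, since another instance has already handled it.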

@nchaulet
Member

Resolved by #89372

4 participants