[Fleet][EPM] Don't roll back on saved objects conflict errors. #85131

Merged

skh
Contributor

@skh skh commented Dec 7, 2020

Summary

This refines the implementation of #84190 and implements #84651. See also #84656 for some discussion.

This changes the behavior of _installPackage() so that

  • when a concurrent installation is detected, a ConcurrentInstallOperationError is thrown (instead of returning a list of installed assets which may or may not be complete)
  • when a saved object write in any of the install*() methods called by _installPackage() fails with a version conflict error, that error is also wrapped in a ConcurrentInstallOperationError
  • all other errors are thrown as before
  • higher up in the call chain, a ConcurrentInstallOperationError no longer triggers a rollback. This fixes a bug where a second installation or upgrade operation aborted on a saved object version conflict and then rolled back the installation that a first operation had just completed successfully, potentially causing follow-up errors and leaving a broken installation
  • a ConcurrentInstallOperationError makes the handler return a 409 HTTP response whose message names the package on which the concurrent installation was detected and states that the operation was aborted.
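The error-handling flow in the list above can be sketched in TypeScript roughly as follows. This is a minimal sketch, not the code from this PR: it assumes a saved object version conflict surfaces as an error carrying `statusCode === 409`, and the helpers `isVersionConflict`, `runInstallStep`, `installWithRollback` and `toHttpResponse`, as well as the exact message text, are hypothetical. Only the name `ConcurrentInstallOperationError` comes from the description.

```typescript
// Error type named in the PR description; this body is only a sketch.
class ConcurrentInstallOperationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ConcurrentInstallOperationError';
    // Keep `instanceof` working when compiling to ES5.
    Object.setPrototypeOf(this, ConcurrentInstallOperationError.prototype);
  }
}

// Assumption: a saved object version conflict is an error carrying HTTP status 409.
function isVersionConflict(err: unknown): boolean {
  return err instanceof Error && (err as Error & { statusCode?: number }).statusCode === 409;
}

// Hypothetical wrapper around one install*() step: version conflicts are
// re-thrown as ConcurrentInstallOperationError, all other errors unchanged.
async function runInstallStep<T>(pkgName: string, step: () => Promise<T>): Promise<T> {
  try {
    return await step();
  } catch (err) {
    if (isVersionConflict(err)) {
      throw new ConcurrentInstallOperationError(
        `Concurrent installation or upgrade of ${pkgName} detected, aborting.`
      );
    }
    throw err;
  }
}

// Hypothetical caller higher up the chain: roll back on any failure except a
// concurrent-install abort, which must not undo another operation's work.
async function installWithRollback(
  install: () => Promise<void>,
  rollback: () => Promise<void>
): Promise<void> {
  try {
    await install();
  } catch (err) {
    if (!(err instanceof ConcurrentInstallOperationError)) {
      await rollback();
    }
    throw err;
  }
}

// Hypothetical route-handler mapping: a concurrent install becomes a 409.
function toHttpResponse(err: unknown): { statusCode: number; message: string } {
  if (err instanceof ConcurrentInstallOperationError) {
    return { statusCode: 409, message: err.message };
  }
  return { statusCode: 500, message: err instanceof Error ? err.message : String(err) };
}
```

The key choice sits in `installWithRollback`: an error signalling that another operation is mid-install is re-thrown without calling `rollback`, so the losing request cannot undo the winner's completed installation.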

This is still a rather optimistic way of handling the situation: when a concurrent installation is detected, the running installation is aborted and no attempt is made to clean up after it. This is acceptable because the install*() methods (installing Kibana assets, pipelines, templates, etc.) are idempotent. It is still entirely possible that two parallel installations both succeed, installing everything twice, or that they only run into the saved object conflict at the very end, after almost everything has been installed twice.
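The idempotency argument can be illustrated with a small in-memory model. This is a hedged sketch: `AssetStore` and `installAssets` are hypothetical stand-ins for the real install*() methods, under the assumption that each asset is written under an id derived only from the package name, version and asset name, so repeating a write overwrites rather than duplicates.

```typescript
// Hypothetical stand-in for Kibana saved objects, pipelines, templates, etc.
class AssetStore {
  private readonly assets = new Map<string, string>();

  // Upsert under a deterministic id: writing the same asset twice keeps one copy.
  put(id: string, body: string): void {
    this.assets.set(id, body);
  }

  // Sorted view of the store, so states can be compared regardless of write order.
  snapshot(): string {
    return JSON.stringify(Array.from(this.assets.entries()).sort(([a], [b]) => a.localeCompare(b)));
  }
}

// Hypothetical idempotent install step: ids depend only on package, version and asset name.
async function installAssets(store: AssetStore, pkg: string, version: string, names: string[]): Promise<void> {
  for (const name of names) {
    await Promise.resolve(); // yield, so that concurrent calls interleave their writes
    store.put(`${pkg}-${version}-${name}`, `${name}@${version}`);
  }
}

async function demo(): Promise<void> {
  const names = ['dashboard', 'pipeline', 'template'];
  const once = new AssetStore();
  const twice = new AssetStore();
  const resumed = new AssetStore();

  await installAssets(once, 'nginx', '0.2.0', names);
  // Two parallel installations of the same package.
  await Promise.all([
    installAssets(twice, 'nginx', '0.2.0', names),
    installAssets(twice, 'nginx', '0.2.0', names),
  ]);
  // An installation aborted after the first asset, never cleaned up, then a full run.
  await installAssets(resumed, 'nginx', '0.2.0', names.slice(0, 1));
  await installAssets(resumed, 'nginx', '0.2.0', names);

  console.log(once.snapshot() === twice.snapshot(), once.snapshot() === resumed.snapshot()); // prints "true true"
}
demo();
```

Because the only writes are overwrites keyed by deterministic ids, an aborted run leaves nothing that a later successful run would not overwrite identically, which is what lets the abort skip cleanup. The model deliberately leaves out the saved object version check that detects the conflict in the first place.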

This may affect other users of the package installation code path, namely Endpoint Security (cc @jonathan-buttner).

How to test this

  • Try to get the installed package into a broken state: trigger a race condition by installing the same package several times at once, and check whether the race condition is handled correctly. https://gist.github.com/skh/cc695952031c9e349874b898c7066e42 may help with this; I had to set WAIT_TIME_REINSTALL in that script to 0 and run it a few times.

  • Try to break it in any other way.
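For a scripted variant of the first test, a TypeScript sketch along these lines could fire several installs at once and tally the responses. It is not the linked gist: `raceInstalls` and `unexpectedStatuses` are hypothetical, `send` is assumed to perform one install request against the package install endpoint and resolve to its HTTP status code, and success is assumed to be reported as 200.

```typescript
// Tally of HTTP status codes returned by concurrent install attempts.
type Tally = Map<number, number>;

// Fire `attempts` install requests at once. `send` is a hypothetical callback that
// performs one install request and resolves to its HTTP status code.
async function raceInstalls(attempts: number, send: (i: number) => Promise<number>): Promise<Tally> {
  const statuses = await Promise.all(
    // -1 marks a request that failed before returning any HTTP status.
    Array.from({ length: attempts }, (_, i) => send(i).catch(() => -1))
  );
  const tally: Tally = new Map();
  for (const status of statuses) {
    tally.set(status, (tally.get(status) ?? 0) + 1);
  }
  return tally;
}

// With this change, each attempt is expected to either succeed (assumed 200) or be
// rejected as a concurrent installation (409); any other status is worth investigating.
function unexpectedStatuses(tally: Tally): number[] {
  return Array.from(tally.keys())
    .filter((status) => status !== 200 && status !== 409)
    .sort((a, b) => a - b);
}

// Example with a fake sender standing in for real HTTP calls:
raceInstalls(5, async (i) => (i === 0 ? 200 : 409)).then((tally) => {
  console.log(unexpectedStatuses(tally)); // prints "[]"
});
```

A clean tally only shows that no request failed unexpectedly; whether the package ended up in a broken state still has to be checked separately afterwards, for example by inspecting its installed assets.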

@skh skh self-assigned this Dec 7, 2020
@skh skh added Feature:EPM Fleet team's Elastic Package Manager (aka Integrations) project release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v7.11.0 v8.0.0 labels Dec 7, 2020
@skh skh marked this pull request as ready for review December 7, 2020 13:20
@skh skh requested a review from a team December 7, 2020 13:20
@elasticmachine
Contributor

Pinging @elastic/ingest-management (Feature:EPM)

@skh skh changed the title Don't rollback on saved objects conflict errors. [Fleet][EPM] Don't rollback on saved objects conflict errors. Dec 7, 2020
@skh skh changed the title [Fleet][EPM] Don't rollback on saved objects conflict errors. [Fleet][EPM] Don't roll back on saved objects conflict errors. Dec 7, 2020
@skh
Contributor Author

skh commented Dec 7, 2020

@elasticmachine merge upstream

@neptunian neptunian self-requested a review December 7, 2020 18:58
@skh skh force-pushed the 84651-check-for-saved-object-version-conflict branch from 846781c to 69d125c Compare December 14, 2020 14:00
@skh skh merged commit 1b3a1bb into elastic:master Dec 14, 2020
@skh skh deleted the 84651-check-for-saved-object-version-conflict branch December 14, 2020 21:29
@kibanamachine
Contributor

kibanamachine commented Jan 18, 2021

⏳ Build in-progress, with failures

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream
