[Fleet][EPM] Don't roll back on saved objects conflict errors. #85131
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This refines the implementation of #84190 and implements #84651 . See also #84656 for a bit of discussion.
This changes the behavior of
_installPackage()
so thatConcurrentInstallOperationError
is thrown (instead of returning a list of installed assets which may or may not be complete)install*()
methods called by_installPackage()
, this is also wrapped in aConcurrentInstallOperationError
ConcurrentInstallOperationError
will not trigger a rollback. This fixes the bug that occurs when a second installation/upgrade operation aborts because of a saved object version conflict, and therefore rolls back the installation that a first installation operation just completed successfully, potentially resulting in follow-up errors and a broken installationConcurrentInstallOperationError
will cause the handler to return a409
HTTP response with a message stating on which package the concurrent installation was detected, and that the operation was aborted.This is still a rather optimistic way of handling this situation: when a concurrent installation is detected, the running installation is aborted and no attempts are made to clean up after it. This is possible because the
install*()
methods (installing kibana assets, pipelines, templates etc.) are idempotent. Indeed it is still perfectly possible that two parallel installations run successfully, installing everything twice, or that they only run into the saved object conflict at the very end, after almost everything was installed twice.This may have effects on other users of the install package code flow, namely endpoint security. (cc @jonathan-buttner )
How to test this
Try to get the installed package into a broken state. To do that, try to trigger a race condition by installing the same package several times at once, and observe if the race condition is handled correctly. https://gist.github.com/skh/cc695952031c9e349874b898c7066e42 may be helpful for this -- I had to set
WAIT_TIME_REINSTALL
in that script to0
and run it a few times.Try to break it in any other way.