Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[7.10][Ingest Manager][EPM] Moving reinstall function outside of promise.all to avoid race #82664

Merged

Conversation

jonathan-buttner
Copy link
Contributor

This is to address this issue: #82614

There is a potential for a race condition during the /setup API call if one of the required packages is in the installing state. A required package can get in this state if a failure occurred to install/upgrade that package and we were unable to rollback the package for some reason.

Upon the next call to /setup, if the package can be upgraded, the upgrade code will be initiated in addition to attempting to reinstall the failed package version because both the

ensureInstalledDefaultPackages and the ensurePackagesCompletedInstall calls happen in parallel in the Promise.all.

This PR tries to mitigate that by moving the reinstall functionality outside of the Promise.all so that it is done after the upgrade attempt.

I'm not sure how we would test this with integration tests 🤔 or if it's worth writing unit tests for this. Let me know what you think.

@jonathan-buttner jonathan-buttner added release_note:fix v8.0.0 v7.10.0 Team:Fleet Team label for Observability Data Collection Fleet team v7.11.0 labels Nov 4, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@neptunian
Copy link
Contributor

I was able to consistently reproduce the saved object conflict locally by having only one required package that /setup checks for, force installing an older version of it and throwing an error during the installation so the package became stuck in "installing" status, and then running setup again. It tries to upgrade and reinstall causing the conflict. With this change, I cannot reproduce it following the same steps.

Copy link
Contributor

@jfsiii jfsiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving since a) worst case "should" be longer install time b) @neptunian can repro error & fix.

Can we open a ticket to create a test based on @neptunian instructions? e.g. mock ensure* to throw and/or put saved object in a certain state?

We could move the function back into Promise.all, create the failure case which causes the failure, then move it back to this position and change the expectation.

@jonathan-buttner
Copy link
Contributor Author

Issue for adding tests: #82670

@jonathan-buttner jonathan-buttner merged commit a5ef1d7 into elastic:7.10 Nov 4, 2020
@jonathan-buttner jonathan-buttner deleted the fix-ingest-reinstall-race branch November 4, 2020 23:22
@kevinlog kevinlog changed the title [Ingest Manager][EPM] Moving reinstall function outside of promise.all to avoid race [7.10][Ingest Manager][EPM] Moving reinstall function outside of promise.all to avoid race Nov 4, 2020
@jonathan-buttner jonathan-buttner added backport:skip This commit does not require backporting and removed v7.11.0 v8.0.0 labels Nov 4, 2020
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting blocker release_note:fix Team:Fleet Team label for Observability Data Collection Fleet team v7.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants