
[Fleet] Show warning when download upgrade is failing #173844

Merged: 7 commits merged into elastic:main on Jan 2, 2024


@criamico (contributor) commented on Dec 21, 2023

Fixes #173370
Closes #171941

Summary

Show a warning when an upgrade download is failing.

This PR addresses a specific case of upgrade downloading: the upgrade has started but is failing with an error. Since 8.12, the agent upgrade metadata include a retry_error_msg field that can be used to distinguish this case from a regular upgrade.
I'm also fixing a smaller bug that I introduced in #173253: the tooltip shown when the agent is not upgradeable was hiding the badge, so I moved that if statement to the end of the function.
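The badge selection described in the summary can be sketched as follows. This is not the PR's actual code: UpgradeDetails, BadgeKind and getUpgradeBadge are hypothetical names, and the details object is reduced to a state string plus the retry_error_msg and retry_until metadata fields mentioned in this PR. The point is the ordering: the most specific upgrade states are checked first, and the "not upgradeable" check runs last so it can no longer hide an in-progress or failing upgrade badge.

```typescript
// Hypothetical, simplified model of an agent's upgrade details; only the
// fields discussed in this PR are included.
interface UpgradeMetadata {
  retry_error_msg?: string;
  retry_until?: string;
}

interface UpgradeDetails {
  state: string; // e.g. 'UPG_FAILED'
  metadata?: UpgradeMetadata;
}

type BadgeKind = 'upgrade-failed' | 'upgrade-failing' | 'upgrading' | 'not-upgradeable' | 'none';

// Hypothetical helper: picks which badge to show. Checks run from most to
// least specific; the "not upgradeable" check comes last.
function getUpgradeBadge(details: UpgradeDetails | undefined, isUpgradeable: boolean): BadgeKind {
  if (details?.state === 'UPG_FAILED') {
    return 'upgrade-failed'; // retries exhausted
  }
  if (details?.metadata?.retry_error_msg) {
    return 'upgrade-failing'; // still upgrading, but retrying after an error
  }
  if (details) {
    return 'upgrading'; // regular in-progress upgrade
  }
  if (!isUpgradeable) {
    return 'not-upgradeable';
  }
  return 'none';
}
```

For example, an agent whose details carry a retry_error_msg gets 'upgrade-failing' even when isUpgradeable is false (an agent that is mid-upgrade is typically not upgradeable), which is exactly the case where the earlier ordering hid the badge.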

Testing

  • Have an 8.12 agent installed with Multipass (it needs to be upgradeable).
  • Change the binary download URL to a broken one, for example sourceURI: https://artifacts.elastic.co/notdownloads/
  • Force the upgrade from Dev Tools:
    POST kbn:/api/fleet/agents/c3f09103-4e69-4a36-bee2-84223bedef36/upgrade
    {
      "version": "8.12.0",
      "force": true
    }
  • Go to the agent overview: the badge shows "upgrading" but also has a warning icon with a tooltip that shows the retry message and the retry_until time in humanized format (retry until...remaining):
    Screenshot 2024-01-02 at 16 54 20

This way, the message shows the values as they appear in the agent metadata.

  • The same warning appears in the agent list table:
    Screenshot 2024-01-02 at 16 54 28

  • After a while, once the retries are exhausted, the agent shows the regular "upgrade failed" badge (this was already implemented):
    Screenshot 2023-12-21 at 12 07 27
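For scripted testing outside Dev Tools, the force-upgrade request from the testing steps could be built as shown here. This is a sketch under assumptions: buildForceUpgradeRequest and FleetRequest are hypothetical names, the Kibana base URL depends on your deployment, and authentication (for example an Authorization header) is left to the caller. In Dev Tools the kbn: prefix routes the request to Kibana rather than Elasticsearch, so here the same path is appended to the Kibana URL.

```typescript
// Hypothetical request description that can be passed straight to fetch().
interface FleetRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper mirroring the Dev Tools call:
// POST kbn:/api/fleet/agents/<agentId>/upgrade { "version": ..., "force": true }
function buildForceUpgradeRequest(kibanaBaseUrl: string, agentId: string, version: string): FleetRequest {
  // Strip trailing slashes so the path is not doubled.
  const base = kibanaBaseUrl.replace(/\/+$/, '');
  return {
    url: `${base}/api/fleet/agents/${encodeURIComponent(agentId)}/upgrade`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Kibana's HTTP API expects this header on write requests from non-browser clients.
        'kbn-xsrf': 'true',
      },
      body: JSON.stringify({ version, force: true }),
    },
  };
}
```

A caller would then run something like fetch(req.url, req.init) after adding its own credentials to req.init.headers.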


@criamico added the Team:Fleet label (team label for the Observability Data Collection Fleet team) on Dec 21, 2023
@criamico self-assigned this on Dec 21, 2023

@criamico criamico added release_note:skip Skip the PR/issue when compiling release notes release_note:enhancement v8.13.0 and removed release_note:skip Skip the PR/issue when compiling release notes labels Dec 21, 2023

@criamico marked this pull request as ready for review on December 21, 2023, 15:55
@criamico requested a review from a team as a code owner on December 21, 2023, 15:55

@criamico (author) commented:

The PR is almost ready; the only thing left to address is this comment: after all the retries fail, the agent goes to the "UPG_FAILED" state, but it doesn't seem to reset even after a long time. Coupled with a "healthy" status this looks odd, and I think we should reset it after some time.

I don't know whether we want to address it in this PR or merge this one and handle it separately.

Review comment on the following lines of the diff:

    defaultMessage="Upgrade failing: {retryMsg}. {retryUntil}"
    values={{
      retryMsg: agentUpgradeDetails?.metadata?.retry_error_msg,
      retryUntil: formatRetryUntil(agentUpgradeDetails?.metadata?.retry_until),

A member commented:

Since retry_until is a deadline in UTC, it might be easier for the user to read if we also calculated and presented the time remaining as retry_until - now(). This would let us present something like "Retrying until: 2024-01-30T15:03:38.159Z (1h53m remaining)" and save the user from having to calculate this in their own head.

@criamico (author) replied:

I added the remaining time in humanized format, as discussed:
Screenshot 2024-01-02 at 16 54 28

I'm not displaying it at all when the retry time is in the past and the difference is negative. This can happen when the upgrade has failed and the value no longer changes.
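The behaviour described in this exchange (show the UTC deadline plus the time remaining, and show nothing once the deadline has passed) can be sketched as below. This is not the PR's implementation of formatRetryUntil: the injectable now parameter and the humanizeRemaining helper are hypothetical additions for testability, and the "1h53m" output format is taken from the reviewer's example. Remaining minutes are rounded down, so less than a minute left prints "0m remaining".

```typescript
// Hypothetical helper: formats a positive duration in milliseconds as e.g.
// "1h53m" (minutes are rounded down).
function humanizeRemaining(ms: number): string {
  const totalMinutes = Math.floor(ms / 60000);
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return hours > 0 ? `${hours}h${minutes}m` : `${minutes}m`;
}

// Sketch of a formatRetryUntil-style helper; `now` is injectable for testing.
// Returns an empty string when there is no deadline, when it cannot be parsed,
// or when it is already in the past (e.g. after the upgrade has failed).
function formatRetryUntil(retryUntil: string | undefined, now: Date = new Date()): string {
  if (!retryUntil) return '';
  const deadlineMs = Date.parse(retryUntil);
  if (Number.isNaN(deadlineMs)) return '';
  const remainingMs = deadlineMs - now.getTime();
  if (remainingMs <= 0) return '';
  const eta = new Date(deadlineMs).toISOString();
  return `Retrying until: ${eta} (${humanizeRemaining(remainingMs)} remaining)`;
}
```

A library helper such as moment's duration(...).humanize() could replace the hand-rolled formatter; it produces coarser, localized phrasing rather than the compact "1h53m" form.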


@juliaElastic left a comment:

LGTM

@kibana-ci (collaborator) commented:

💚 Build Succeeded

Metrics

Async chunks: total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
|---|---|---|---|
| fleet | 1.2MB | 1.2MB | +1.0KB |

Page load bundle: size of the bundles that are downloaded on every page load (target size is below 100kb)

| id | before | after | diff |
|---|---|---|---|
| fleet | 157.6KB | 157.7KB | +109.0B |

@kpollich (member) commented on Jan 2, 2024:

Hey @criamico, it looks like the reported errors often delimit individual errors with *, which I assume is meant to render as bullets in the UI. Would it be feasible to run this through react-markdown and put the resulting HTML into the tooltip's content prop? That way we'd get an actual <ul> node instead of the * characters. I don't know how easy that would be, so I'm curious whether it would work, or whether the agent team even recommends it based on what else can appear in these error strings.

One concern is that there's no newline after the * blocks, so I don't know whether this will even work.
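One way around the missing-newline concern, sketched here as a hypothetical alternative rather than anything adopted in this PR, is to skip markdown parsing and split the message on * directly, so each error can be rendered as its own <li> inside a <ul> in the tooltip. splitStarDelimitedErrors and SplitErrors are hypothetical names, and the sketch assumes * only ever appears as a delimiter; as noted in the comment, that depends on what else can appear in these error strings.

```typescript
// Hypothetical result shape: text before the first "*" plus one entry per error.
interface SplitErrors {
  prefix: string; // e.g. a summary such as "2 errors occurred:"
  items: string[]; // one entry per "*"-delimited error
}

// Hypothetical helper: splits a "*"-delimited error message without relying on
// newlines, trimming whitespace and dropping empty fragments.
function splitStarDelimitedErrors(message: string): SplitErrors {
  const parts = message.split('*');
  return {
    prefix: (parts[0] ?? '').trim(),
    items: parts
      .slice(1)
      .map((part) => part.trim())
      .filter((part) => part.length > 0),
  };
}
```

A message without any * comes back as a prefix with an empty items array, so the tooltip can fall back to plain text in that case.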

@criamico (author) commented on Jan 2, 2024:

> Would it be feasible to run this through react-markdown and put the resulting HTML into the tooltip's content prop?

@kpollich I'm not sure; I can take a quick look tomorrow, but if it's too involved I'd rather file an enhancement ticket for it.

@criamico merged commit 1de5cde into elastic:main on Jan 2, 2024 (20 checks passed)
@kibanamachine added the backport:skip label (this commit does not require backporting) on Jan 2, 2024