Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Remove duplicative retries from client-side requests to APIs that depend on EPR #190722

Conversation

kpollich
Copy link
Member

@kpollich kpollich commented Aug 19, 2024

Summary

Relates to #136617

For APIs that depend on Fleet connecting to Elastic Package Registry, Fleet already retries the connections to EPR on the server side. This results in a situation where, when EPR is unreachable, the requests is retried several times on the server side, and then the request is retried again on the client-side by react-query. This results in very long running API requests.

Since the server-side retries generally cover any kind of flakiness here, disabling the retry logic on the front-end seems sensible. I've also reduced the number of retries on the server side from 5 to 3 to help fail faster here. I walked back the retry change after some test failures, and I don't think it makes a big enough impact to justify changing.

To test

Set xpack.fleet.registryUrl: 127.0.0.1:8080 with nothing running

Before

The requests spin for a very long time.

Screen.Recording.2024-08-19.at.1.02.16.PM.mov

After

The requests stop spinning after a few seconds as the retries won't loop.

Screen.Recording.2024-08-19.at.12.54.38.PM.mov

cc @shahzad31

@kpollich kpollich added release_note:fix Team:Fleet Team label for Observability Data Collection Fleet team backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) labels Aug 19, 2024
@kpollich kpollich self-assigned this Aug 19, 2024
@kpollich kpollich requested a review from a team as a code owner August 19, 2024 17:08
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@obltmachine
Copy link

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@kpollich
Copy link
Member Author

I moved to using EuiSkeletonRectangle for the grid's loading state. The whole process end-to-end to go from loading/retrying -> displaying the "no package registry available" message takes about 30s which I think is a reasonable amount of time accounting for long timeouts, network conditions, etc.

Screen.Recording.2024-08-20.at.8.34.13.AM.mov

Copy link
Member

@nchaulet nchaulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM 🚀

@kpollich
Copy link
Member Author

@elasticmachine merge upstream

@kpollich kpollich enabled auto-merge (squash) August 20, 2024 13:24
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
fleet 1.8MB 1.8MB +317.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
fleet 169.6KB 169.8KB +171.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @kpollich

@kpollich kpollich merged commit cf3149e into elastic:main Aug 20, 2024
22 checks passed
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Aug 20, 2024
…that depend on EPR (elastic#190722)

## Summary

Relates to elastic#136617

For APIs that depend on Fleet connecting to Elastic Package Registry,
Fleet already retries the connections to EPR on the server side. This
results in a situation where, when EPR is unreachable, the requests is
retried several times on the server side, and then the request is
retried again on the client-side by react-query. This results in very
long running API requests.

Since the server-side retries generally cover any kind of flakiness
here, disabling the retry logic on the front-end seems sensible. ~I've
also reduced the number of retries on the server side from 5 to 3 to
help fail faster here.~ I walked back the retry change after some test
failures, and I don't think it makes a big enough impact to justify
changing.

## To test

Set `xpack.fleet.registryUrl: 127.0.0.1:8080` with nothing running

## Before

The requests spin for a very long time.

https://github.com/user-attachments/assets/e4fd77ee-b36c-4965-9f71-e5b3e195f75e

## After

The requests stop spinning after a few seconds as the retries won't
loop.

https://github.com/user-attachments/assets/82adc595-1bc4-4269-8501-2eb83525ad15

cc @shahzad31

(cherry picked from commit cf3149e)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.15

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Aug 20, 2024
…o APIs that depend on EPR (#190722) (#190816)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Fleet] Remove duplicative retries from client-side requests to APIs
that depend on EPR
(#190722)](#190722)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Kyle
Pollich","email":"kyle.pollich@elastic.co"},"sourceCommit":{"committedDate":"2024-08-20T14:50:00Z","message":"[Fleet]
Remove duplicative retries from client-side requests to APIs that depend
on EPR (#190722)\n\n## Summary\r\n\r\nRelates to
https://github.com/elastic/kibana/issues/136617\r\n\r\nFor APIs that
depend on Fleet connecting to Elastic Package Registry,\r\nFleet already
retries the connections to EPR on the server side. This\r\nresults in a
situation where, when EPR is unreachable, the requests is\r\nretried
several times on the server side, and then the request is\r\nretried
again on the client-side by react-query. This results in very\r\nlong
running API requests.\r\n\r\nSince the server-side retries generally
cover any kind of flakiness\r\nhere, disabling the retry logic on the
front-end seems sensible. ~I've\r\nalso reduced the number of retries on
the server side from 5 to 3 to\r\nhelp fail faster here.~ I walked back
the retry change after some test\r\nfailures, and I don't think it makes
a big enough impact to justify\r\nchanging.\r\n\r\n## To test\r\n\r\nSet
`xpack.fleet.registryUrl: 127.0.0.1:8080` with nothing running\r\n\r\n##
Before\r\n\r\nThe requests spin for a very long
time.\r\n\r\n\r\nhttps://github.com/user-attachments/assets/e4fd77ee-b36c-4965-9f71-e5b3e195f75e\r\n\r\n##
After\r\n\r\nThe requests stop spinning after a few seconds as the
retries
won't\r\nloop.\r\n\r\n\r\nhttps://github.com/user-attachments/assets/82adc595-1bc4-4269-8501-2eb83525ad15\r\n\r\ncc
@shahzad31","sha":"cf3149e983c5aec547e08cfa9202b68cd7115899","branchLabelMapping":{"^v8.16.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:fix","Team:Fleet","backport:prev-minor","v8.16.0"],"title":"[Fleet]
Remove duplicative retries from client-side requests to APIs that depend
on
EPR","number":190722,"url":"https://github.com/elastic/kibana/pull/190722","mergeCommit":{"message":"[Fleet]
Remove duplicative retries from client-side requests to APIs that depend
on EPR (#190722)\n\n## Summary\r\n\r\nRelates to
https://github.com/elastic/kibana/issues/136617\r\n\r\nFor APIs that
depend on Fleet connecting to Elastic Package Registry,\r\nFleet already
retries the connections to EPR on the server side. This\r\nresults in a
situation where, when EPR is unreachable, the requests is\r\nretried
several times on the server side, and then the request is\r\nretried
again on the client-side by react-query. This results in very\r\nlong
running API requests.\r\n\r\nSince the server-side retries generally
cover any kind of flakiness\r\nhere, disabling the retry logic on the
front-end seems sensible. ~I've\r\nalso reduced the number of retries on
the server side from 5 to 3 to\r\nhelp fail faster here.~ I walked back
the retry change after some test\r\nfailures, and I don't think it makes
a big enough impact to justify\r\nchanging.\r\n\r\n## To test\r\n\r\nSet
`xpack.fleet.registryUrl: 127.0.0.1:8080` with nothing running\r\n\r\n##
Before\r\n\r\nThe requests spin for a very long
time.\r\n\r\n\r\nhttps://github.com/user-attachments/assets/e4fd77ee-b36c-4965-9f71-e5b3e195f75e\r\n\r\n##
After\r\n\r\nThe requests stop spinning after a few seconds as the
retries
won't\r\nloop.\r\n\r\n\r\nhttps://github.com/user-attachments/assets/82adc595-1bc4-4269-8501-2eb83525ad15\r\n\r\ncc
@shahzad31","sha":"cf3149e983c5aec547e08cfa9202b68cd7115899"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.16.0","branchLabelMappingKey":"^v8.16.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/190722","number":190722,"mergeCommit":{"message":"[Fleet]
Remove duplicative retries from client-side requests to APIs that depend
on EPR (#190722)\n\n## Summary\r\n\r\nRelates to
https://github.com/elastic/kibana/issues/136617\r\n\r\nFor APIs that
depend on Fleet connecting to Elastic Package Registry,\r\nFleet already
retries the connections to EPR on the server side. This\r\nresults in a
situation where, when EPR is unreachable, the requests is\r\nretried
several times on the server side, and then the request is\r\nretried
again on the client-side by react-query. This results in very\r\nlong
running API requests.\r\n\r\nSince the server-side retries generally
cover any kind of flakiness\r\nhere, disabling the retry logic on the
front-end seems sensible. ~I've\r\nalso reduced the number of retries on
the server side from 5 to 3 to\r\nhelp fail faster here.~ I walked back
the retry change after some test\r\nfailures, and I don't think it makes
a big enough impact to justify\r\nchanging.\r\n\r\n## To test\r\n\r\nSet
`xpack.fleet.registryUrl: 127.0.0.1:8080` with nothing running\r\n\r\n##
Before\r\n\r\nThe requests spin for a very long
time.\r\n\r\n\r\nhttps://github.com/user-attachments/assets/e4fd77ee-b36c-4965-9f71-e5b3e195f75e\r\n\r\n##
After\r\n\r\nThe requests stop spinning after a few seconds as the
retries
won't\r\nloop.\r\n\r\n\r\nhttps://github.com/user-attachments/assets/82adc595-1bc4-4269-8501-2eb83525ad15\r\n\r\ncc
@shahzad31","sha":"cf3149e983c5aec547e08cfa9202b68cd7115899"}}]}]
BACKPORT-->

Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) release_note:fix Team:Fleet Team label for Observability Data Collection Fleet team v8.15.1 v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants