Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop retrying by default after a request timeout #100

Merged
merged 2 commits into from
Jun 3, 2024
Merged

Conversation

JoshMock
Copy link
Member

Disables retrying requests on TimeoutError by default, exposing a new option retryOnTimeout for those who still wish to use the old behavior.

See elastic/elasticsearch-js#2087.

Copy link
Contributor

@ezimuel ezimuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, the only part that is missing is the documentation. I think we should document this change in some way.

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me for what we want for Kibana

@JoshMock
Copy link
Member Author

JoshMock commented Jun 3, 2024

@ezimuel The transport doesn't have its own changelog, other than on the releases page, but I'll make sure to document this functionality when I update the client to use this change.

@JoshMock JoshMock merged commit 5567584 into main Jun 3, 2024
12 checks passed
@JoshMock JoshMock deleted the retry-on-timeout branch June 3, 2024 15:21
@ptica
Copy link

ptica commented Jan 13, 2025

hi, after three days of debugging and two production hotfixes I believe that this change
has caused HA failures on our es cluster (12 nodes)

what we were experiencing are bursts of 500-1000 socket timeouts during scheduled k8 maintenance on individual es nodes where the next es node was not automatically used as before;
instead a client exception was newly raised

the newly indroduced and documented option is propagated to the inner request (i see it as part of options var on line 493, kRetryOnTimeout stayed false no matter how I constructed the client)

in the end, we mitigated the issue with custom transport class as mentioned in #110 (comment) but I think the expectations from the client re high availability are not fulfilled with current defaults of the client.

summary:

  • the retryOnTimeout from Client constructor is not reflected in kRetryOnTimeout, thus next es node is not tried
  • i believe this was a bc change regarding ha behaviour of the js client

thank you for time and effort

@JoshMock
Copy link
Member Author

@ptica Thanks for reporting! Can you open a new issue describing the symptoms you experienced, and include the custom transport class you used that resolved the problem for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants