Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set a default write timeout #1314

Merged
merged 1 commit into from
Oct 27, 2023
Merged

Conversation

alex-hunt-materialize
Copy link
Contributor

Sets a default write timeout of 295 seconds to match the default read timeout.

Motivation

We recently had an application hang during some kube operations with no obvious cause other than a lack of timeouts. When investigating, we realize that there were timeouts for connect and read by default, but write seems to have been omitted. We do not have strong evidence that this lack of timeout was the cause of our hang, but at this moment it seems like our strongest lead.

While we can set the timeout ourselves after creating the Config, it would be better to prevent this foot gun for others by setting it by default when creating the Config.

Solution

Choosing 295 seconds is a bit arbitrary, and should arguably be much smaller, but having a timeout at all seems obviously better than just hanging forever.

The possibility of setting the timeout was introduced in #971, but that PR did not include the write timeout simply because the Go client didn't have one. To me, that another client neglected a timeout on a network operation does not seem like a sufficient reason to do the same.

Sets a default write timeout of 295 seconds to match the default read
timeout.

Signed-off-by: Alex Hunt <alex.hunt@materialize.com>
@codecov
Copy link

codecov bot commented Oct 19, 2023

Codecov Report

Merging #1314 (9d49cd1) into main (c3fbe25) will decrease coverage by 0.0%.
The diff coverage is 33.4%.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1314     +/-   ##
=======================================
- Coverage   72.4%   72.3%   -0.0%     
=======================================
  Files         75      75             
  Lines       6343    6344      +1     
=======================================
- Hits        4587    4586      -1     
- Misses      1756    1758      +2     
Files Coverage Δ
kube-client/src/config/mod.rs 45.4% <33.4%> (-0.5%) ⬇️

... and 1 file with indirect coverage changes

Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey. Thanks for this. I think setting a default here makes sense.

Initially, thought that 295s was a weird number, and was going to suggest somewhere in the range 60-100s originally, but a quick search over github/kubernetes reveals general ballpark numbers in the same order. Probably makes sense to start high as here.

@clux clux added the changelog-fix changelog fix category for prs label Oct 19, 2023
@clux clux added this to the 0.87.0 milestone Oct 19, 2023
@clux
Copy link
Member

clux commented Oct 19, 2023

recently had an application hang during some kube operations with no obvious cause

perhaps of interest, someone did report one weird lockup on the discord - have not been able to get to the bottom of it yet

@alex-hunt-materialize
Copy link
Contributor Author

Thanks for the quick review and the link. I'll have to take a look again later, the discord isn't loading for me.

@clux
Copy link
Member

clux commented Oct 21, 2023

There is an issue the lockup in #1316. If it is similar to what you are encountering then it would be good to know.

@alex-hunt-materialize
Copy link
Contributor Author

Thank you for the link. It is hard to tell if what we saw is that same issue. It certainly could be.

We do not restart our controllers within a single process, but do recreate our controller pods when their code changes (up to several times a day).

We do run several controllers in a single process. On startup there can be hundreds of reconciliations.

@clux
Copy link
Member

clux commented Oct 27, 2023

Thanks again. Merging this.

There might be a better timeout number long term, but don't have a good justification for a smaller number (feel free to give one in the future).

@clux clux merged commit 4083979 into kube-rs:main Oct 27, 2023
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog-fix changelog fix category for prs
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants