Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Johnny/dp timeout #1720

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Johnny/dp timeout #1720

wants to merge 2 commits into from

Conversation

psFried
Copy link
Member

@psFried psFried commented Oct 18, 2024

Wraps connector and gazette client dials in timeouts, since the connect_timeout grpc options are insufficient to cover all cases that might block indefinitely. This is intended to fix issues with agents that stop making progress.


This change is Reviewable

@psFried psFried requested a review from jshearer October 18, 2024 17:48
Copy link
Contributor

@jshearer jshearer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Interesting, this smells very similar to the issues I was seeing with Dekaf where after some period of time gRPC requests would just hang, and my temporary work-around was to just dial a whole new TCP connection for each session to get around it.

let result = tokio::time::timeout(Duration::from_secs(10), async {
match ep.uri().scheme_str() {
Some("unix") => ep
.connect_with_connector(tower::util::service_fn(|uri: tonic::transport::Uri| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, I didn't even know this was supposed to be timing out at all -- I added a timeout in Dekaf for this as well.

Applies an application-level timeout to activations, to prevent
indefinite blocking in the agent.
Copy link
Member

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@psFried psFried added the change:emergency This change is an emergency label Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:emergency This change is an emergency
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants