TcpTransport `writeTimeout` appears to be ignored #48

benfleis · 2015-01-16T16:38:17Z

Both from my own testing, and from looking through the code, it appears the the writeTimeout AtomicBoolean is ignored. It looks like it should be ferried through to bootstrap, and then possibly have some exception handling added to make it do the right thing. (I dunno, I've never used netty myself.) In any case, I see in deployment cases where it takes multiple minutes for the socket to recognize that it's dead (seeing TCP PUSH over and over); if the default of 5s was effective, this slow stream failure detection would be avoided by the high level timeout.

The text was updated successfully, but these errors were encountered:

aphyr · 2015-01-16T17:56:48Z

Soooo, this is kind of an interesting question. Right now, you can pick arbitrary timeouts for a write simply by choosing the timescale for deref(), but it may also make sense to force a timeout for all clients. That imposes an upper bound on deref timeouts, so I want to make sure the default behavior is sane and documented. How would you expect deref to interact with a transport timeout setting? And what do you think sane defaults are? 10 seconds?

benfleis · 2015-01-16T18:27:24Z

I am inclined to agree, practically speaking. In my particular case, I'm using a RiemannReporter -- in a perfect world, I just give it the host/port of the riemann service, and trust that it will "live well", and survive this situation. The DNS cache disabling patch I sent earlier made it at least workable, even if with a long delay after the riemann service restart.

The pedant in me asks: is this variable being honored in any way? If not, maybe at least kill the fake variable.

Sane defaults -- 60s or less. 10s is not aggressive, per se, but it's definitely not passive :) Does anybody ever want a timeout >60s when deploying the sort of system that uses riemann?

aphyr · 2015-01-16T18:29:20Z

Yeah, I tend to instrument on 1 or 5-second interval, so 10s is already stale. ;-) I'll have to dig into the source and see whether WriteTimeout does anything right now. For a while I was aiming to have partial backpressure on async sends but wound up backing off that; couldn't get stable behavior under load. My guess is it's unused but we should hook it up to the Netty channel options.

benfleis · 2015-01-19T15:11:59Z

I can create a trivial diff to pass the param through, but anything more is beyond my current java/netty + riemann-client scope, at least to do with any competence. Let me know what's useful.

aphyr · 2015-01-20T19:30:35Z

Yeah, could you shoot me a PR for this? I'm, like, incredibly backlogged right now and haven't had good internet, but that'll make sure I get around to reviewing/merging. :)

eric · 2016-05-03T00:26:42Z

With the current state of the library, what is the correct behavior when a deref times out? Should I be calling .close() on the client and then .connect() again?

aphyr · 2016-05-03T02:34:20Z

Reconnects have been automatic for 2 years or so, IIRC.
On May 2, 2016 7:26 PM, Eric Lindvall notifications@github.com wrote:With the current state of the library, what is the correct behavior when a deref times out? Should I be calling .close() on the client and then .connect() again?

—You are receiving this because you commented.Reply to this email directly or view it on GitHub

eric · 2016-05-03T02:55:24Z

It does look like there is reconnection logic, but I couldn't see where a reconnection would happen if a deref timed out. Without a write timeout, it seems that you can get into a pathological situation (which I've seen many times) where the process doesn't see the connection is closed, but the write buffer is full, so nothing else can be written to the socket.

I guess I'll continue to close the connection on timeout.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TcpTransport `writeTimeout` appears to be ignored #48

TcpTransport `writeTimeout` appears to be ignored #48

benfleis commented Jan 16, 2015

aphyr commented Jan 16, 2015

benfleis commented Jan 16, 2015

aphyr commented Jan 16, 2015

benfleis commented Jan 19, 2015

aphyr commented Jan 20, 2015

eric commented May 3, 2016

aphyr commented May 3, 2016

eric commented May 3, 2016

TcpTransport writeTimeout appears to be ignored #48

TcpTransport writeTimeout appears to be ignored #48

Comments

benfleis commented Jan 16, 2015

aphyr commented Jan 16, 2015

benfleis commented Jan 16, 2015

aphyr commented Jan 16, 2015

benfleis commented Jan 19, 2015

aphyr commented Jan 20, 2015

eric commented May 3, 2016

aphyr commented May 3, 2016

eric commented May 3, 2016

TcpTransport `writeTimeout` appears to be ignored #48

TcpTransport `writeTimeout` appears to be ignored #48