H12 Timeouts on Heroku with Sente #56

Closed
sritchie opened this issue Jul 17, 2014 · 13 comments

@sritchie
Collaborator

Hey @ptaoussanis,

I recently upgraded to Sente 0.14.1 (from 0.12.0), and, suspiciously, started seeing a bunch of H12 - Application Timeouts on my Heroku app.

I've also noticed a number of Idle Connection warnings.

Could something have changed between those versions wrt the default keep-alive settings? Also, is there some way to tune these settings? I believe that Heroku has a 30 second timeout on connections, and keep-alives need to be sent within 55 seconds. I'm not sure if these errors were coming from websocket connections or long-polling connections.

Anyway, this is out of my wheelhouse, but it's certainly something that just started after a recent batch of upgrades. Would love your advice! Let me know if I can get you any further info.

sritchie added the bug label Jul 17, 2014
@sritchie
Collaborator Author

Upgraded from 0.13.0, actually, and upgraded http-kit from "2.1.13" to "2.1.18".

@ptaoussanis
Member

Hey Sam,

About to head to sleep, so I'll need to look at this properly tomorrow. In the meantime: there were no timeout changes between v0.13.0 and v0.14.1 that I can remember off-hand, but you could try adjusting the default timeouts (down) and see if that makes a difference. The client-side make-channel-socket! fn can take opts:

:ws-kalive-ms ; WebSocket keep-alive interval (defaults to 38000)
:lp-kalive-ms ; Ajax (long-polling) keep-alive interval (defaults to 38000)

Cheers!

EDIT Just to clarify: the WebSocket interval will send a small PING iff no other activity has taken place in the window (it's cheap); the Ajax interval will close + re-establish a new long-polling connection (can be a significant cost, but not bad with http-kit).

@sritchie
Collaborator Author

Okay, nice. Looks like it's actually lp-timeout. I'm going to bring them both down to 40 seconds; that should kill the timeout issue.

Do you know if http-kit has some limit on concurrent connections that's getting saturated by these websocket and long-polling connections? I'm worried that by enabling this feature without properly tuning http-kit I'm hosing my application.

That may be a Heroku thing too. I wonder if Heroku, or NGinx, simply cuts me off when enough concurrent users are on the site.

Anyway, this is a good start! Would love any advice you have when you're up in the AM :)

Thanks @ptaoussanis!

@ptaoussanis
Member

> I'm going to bring them both down to 40 seconds; that should kill the timeout issue.

Oh, to clarify: the keep-alives are in milliseconds so the defaults are both 38 seconds. I'm guessing you'd need to bring them down rather than up? Something like 25000 may be worth trying.
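
A minimal sketch of passing lower intervals to the client-side `make-channel-socket!`. It assumes the option names quoted earlier in this thread (the long-polling option may be named differently, e.g. `lp-timeout`, depending on the Sente version), the `/chsk` endpoint and `:type :auto` as shown in Sente's README, and a hypothetical `my-app.client` namespace:

```clojure
(ns my-app.client ; hypothetical namespace, for illustration only
  (:require [taoensso.sente :as sente]))

;; Keep both intervals under Heroku's 30-second window.
;; Values are in milliseconds; 25000 is the value suggested above.
(def channel-socket
  (sente/make-channel-socket! "/chsk"   ; endpoint path is an assumption
    {:type         :auto
     :ws-kalive-ms 25000     ; WebSocket keep-alive ping interval
     :lp-kalive-ms 25000}))  ; Ajax long-polling interval (name as quoted in
                             ; this thread; check your Sente version)
```

Check the option names against the docstring of the Sente version you're running before relying on this.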

> Do you know if http-kit has some limit on concurrent connections

It can take a ton of concurrent connections when configured for it; not sure about the defaults - will check tomorrow. It'll start throwing exceptions if it's over-burdened with the config you're running.

> That may be a Heroku thing too. I wonder if Heroku, or NGinx, simply cuts me off when enough concurrent users are on the site.

Hmm - no idea on Heroku, sorry. Nginx won't be a problem (again, with an appropriate config). Roughly how many concurrent users are you looking at?

Here's an older version of http-kit doing 600k concurrent connections on some decent hardware: http://http-kit.org/600k-concurrent-connection-http-kit.html

@sritchie
Collaborator Author

Definitely time for me to get some more monitoring in place :) I'm going to dig into this tomorrow. We're moving off of Heroku soon, so I shouldn't have to debug that too hard.

@ptaoussanis
Member

Hey Sam, sorry for the delay getting back to you - should have some time to look at this today.

Any update? Did tweaking the keep-alive interval(s) down solve the problem?

ptaoussanis added a commit that referenced this issue Jul 20, 2014
As per Heroku http-routing docs[1]:
"After a dyno connection has been established, HTTP requests have an initial 30 second window in which the web process must return response data (either the completed response or some amount of response data to indicate that the process is active). Processes that do not send response data within the initial 30-second window will see an H12 error in their logs."

So to play better with Heroku timeouts, have decreased the default
Sente keep-alive intervals as follows:
`:ws-kalive-ms` - 38000 -> 25000
`:lp-kalive-ms` - 38000 -> 25000

[1] https://devcenter.heroku.com/articles/http-routing#timeouts
@ptaoussanis
Member

Okay, have confirmed that Heroku requires a sub-30s timeout:

"After a dyno connection has been established, HTTP requests have an initial 30 second window in which the web process must return response data (either the completed response or some amount of response data to indicate that the process is active). Processes that do not send response data within the initial 30-second window will see an H12 error in their logs."

(From https://devcenter.heroku.com/articles/http-routing#timeouts).

Have a v0.15.1 hotfix ready to go if you can confirm that adjusting your keep-alives solves the issue.

Note that I'm not sure why you only saw this problem when upgrading from v0.13.0 to v0.14.1. The keep-alive values weren't changed, and there's nothing else that changed that'd obviously affect this. Is it possible something changed with your Heroku config at the same time you upgraded Sente releases?

@sritchie
Collaborator Author

Yeah, I can look into that. What's confusing here is that once a connection is established, Heroku only needs to see data every 55 seconds to keep a connection alive, so I thought the defaults would have handled it. Maybe something about a socket connection with NO data after that first handshake? I'll definitely try this fix today.

@ptaoussanis
Member

Ahh, I think you've identified the point of confusion.

Sente will not send any data over an Ajax connection until an actual payload is ready. Then it'll start a new connection. This means that the 30 second window (not the 55 second window) should apply (as I understand Heroku's docs).

I believe the 55 second limit will apply to things like chunked/streaming transfers, but isn't applicable to Sente's long polling.

Does that make sense?

@sritchie
Collaborator Author

Yup, that makes total sense. I'm pushing up an update to our staging server now; I'll let you know how it looks!

@sritchie
Collaborator Author

It looks like this worked! Thanks for all your help, @ptaoussanis. Killer library, killer service. You're an open source assassin.

@ptaoussanis
Member

> You're an open source assassin.

Hah hah, thank you - and going to remember that term ;-)

BTW added a brief comment on the choice to go for long-polling over chunked encoding: https://github.com/ptaoussanis/sente/blob/0532e028ebe3e5d9f829160fe720b4464a2d36a1/src/taoensso/sente.cljx#L51

Have a great day, cheers! :-)

ptaoussanis removed the bug label Jul 28, 2014