
Request timeout on Heroku after migrating to 2.8.4 and to MongoDB Atlas #6778

Closed
UnlikelySassou opened this issue Jul 7, 2020 · 2 comments
Labels
type:question Support or code-level question

Comments

@UnlikelySassou

Issue Description

I recently had to migrate from mLab to Atlas because mLab was acquired.
I upgraded Parse Server from 2.3.6 to 2.8.4 to meet Atlas's MongoDB compatibility requirements and then migrated my database. Everything seemed to be working fine.

However, I now have H12 errors on Heroku that completely crash my app. The app works fine for a few days with good response times, then out of nowhere most requests fail with code=H12 "Request timeout". I have to restart the app manually to make things work again. I don't know how to debug this or how to get more logs about what's happening.

I migrated about 3 weeks ago, but this error pattern only started appearing now, roughly every 2 days. I made no changes to the code on either the client or the server side; the only changes are in my package file, updating to parse-server 2.8.4 and node 12.16.3.

I had no issues before migrating, and the issue does not seem to be on the database side, because no errors are showing and the response time and other metrics look good on Atlas.

Any help with this issue will be appreciated!

Environment Setup

  • Server

    • parse-server version: 2.8.4
    • Hardware: Heroku's Standard-1x (512 MB)
    • Localhost or remote server?: Remote (Heroku)
  • Database

    • MongoDB version: 4.2.8
    • Hardware: Atlas M10
    • Localhost or remote server?: Remote (MongoDB Atlas)

Logs/Trace

2020-07-07T19:01:51.669796+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=POST path="/parse/classes/Message" host=xxx.herokuapp.com request_id=e80e9c1d-58c4-46bc-9439-b22f16cbc443 fwd="xxx" dyno=web.2 connect=0ms service=30000ms status=503 bytes=0 protocol=https
2020-07-07T19:01:51.751322+00:00 heroku[router]: at=info method=POST path="/parse/classes/Server" host=xxx.herokuapp.com request_id=6f98ac6e-deb4-40ae-a0f3-f82a2238f0fc fwd="xxx" dyno=web.1 connect=7ms service=126ms status=200 bytes=10958 protocol=https
2020-07-07T19:01:51.796090+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=POST path="/parse/classes/Message" host=xxx.herokuapp.com request_id=a7a9138e-cecf-4fd0-a737-cbc247dacd8d fwd="xxx" dyno=web.2 connect=0ms service=30063ms status=503 bytes=0 protocol=https
2020-07-07T19:01:52.533429+00:00 heroku[router]: at=info method=POST path="/parse/login" host=xxx.herokuapp.com request_id=4960d0d5-f5b0-48ef-9da5-2049662e29a7 fwd="xxx" dyno=web.1 connect=7ms service=699ms status=200 bytes=952 protocol=https
2020-07-07T19:01:52.582702+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=POST path="/parse/classes/Message" host=xxx.herokuapp.com request_id=073d6f92-78e7-4042-90da-27148c1f32eb fwd="xxx" dyno=web.2 connect=1ms service=30000ms status=503 bytes=0 protocol=https
2020-07-07T19:01:52.677413+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=POST path="/parse/classes/Server" host=xxx.herokuapp.com request_id=8455f966-2582-4cb6-a9fe-fd77ccbda240 fwd="xxx" dyno=web.2 connect=1ms service=30000ms status=503 bytes=0 protocol=https

[Screenshot attached: Heroku metrics]

@mtrezza
Member

mtrezza commented Jul 7, 2020

What you are describing sounds like a typical timeout configuration issue, so let's assume that for now.

It is characteristic that the app runs fine for some time and then suddenly spikes in errors, usually under increasing load.

Imagine your architecture simplified like this:

client - router (Heroku) - dyno (Heroku) - http server (Node.js + Parse Server) - DB (MongoDB)

Internally, the Heroku router has a timeout of 30s. After that, it assumes the dyno won't respond and closes the upstream socket to the dyno. But the dyno doesn't know that and keeps processing the request. The router assumes the dyno is free and forwards the next incoming request to it, so requests keep piling up until the dyno is overwhelmed and responds more and more with 5xx errors. If you scale up the dynos you would typically see fewer H12 errors, but that doesn't solve the underlying issue and it increases cost.

You cannot configure anything beyond the abstraction that Heroku provides, so the 30s router timeout is something you have to work with. It makes sense to ask Heroku support for the keep-alive timeout of their router. Then check your Node.js server app for settings regarding:

  • request timeouts
  • keep-alive timeouts

When Parse Server is mounted on a Node.js http server, that http server has timeout settings that have to be configured according to the whole upstream/downstream network chain.

const httpServer = require('http').createServer({...});
httpServer.timeout = 25 * 1000;           // socket inactivity timeout, below Heroku's 30s router timeout
httpServer.keepAliveTimeout = 70 * 1000;  // how long an idle kept-alive connection stays open
httpServer.headersTimeout = 120 * 1000;   // maximum time to receive the complete request headers

I have added some sample values above that may work for you, assuming Heroku's router has a keep-alive timeout of 60s, which is the de facto default across the industry. Generally speaking, however, you want the following (a fuller example follows the list below):

  • timeout to be shorter than the Heroku router timeout of 30s.
  • keepAliveTimeout to be longer than the keep-alive timeout of the Heroku router (or whatever load balancer, reverse proxy, or other element is next downstream), so that every element in the network chain can be sure that a kept-alive connection is still open going upstream.
  • headersTimeout to be longer than the keepAliveTimeout of your http server; there is a known Node.js issue that can otherwise cause kept-alive connections to be dropped without a response.
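
To make this concrete, here is a minimal sketch of how these settings could be wired into a typical Heroku deployment, assuming an Express app with Parse Server mounted at /parse. The environment variable names and the mount path are illustrative placeholders, not taken from your setup:

const express = require('express');
const http = require('http');
const { ParseServer } = require('parse-server');

const app = express();

// Parse Server mounted on the Express app (all values are placeholders).
const api = new ParseServer({
  databaseURI: process.env.MONGODB_URI, // Atlas connection string
  appId: process.env.APP_ID,
  masterKey: process.env.MASTER_KEY,
  serverURL: process.env.SERVER_URL,
});
app.use('/parse', api);

// Create the http server explicitly so the timeouts can be set on it.
const httpServer = http.createServer(app);
httpServer.timeout = 25 * 1000;           // below Heroku's 30s router timeout
httpServer.keepAliveTimeout = 70 * 1000;  // above the router's assumed 60s keep-alive timeout
httpServer.headersTimeout = 120 * 1000;   // above keepAliveTimeout

httpServer.listen(process.env.PORT || 1337);

Note that app.listen() also returns the underlying http.Server instance, so if you already start the app that way you can set the same three properties on its return value instead.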

You can use an APM agent like New Relic to look into what is happening in your Node.js app specifically in the lead-up to the crash.
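
If you try that, the Node.js agent is typically enabled by installing the vendor's package and loading it before anything else; the package name and config vars below follow New Relic's documented setup and are only meant as an illustration:

// index.js — the agent must be the very first module loaded.
// On Heroku, configuration is usually supplied via config vars such as
// NEW_RELIC_LICENSE_KEY and NEW_RELIC_APP_NAME.
require('newrelic');

const express = require('express');
// ... the rest of the server setup as shown above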

That you didn't experience the H12 errors before the migration could be either because your traffic coincidentally increased, or because the performance of your chosen MongoDB Atlas cluster is simply different from the mLab cluster you had previously, leading to longer response times and making it more likely for the server to time out. Take a look at the Parse Server configuration option databaseOptions.maxTimeMS, which should be less than the timeout parameter mentioned above.
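
For illustration, databaseOptions is part of the Parse Server configuration object; a hypothetical limit of 20s (chosen only so it stays below the 25s request timeout above) would look like this:

const api = new ParseServer({
  // ...other options as above
  databaseOptions: {
    maxTimeMS: 20 * 1000, // abort slow MongoDB operations after 20s, below the 25s http server timeout
  },
});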

@UnlikelySassou
Author

@mtrezza I was indeed missing the timeout configuration on the http server. I've added the settings and redeployed to Heroku. I will monitor the metrics for the next 3 days and close this issue if nothing comes up.

Thank you for your detailed answer; it was very well explained.
