Immediately close bad connections to prevent file exhaustion #6112

Merged: mjjbell merged 1 commit into Project-OSRM:master from mbell/close_keepalive on Sep 4, 2021

Conversation

@mjjbell (Member) commented on Aug 29, 2021

osrm-routed does not immediately clean up a keep-alive connection when the client closes it. Instead, it waits for five seconds of inactivity before removing it.

Given a setup with low file limits and clients opening and closing a lot of keep-alive connections, it's possible for
osrm-routed to run out of file descriptors whilst it waits for the clean-up to trigger.

Furthermore, this causes the connection acceptor loop to exit.
Even after the old connections are cleaned up, new ones will not be created. Any new requests will block until the
server is restarted.
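
The pattern involved is roughly the following. This is a simplified Boost.Asio sketch for illustration only (the class and member names are invented, not the actual osrm-backend source): cleanup is driven by a keep-alive timer, so when a read fails with EOF the handler returns without closing the socket, and the file descriptor stays open until the timer fires.

```cpp
// Simplified illustration only: a connection whose cleanup is driven by a
// keep-alive timer. On a read error (including EOF from the client closing
// the socket) the handler just returns, so the file descriptor stays open
// until the timer expires.
#include <boost/asio.hpp>
#include <array>
#include <chrono>
#include <cstddef>
#include <memory>

namespace asio = boost::asio;

class Connection : public std::enable_shared_from_this<Connection>
{
  public:
    explicit Connection(asio::io_context &io) : socket(io), keep_alive_timer(io) {}

    // The acceptor would accept into this socket before Start() is called.
    asio::ip::tcp::socket &Socket() { return socket; }

    void Start()
    {
        auto self = shared_from_this();

        // Cleanup only happens when this timer fires, up to 5 seconds later.
        keep_alive_timer.expires_after(std::chrono::seconds(5));
        keep_alive_timer.async_wait(
            [self](const boost::system::error_code &) { self->socket.close(); });

        socket.async_read_some(
            asio::buffer(buffer),
            [self](const boost::system::error_code &ec, std::size_t /*bytes*/)
            {
                if (ec)
                {
                    // e.g. asio::error::eof: the client hung up, but the socket
                    // (and its file descriptor) is not closed here.
                    return;
                }
                // ... parse the request, send the response, re-arm the timer ...
            });
    }

  private:
    asio::ip::tcp::socket socket;
    asio::steady_timer keep_alive_timer;
    std::array<char, 8192> buffer;
};
```

A Connection like this would be created with std::make_shared and started once the acceptor hands it a socket; while it waits on the timer, the descriptor counts against the process file limit.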

You can replicate this by limiting the number of open files (e.g. on Linux, ulimit -n 100) and running a script that makes curl requests to a server endpoint in a loop:

```sh
while true
do
  curl "http://localhost:5000/match/v1/car/2.320208,48.702049;2.320521,48.702363;2.320843,48.702727;2.320874,48.702761;2.320874,48.702761;2.319644,48.701588" &>/dev/null
done
```

This commit improves the situation by:

  • Immediately closing connections on error. This includes EOF errors
    indicating that the client has closed the connection. This releases
    resources early (including the open file) and doesn't wait for the
    timer.

  • Logging when the acceptor loop exits. Whilst the loop can still exit
    for reasons other than too many open files, we will at least have
    visibility of the cause and can investigate further. (A simplified
    sketch of both changes follows this list.)
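
In that setting, the two changes amount to something like the sketch below. This is again a simplified Boost.Asio illustration with invented handler names, not the exact diff: the read handler shuts down and closes the socket as soon as an error is seen, and the accept handler logs the error before the loop stops re-arming itself.

```cpp
// Simplified sketch of the two changes (illustrative names, not the exact diff).
#include <boost/asio.hpp>
#include <iostream>

namespace asio = boost::asio;

// 1. On any read error (including EOF when the client closes the connection),
//    cancel the keep-alive timer and close the socket immediately, releasing
//    the file descriptor instead of waiting up to 5 seconds for the timer.
void HandleReadError(const boost::system::error_code &ec,
                     asio::ip::tcp::socket &socket,
                     asio::steady_timer &keep_alive_timer)
{
    if (!ec)
    {
        return; // no error: handle the request as before
    }
    boost::system::error_code ignored;
    keep_alive_timer.cancel();
    socket.shutdown(asio::ip::tcp::socket::shutdown_both, ignored);
    socket.close(ignored); // resources released right away
}

// 2. When the accept handler fails (e.g. "Too many open files"), log the
//    error before the loop exits so the cause is visible in the server output.
void HandleAccept(const boost::system::error_code &ec)
{
    if (ec)
    {
        std::cerr << "error: could not accept connection: " << ec.message() << std::endl;
        return; // the loop still exits, but we now know why
    }
    // ... start the new connection and call async_accept again ...
}
```

The first change is what actually frees descriptors promptly; the second only makes the remaining failure mode observable.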


Requirements / Relations

Fixes #6040

@TheMarex (Member) left a comment


Makes sense. 👍

@mjjbell mjjbell merged commit f1a6056 into Project-OSRM:master Sep 4, 2021
@mjjbell mjjbell deleted the mbell/close_keepalive branch September 4, 2021 00:55
@mjjbell mjjbell mentioned this pull request Feb 17, 2022
Successfully merging this pull request may close these issues: osrm-routed connection accept loop can exit and not recover