Immediately close bad connections to prevent file exhaustion #6112

Merged: mjjbell merged 1 commit into Project-OSRM:master from mbell/close_keepalive on Sep 4, 2021

Conversation

@mjjbell (Member) commented on Aug 29, 2021

osrm-routed does not immediately clean up a keep-alive connection when the client closes it. Instead, it waits for five seconds of inactivity before removing it.

Given a setup with low file limits and clients opening and closing a lot of keep-alive connections, it's possible for
osrm-routed to run out of file descriptors whilst it waits for the clean-up to trigger.

Furthermore, this causes the connection acceptor loop to exit.
Even after the old connections are cleaned up, new ones will not be created. Any new requests will block until the
server is restarted.
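
The pattern involved is roughly the following. This is a simplified Boost.Asio sketch for illustration only (the class and member names are invented, not the actual osrm-backend source): cleanup is driven by a keep-alive timer, so when a read fails with EOF the handler returns without closing the socket, and the file descriptor stays open until the timer fires.

```cpp
// Simplified illustration only: a connection whose cleanup is driven by a
// keep-alive timer. On a read error (including EOF from the client closing
// the socket) the handler just returns, so the file descriptor stays open
// until the timer expires.
#include <boost/asio.hpp>
#include <array>
#include <chrono>
#include <cstddef>
#include <memory>

namespace asio = boost::asio;

class Connection : public std::enable_shared_from_this<Connection>
{
  public:
    explicit Connection(asio::io_context &io) : socket(io), keep_alive_timer(io) {}

    // The acceptor would accept into this socket before Start() is called.
    asio::ip::tcp::socket &Socket() { return socket; }

    void Start()
    {
        auto self = shared_from_this();

        // Cleanup only happens when this timer fires, up to 5 seconds later.
        keep_alive_timer.expires_after(std::chrono::seconds(5));
        keep_alive_timer.async_wait(
            [self](const boost::system::error_code &) { self->socket.close(); });

        socket.async_read_some(
            asio::buffer(buffer),
            [self](const boost::system::error_code &ec, std::size_t /*bytes*/)
            {
                if (ec)
                {
                    // e.g. asio::error::eof: the client hung up, but the socket
                    // (and its file descriptor) is not closed here.
                    return;
                }
                // ... parse the request, send the response, re-arm the timer ...
            });
    }

  private:
    asio::ip::tcp::socket socket;
    asio::steady_timer keep_alive_timer;
    std::array<char, 8192> buffer;
};
```

A Connection like this would be created with std::make_shared and started once the acceptor hands it a socket; while it waits on the timer, the descriptor counts against the process file limit.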

You can replicate this by limiting the number of open files (e.g. on Linux, ulimit -n 100) and running a script that makes curl requests to a server endpoint in a loop:

```sh
while true
do
  curl "http://localhost:5000/match/v1/car/2.320208,48.702049;2.320521,48.702363;2.320843,48.702727;2.320874,48.702761;2.320874,48.702761;2.319644,48.701588" &>/dev/null
done
```

This commit improves the situation by:

  • Immediately closing connections on error. This includes EOF errors
    indicating that the client has closed the connection. This releases
    resources early (including the open file) and doesn't wait for the
    timer.

  • Logging when the acceptor loop exits. Whilst the loop can still exit
    for reasons other than too many open files, we will at least have
    visibility of the cause and can investigate further. (A simplified
    sketch of both changes follows this list.)
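
In that setting, the two changes amount to something like the sketch below. This is again a simplified Boost.Asio illustration with invented handler names, not the exact diff: the read handler shuts down and closes the socket as soon as an error is seen, and the accept handler logs the error before the loop stops re-arming itself.

```cpp
// Simplified sketch of the two changes (illustrative names, not the exact diff).
#include <boost/asio.hpp>
#include <iostream>

namespace asio = boost::asio;

// 1. On any read error (including EOF when the client closes the connection),
//    cancel the keep-alive timer and close the socket immediately, releasing
//    the file descriptor instead of waiting up to 5 seconds for the timer.
void HandleReadError(const boost::system::error_code &ec,
                     asio::ip::tcp::socket &socket,
                     asio::steady_timer &keep_alive_timer)
{
    if (!ec)
    {
        return; // no error: handle the request as before
    }
    boost::system::error_code ignored;
    keep_alive_timer.cancel();
    socket.shutdown(asio::ip::tcp::socket::shutdown_both, ignored);
    socket.close(ignored); // resources released right away
}

// 2. When the accept handler fails (e.g. "Too many open files"), log the
//    error before the loop exits so the cause is visible in the server output.
void HandleAccept(const boost::system::error_code &ec)
{
    if (ec)
    {
        std::cerr << "error: could not accept connection: " << ec.message() << std::endl;
        return; // the loop still exits, but we now know why
    }
    // ... start the new connection and call async_accept again ...
}
```

The first change is what actually frees descriptors promptly; the second only makes the remaining failure mode observable.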


Requirements / Relations

Fixes #6040

@TheMarex (Member) left a comment


Makes sense. 👍

@mjjbell mjjbell merged commit f1a6056 into Project-OSRM:master Sep 4, 2021
@mjjbell mjjbell deleted the mbell/close_keepalive branch September 4, 2021 00:55
@mjjbell mjjbell mentioned this pull request Feb 17, 2022
Successfully merging this pull request may close these issues: osrm-routed connection accept loop can exit and not recover