Avoid reporting socket errors via Sentry observer #1026

Merged
merged 1 commit into develop from fix-socket-errors-reported-via-sentry on Nov 20, 2024

Conversation

chriskuehl
Member

@chriskuehl commented Nov 19, 2024

After merging #1009, we started getting a small number of socket errors reported via Sentry:

OSError: [Errno 9] Bad file descriptor
  File "src/gevent/greenlet.py", line 908, in gevent._gevent_cgreenlet.Greenlet.run
  File "gevent/pywsgi.py", line 719, in read_requestline
    line = self.rfile.readline(MAX_REQUEST_LINE)
  File "socket.py", line 720, in readinto
    return self._sock.recv_into(b)
  File "gevent/_socketcommon.py", line 691, in recv_into
    return self._sock.recv_into(*args)

I traced this down to two potential causes, neither of which is user-impacting:

Cause 1: The Sentry observer reports unhandled exceptions from greenlets, even though we handle the exceptions

The Sentry observer hooks into the gevent hub's print_exception function in order to report all unhandled greenlet exceptions to Sentry:

class _SentryUnhandledErrorReporter:
    """Hook into the Gevent hub and report errors outside request context."""

    @classmethod
    def install(cls) -> None:
        from gevent import get_hub
        gevent_hub = get_hub()
        gevent_hub.print_exception = cls(gevent_hub)
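For context, print_exception is the hook the gevent hub invokes whenever a greenlet dies with an unhandled exception, so replacing it routes every such exception to Sentry. The rest of the class is elided above; here is a hypothetical sketch of what the callable side might look like (the __init__/__call__ bodies and the sentry_sdk usage are my assumptions, not the project's actual code):

import sentry_sdk

class _SentryUnhandledErrorReporterSketch:
    """Hypothetical stand-in for the elided parts of the class above."""

    def __init__(self, hub) -> None:
        # Keep the original hook so the default traceback printing still happens.
        self._original_print_exception = hub.print_exception

    def __call__(self, context, type_, value, tb) -> None:
        # The hub passes the failing greenlet as `context` plus the exc_info triple.
        sentry_sdk.capture_exception(value)
        self._original_print_exception(context, type_, value, tb)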

This means that unhandled errors always get reported to Sentry, even though the read_requestline function is expected to raise socket errors, which are handled in the calling function:
https://github.com/gevent/gevent/blob/a50ca62df1bb84c2960ba34c655f18129d845b19/src/gevent/pywsgi.py#L769-L776

(socket.error is an alias for OSError, so this catches all OSError exceptions and uses them as a signal to close the connection.)
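A small, runnable illustration of that handling (the function name is a stand-in, not gevent's code):

import socket

# socket.error has been an alias for OSError since Python 3.3, so an
# `except socket.error` clause catches every OSError, including the
# EBADF ("Bad file descriptor") errors in the traceback above.
def simulated_read_requestline():
    raise OSError(9, "Bad file descriptor")

try:
    simulated_read_requestline()
except socket.error:
    print("caught as socket.error -> the connection would simply be closed")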

To fix this, I wrapped the function so that the greenlet returns a (result, exception) tuple rather than raising the exception; since the exception no longer escapes the greenlet, it no longer gets reported to Sentry. A sketch of the pattern is below.
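A minimal, runnable sketch of that pattern (the names are illustrative, not the PR's exact code):

import gevent

def fake_read_requestline():
    # Stand-in for the real read, which can raise socket errors.
    raise OSError(9, "Bad file descriptor")

def wrapped_read_requestline():
    # Catch inside the greenlet and return the exception as data, so the
    # gevent hub never sees an "unhandled" greenlet error to report.
    try:
        return fake_read_requestline(), None
    except Exception as exc:
        return None, exc

greenlet = gevent.spawn(wrapped_read_requestline)
greenlet.join()
result, exc = greenlet.get()   # never raises: the greenlet "succeeded"
if exc is not None:
    print(f"handled by the caller instead of Sentry: {exc!r}")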

Cause 2: There's a possible race condition where the socket can be closed before the read_requestline() greenlet runs

Imagine the shutdown event becomes set. The "main" greenlet for the connection then runs the following:

if self._shutdown_event in ready:
    real_read_requestline.kill()
    # None triggers the base class to close the connection.
    return None

Returning None causes the caller to close the connection, which is normally not a problem. However, depending on the order in which the greenlets execute and when the kill signal is delivered, the connection can be closed while the read_requestline greenlet is still running, at which point it tries to read from a closed socket.

I was able to reproduce this by adding gevent.sleep() calls to force the race condition, though I suspect it is not the cause of the production errors. To fix it, I added a join() on the read_requestline greenlet to ensure it has finished executing before we return; see the sketch below.
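A runnable, simplified illustration of the ordering the fix enforces (again, stand-in names rather than the PR's exact code):

import gevent

connection = {"closed": False}

def fake_read_requestline():
    # Stand-in for the real read; reading after close is exactly the bug above.
    assert not connection["closed"], "read from a closed socket"
    gevent.sleep(0)

real_read_requestline = gevent.spawn(fake_read_requestline)
gevent.sleep(0)                 # the shutdown event fires while the read is in flight

real_read_requestline.kill()
real_read_requestline.join()    # the added step: wait for the greenlet to finish
connection["closed"] = True     # only now does the caller tear the socket down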

Validation

I was able to reproduce both of the above cases locally using the test I added, but it's difficult to reproduce the issues consistently in a test, since forcing the race conditions requires adding gevent.sleep() calls in fairly arbitrary places.

@chriskuehl marked this pull request as ready for review November 19, 2024 16:45
@chriskuehl requested a review from a team as a code owner November 19, 2024 16:45
@chriskuehl merged commit ebbf2ee into develop Nov 20, 2024
4 checks passed
@chriskuehl deleted the fix-socket-errors-reported-via-sentry branch November 20, 2024 16:50