Fan out events in async mode for async recordings. #4696
de28d77 to 0082311
@webvictim can you try this commit on your hanging server please? Please also set the environment variables to show connection state:
0082311 to c555bcc
@awly @a-palchikov @fspmarshall @russjones hey folks, please take another look. I have added some defensive logic to make sure SSH connections never hang due to audit, at the expense of losing a portion of the events. I've been struggling to find a balance between security and audits, but decided to come down on the side of us not being woken up by S1s over hanging sessions. To compensate for this, I propose #4755 in 5.1.
@webvictim can you give this branch (fresh build) another try? It should never hang your server.
@klizhentas I have just deployed your second update - I'll let you know. The previous branch that you asked me to try before seemed stable for a while, but then when I tried to log in again today after a couple of days idle it turned out it was hung again. Here's the
@webvictim these goroutine dumps you provide are super helpful - thanks for collecting them for me!
This commit fixes #4695. Teleport in async recording mode writes all events to disk and uploads them to the server later. It also uploads some events synchronously to the audit log so they show up in the global event log right away. However, if the auth server is slow, the fanout blocks the session. This commit makes the fanout of some events fast and nonblocking, and it never fails, so sessions will not hang unless the disk writes hang. It adds a backoff period and a timeout after which some events will be lost, but the session will continue without locking.
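For readers following along, here is a minimal sketch of the "nonblocking fanout with timeout and backoff" idea described above. It is not the code from this PR: `Event`, `Fanout`, `NewFanout`, `Emit`, and `demo` are hypothetical names, and the sketch assumes a single goroutine per session calls `Emit`. An event is queued immediately if there is room. Otherwise the emitter waits up to a short timeout, and after a timeout it drops events outright for a backoff period instead of waiting again.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical stand-in for an audit event.
type Event struct {
	ID int
}

// Fanout forwards events to a (possibly slow) consumer without blocking
// the caller for longer than timeout. Illustrative only; it assumes a
// single emitting goroutine, so the fields need no locking.
type Fanout struct {
	queue        chan Event
	timeout      time.Duration // longest time Emit waits for queue space
	backoff      time.Duration // how long events are dropped after a timeout
	backoffUntil time.Time
	dropped      int
}

// NewFanout creates a Fanout with a bounded queue.
func NewFanout(capacity int, timeout, backoff time.Duration) *Fanout {
	return &Fanout{
		queue:   make(chan Event, capacity),
		timeout: timeout,
		backoff: backoff,
	}
}

// Emit reports whether the event was queued. It never returns an error
// and never blocks for longer than the configured timeout.
func (f *Fanout) Emit(e Event) bool {
	// During backoff, drop right away instead of waiting again.
	if time.Now().Before(f.backoffUntil) {
		f.dropped++
		return false
	}
	// Fast path: there is room in the queue.
	select {
	case f.queue <- e:
		return true
	default:
	}
	// Slow path: wait for room, but only up to the timeout.
	timer := time.NewTimer(f.timeout)
	defer timer.Stop()
	select {
	case f.queue <- e:
		return true
	case <-timer.C:
		// The consumer is too slow: drop this event and start backing off.
		f.backoffUntil = time.Now().Add(f.backoff)
		f.dropped++
		return false
	}
}

// demo emits five events into a queue of size two that nobody drains,
// which simulates a stalled auth server.
func demo() (accepted, dropped int) {
	f := NewFanout(2, 10*time.Millisecond, time.Hour)
	for i := 0; i < 5; i++ {
		if f.Emit(Event{ID: i}) {
			accepted++
		}
	}
	return accepted, f.dropped
}

func main() {
	accepted, dropped := demo()
	fmt.Printf("accepted=%d dropped=%d\n", accepted, dropped)
}
```

In this sketch the first two events fill the queue, the third waits for the 10ms timeout and is dropped, and the remaining two are dropped immediately because the emitter is in backoff. The caller is delayed by at most one timeout per backoff window, which is the trade-off discussed in this thread: some events are lost, but the session keeps going.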
c555bcc to a49eec9
This commit fixes #4695.
Teleport in async recording mode sends all events to disk,
and uploads them to the server later.
It uploads some events synchronously to the audit log so
they show up in the global event log right away.
However, if the auth server is slow, the fanout blocks the session.
This commit makes the fanout of some events fast and nonblocking,
and it never fails, so sessions will not hang
unless the disk writes hang.
It also adds the ability to debug gRPC connection state
when running in debug mode.
To start sending gRPC connection state logs,
set these environment variables:
GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=99 teleport start -d