Audit data corruption on NFS volumes #1351
Comments
Correction - as discussed, we don't need to implement this schema for the audit log, since we don't have to put audit log entries on the NFS volumes; we can simply use local storage with log forwarders.
So, just to clarify: how do we centralize the session logs for those of us that may not have a preferred log forwarder yet? The other concern is how the session logs are accessed from the web UI when there are multiple Auth servers.
made several edits
@klizhentas The Web UI uses the audit log to figure out which sessions are complete and can be played back and which are active and can be joined, based off the values of session.start and session.end. An idea: we store the audit log in the backend and provide a log forwarder that forwards to a file. This allows us to build more log forwarders in the future and maintain existing functionality with the file-based events.
@russjones We can explore bringing back audit logs to backends, or we can add direct integrations with external structured logging facilities for querying as well, e.g. it would be no problem to log directly to ELK/Splunk and simply query those backends, reducing the amount of work.
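A minimal sketch of the forwarder idea discussed in the two comments above: audit events go through a small interface, and a file-based forwarder appends them to local storage (so nothing is written to NFS). The AuditEvent struct, Forwarder interface, and field names here are hypothetical illustrations, not Teleport's actual API.

```go
package forwarder

import (
	"encoding/json"
	"os"
)

// AuditEvent is a hypothetical, simplified audit log entry.
type AuditEvent struct {
	Type      string `json:"event"` // e.g. "session.start", "session.end"
	SessionID string `json:"sid"`
	Time      int64  `json:"time"`
}

// Forwarder ships audit events to some destination (local file, ELK, Splunk, ...).
type Forwarder interface {
	Emit(e AuditEvent) error
}

// FileForwarder appends JSON-encoded events to a local file, one per line.
type FileForwarder struct {
	f *os.File
}

func NewFileForwarder(path string) (*FileForwarder, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
	if err != nil {
		return nil, err
	}
	return &FileForwarder{f: f}, nil
}

func (w *FileForwarder) Emit(e AuditEvent) error {
	line, err := json.Marshal(e)
	if err != nil {
		return err
	}
	_, err = w.f.Write(append(line, '\n'))
	return err
}
```

Because the file lives on local disk, O_APPEND is safe here; an external shipper (or a future ELK/Splunk forwarder implementing the same interface) can then centralize the entries.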
Echoing @mechastorm: if using the recommended shared NFS volume causes corruption, how does one implement high availability? Is it possible?
fixed in 2.5.0, by #1549
Problem
When running multiple Teleport Auth Servers in an HA configuration, the recommended approach for the audit log is to mount a shared NFS volume which all Auth Servers write to. This, however, will not work, because multiple clients opening a file with the O_APPEND flag will lead to data corruption, as outlined in sections A8 and A9 of the NFS documentation.
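A rough sketch of the failure mode, assuming the path and event line are placeholders: every Auth Server appends event lines to the same shared file. On a local filesystem the kernel serializes O_APPEND writes, but the NFS protocol has no atomic append, so each client computes the write offset from its cached file size and two servers can write at the same offset, clobbering each other.

```go
package audit

import (
	"fmt"
	"os"
)

// appendEvent shows the pattern that breaks on NFS: each Auth Server opens
// the shared log with O_APPEND and writes a line. Over NFS the append offset
// is computed client-side, so concurrent writers can overwrite each other's
// data (NFS FAQ A8/A9). The path below is only an example.
func appendEvent(path, line string) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintln(f, line)
	return err
}
```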
Proposed Solution
To clarify the design algorithm a little bit:
The only way to solve the problem with NFS, which does not guarantee atomicity of concurrent appends, is to make sure there is only one writer per opened file.
If there are several concurrent auth servers writing in the context of the same session, they will write to different files.
The file format will be exactly the same as the existing format.
For example, auth server 1 will write one set of chunk blocks and auth server 2 will write its own, separate set of blocks (see the naming sketch below this section).
Then for playback, we simply gather and join all chunks.
We would need to use a similar scheme for the session metadata as well as the audit log itself, because they also reside on the NFS volume and are subject to the same issues.
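A minimal sketch of the one-writer-per-file scheme described above. The chunk naming convention and directory layout here are assumptions for illustration only, not the format Teleport ultimately adopted: each auth server writes chunk files that embed its own ID, so no file ever has two writers, and playback gathers every chunk for a session and joins them.

```go
package chunks

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// chunkPath returns a per-writer chunk file for a session, so each auth
// server only ever appends to files it owns (one writer per open file).
func chunkPath(dir, sessionID, authServerID string, chunkIndex int) string {
	return filepath.Join(dir, fmt.Sprintf("%s.%s.%04d.chunk", sessionID, authServerID, chunkIndex))
}

// gatherChunks lists every chunk written for a session by any auth server.
// Lexical order is used here for simplicity; a real implementation would
// merge chunks by per-event timestamps rather than by file name.
func gatherChunks(dir, sessionID string) ([]string, error) {
	chunks, err := filepath.Glob(filepath.Join(dir, sessionID+".*.chunk"))
	if err != nil {
		return nil, err
	}
	sort.Strings(chunks)
	return chunks, nil
}

// joinChunks concatenates chunk files into a single playback stream.
func joinChunks(dst *os.File, chunks []string) error {
	for _, c := range chunks {
		data, err := os.ReadFile(c)
		if err != nil {
			return err
		}
		if _, err := dst.Write(data); err != nil {
			return err
		}
	}
	return nil
}
```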
Audit events
The Web UI uses the audit log to figure out which sessions are complete and can be played back and which are active and can be joined, based off the values of session.start and session.end.
Direct integrations with external structured logging facilities for querying and logging would also solve this problem, e.g. using the ELK/Splunk APIs to query those backends will reduce the amount of work.
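A short sketch of how the Web UI could derive session state from the audit events alone, under the assumption (from the discussion above) that session.start and session.end are the relevant event types; the struct and field names are simplified placeholders.

```go
package events

// event is a simplified placeholder for an audit log entry.
type event struct {
	Type      string // "session.start" or "session.end"
	SessionID string
}

// activeAndCompleted partitions session IDs: a session with a session.end
// event is complete and can be played back; one with only a session.start
// event is still active and can be joined.
func activeAndCompleted(events []event) (active, completed []string) {
	started := map[string]bool{}
	ended := map[string]bool{}
	for _, e := range events {
		switch e.Type {
		case "session.start":
			started[e.SessionID] = true
		case "session.end":
			ended[e.SessionID] = true
		}
	}
	for sid := range started {
		if ended[sid] {
			completed = append(completed, sid)
		} else {
			active = append(active, sid)
		}
	}
	return active, completed
}
```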