-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pgwire: migrate auth/conn logs to notable events #57839
Conversation
06a20f5
to
2777ef4
Compare
27bfbf5
to
17727d2
Compare
friendly ping to the reviewers |
2016b7a
to
92099bb
Compare
92099bb
to
3fd19d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 16 of 16 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @itsbilal, and @knz)
pkg/sql/pgwire/auth.go, line 358 at r1 (raw file):
func (p *authPipe) AuthFail(err error) { p.readerDone <- authRes{err: err}
do we want to log here? seems easy to miss if AuthOK implictly logs but not AuthFail.
pkg/sql/pgwire/hba/hba.go, line 94 at r1 (raw file):
return "host" default: panic(errors.Newf("unimplemented: %v", int(t)))
nit: "unimplemented conn type" (or something to identify better since i've been scared by panics omitting stack traces when "wrapped" by something else)
pkg/util/log/eventpb/session_events.proto, line 57 at r1 (raw file):
// The connection type after transport negotiation. string transport = 1 [(gogoproto.jsontag) = ",omitempty"]; // The username the session is for.
should we specify whether this is normalised or not?
i notice we normalise this above -- is this something we want? (feels like we may lose information if we normalise them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @itsbilal, and @knz)
pkg/sql/pgwire/auth.go, line 114 at r1 (raw file):
tlsState, hbaEntry, methodFn, err := c.findAuthenticationMethod(authOpt) if err != nil { ac.LogAuthFailed(ctx, "authentication method lookup error", err)
This should be a warning and not an ordinary auth failed, right? or is this a user-specified auth method that might not exist?
pkg/sql/pgwire/auth.go, line 126 at r1 (raw file):
if err != nil { ac.LogAuthFailed(ctx, "authentication pre-hook failed", err)
what is this? should it be an warning instead? shouldn't AuthFailed be reserved for cases where users gave wrong credentials, and not just arbitrary errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @itsbilal, @jordanlewis, and @otan)
pkg/sql/pgwire/auth.go, line 114 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
This should be a warning and not an ordinary auth failed, right? or is this a user-specified auth method that might not exist?
To answer this point and the one below: we want an authn event logged in all authn failure cases, regardless of cause.
As to how this error can occur: This can naturally happen if none of the rules in the HBA configuration happen to match the incoming connection. That's a valid configuration (and pg logs a similar message in the same situation).
pkg/sql/pgwire/auth.go, line 126 at r1 (raw file):
what is this?
This happens when e.g. the method fails to recognize the parameters it needs in the connection, before the authn starts.
should it be an warning instead?
I'm not sure what you mean with “warning”? There's no notion of warning here: authentication either succeeds or fails, there's nothing in-between.
shouldn't AuthFailed be reserved for cases where users gave wrong credentials, and not just arbitrary errors?
Ok now that's an interesting discussion: do we want more granular events types than authentication pass/fail with an "Info" field that describes the failure?
This requirement had not been communicated to me earlier. I was assuming we only needed authn ok/fail.
pkg/sql/pgwire/auth.go, line 358 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
do we want to log here? seems easy to miss if AuthOK implictly logs but not AuthFail.
Yeah that was not a great idea. Moved the logging for the OK case side-by-side with the fail case instead, to improve legibility.
pkg/sql/pgwire/hba/hba.go, line 94 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
nit: "unimplemented conn type" (or something to identify better since i've been scared by panics omitting stack traces when "wrapped" by something else)
Done.
pkg/util/log/eventpb/session_events.proto, line 57 at r1 (raw file):
should we specify whether this is normalised or not?
Yes sure, done.
i notice we normalise this above -- is this something we want? (feels like we may lose information if we normalise them)
the normalization occurs in fact very early during the conn handshake. the .Normalized()
call merely returns the stored normalized username. By this time it's too late to log the original.
(I'll ask Aaron if logging the original usernam eis important)
3fd19d5
to
0bc8c16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 8 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @itsbilal, @jordanlewis, and @otan)
0bc8c16
to
392b793
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @arulajmani, @itsbilal, @jordanlewis, and @otan)
pkg/sql/pgwire/auth.go, line 126 at r1 (raw file):
Previously, knz (kena) wrote…
what is this?
This happens when e.g. the method fails to recognize the parameters it needs in the connection, before the authn starts.should it be an warning instead?
I'm not sure what you mean with “warning”? There's no notion of warning here: authentication either succeeds or fails, there's nothing in-between.shouldn't AuthFailed be reserved for cases where users gave wrong credentials, and not just arbitrary errors?
Ok now that's an interesting discussion: do we want more granular events types than authentication pass/fail with an "Info" field that describes the failure?
This requirement had not been communicated to me earlier. I was assuming we only needed authn ok/fail.
Discussed with @aaron-crl : the Reason field is now populated with an enum with documented + stable values, instead of an unscrupulous string.
pkg/util/log/eventpb/session_events.proto, line 57 at r1 (raw file):
Previously, knz (kena) wrote…
should we specify whether this is normalised or not?
Yes sure, done.
i notice we normalise this above -- is this something we want? (feels like we may lose information if we normalise them)
the normalization occurs in fact very early during the conn handshake. the
.Normalized()
call merely returns the stored normalized username. By this time it's too late to log the original.(I'll ask Aaron if logging the original usernam eis important)
Discussed with the team - this can be added in a separate PR.
886d79b
to
27adee7
Compare
The "crdb-v1" format is brittle and certainly not designed to be unambiguous and reliably parseable. Nevertheless, we can do a little bit more to allow IPv6 addresses in logging tags, to enable parsing the subset of messages that's needed for an authentication log. Release note: None
27adee7
to
1b541e8
Compare
1b541e8
to
bd4bb56
Compare
Prior to this patch, when `FetchEntriesFromFiles` was fetching entries from multiple file groups (previously known as "secondary loggers"), it mistakenly stopped after processing entries from just one logger. This patch fixes it. Note: this API is deprecated and should be replaced by something that knows about file groups. Release note: None
Release note (security update): When using a SQL proxy, in the default configuration CockroachDB only knows about the network address of the proxy. That *peer* address is then used for logging, authentication rules, etc. This is undesirable, as security logging and authentication rules need to operate on the actual (final) client address instead. CockroachDB can now be configured to solve this problem (conf mechanism detailed below). When so configured, a SQL proxy can inform the CockroachDB server of the real address of the client via a server status parameter called `crdb:remote_addr`. The value must be the IP address of the client, followed by a colon, followed by the port number, using the standard Go syntax (e.g. `11.22.33.44:5566` for IPv4, `[11:22::33]:4455` for IPv6). When provided, this value overrides the SQL proxy's address for logging and authentication purposes. In any case, the original peer address is also logged alongside the client address (overridden or not), via the new logging tag `peer`. Security considerations: - enabling this feature allows the peer to spoof its address wrt authentication and thus bypass authentication rules that would otherwise apply to its address, which can introduce a serious security vulnerability if the peer is not trusted. This is why this feature is not enabled by default, and must only be enabled when using a trusted SQL proxy. - this feature should only be used with SQL proxies which actively scrub a `crdb:remote_addr` parameter received by a remote client, and replaces it by its own. If the proxy mistakenly forwards the parameter as provided by the client, it opens the door to the aforementioned security vulnerability. - care must be taken in HBA rules: TLS client cert validation, if requested by a rule, is still performed using the certificate presented by the proxy, not that presented by the client. This means that this new feature is not sufficient to forward TLS client cert authn through a proxy. (If TLS client cert authn is required, it must be performed by the proxy directly.) - care must be taken in HBA rules: the 'protocol' field (first column) continues to apply to the connection type between CockroachDB and the proxy, not between the proxy and the client. Only the 4th column (the CIDR pattern) is matched against the proxy-provided remote address override. Therefore, it is not possible to apply different rules to different client address when proxying TCP connections via a unix socket, because HBA rules for unix connections don't use the address column. Also when proxying client SSL connections via a non-SSL proxy connection, or proxying client non-SSL connections via a SSL proxy connection, care must be taken to configure address-based rule matching using the proper connection type. A reliable way to bypass this complexity is to only use the `host` connection type which applies equally to SSL and non-SSL connections. As of this implementation, the feature is enabled using the non-documented environment variable `COCKROACH_TRUST_CLIENT_PROVIDED_SQL_REMOTE_ADDR`. The use of an env var is a stop-gap so that this feature can be used in CC SQL pods which do not have access to cluster settings. The env var will be eventually removed and replaced by another mechanism.
Release note (backward-incompatible change): The connection and authentication logging enabled by the cluster settings `server.auth_log.sql_connections.enabled` and `server.auth_log.sql_sessions.enabled` was previously using a text format which was hard to parse and integrate with external monitoring tools. This has been changed to use the standard notable event mechanism, with standardized payloads. The output format is now structured; see its reference documentation for details about the supported event types and payloads.
bd4bb56
to
1b70424
Compare
TFYRs! bors r=otan,jordanlewis |
Build failed (retrying...): |
Build succeeded: |
Fixes #56154
Release note (backward-incompatible change): The connection and
authentication logging enabled by the cluster settings
server.auth_log.sql_connections.enabled
andserver.auth_log.sql_sessions.enabled
was previously using a textformat which was hard to parse and integrate with external monitoring
tools. This has been changed to use the standard notable event
mechanism, with standardized payloads. The output format is now
structured; see its reference documentation for details about
the supported event types and payloads.