Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server, tracing: Make traces redactable and default to "off" #73405

Conversation

dhartunian
Copy link
Collaborator

@dhartunian dhartunian commented Dec 2, 2021

Multi-tenant deployments require tracing support
in order to help customers and engineers debug query performance
issues. This requires that trace information recorded on the KV layer
be redacted prior to being transmitted to the tenant in order to
guard against data leakage through the multi-tenant abstractions.

This change makes tracing redactability configurable and adds a cluster
setting to enable fine-grained redactability that we can use for
multi-tenant deployments.

The desired outcome is to preserve redactability for tenant traces while
preserving a non-redactable code path for requests using the system
tenant. This preserves existing functionality in production while
keeping the current performance hit of redaction from propagating to
non-tenant deployments.

This PR brings master in parity with changes in release-21.2 via #73117.

Benchmarking this against master results in the following improvements.
old.txt is using master @ 65920cf
new.txt is using this commit @ 7c768ac
Benchmark was run using gceworker.

We gain back a significant portion of the performance hit:

david@davidh:~/go/src/github.com/cockroachdb/cockroach$ benchstat old.txt new.txt
name                                              old time/op    new time/op    delta
Tracing/1node/scan/trace=100%/threshold=1ms-24      1.20ms ± 4%    1.15ms ± 2%   -4.83%  (p=0.000 n=10+9)
Tracing/1node/insert/trace=100%/threshold=1ms-24    1.51ms ± 3%    1.40ms ± 2%   -6.71%  (p=0.000 n=10+10)
Tracing/3node/scan/trace=100%/threshold=1ms-24      2.17ms ± 3%    2.10ms ± 4%   -3.26%  (p=0.003 n=10+10)
Tracing/3node/insert/trace=100%/threshold=1ms-24    2.41ms ± 3%    2.30ms ± 3%   -4.45%  (p=0.000 n=8+9)

name                                              old alloc/op   new alloc/op   delta
Tracing/1node/scan/trace=100%/threshold=1ms-24       115kB ± 1%     120kB ± 1%   +4.39%  (p=0.000 n=10+10)
Tracing/1node/insert/trace=100%/threshold=1ms-24     149kB ± 1%     152kB ± 0%   +1.92%  (p=0.000 n=10+9)
Tracing/3node/scan/trace=100%/threshold=1ms-24       276kB ± 3%     282kB ± 3%   +2.18%  (p=0.035 n=10+10)
Tracing/3node/insert/trace=100%/threshold=1ms-24     265kB ± 1%     264kB ± 2%     ~     (p=0.963 n=8+9)

name                                              old allocs/op  new allocs/op  delta
Tracing/1node/scan/trace=100%/threshold=1ms-24       1.68k ± 0%     1.47k ± 0%  -12.16%  (p=0.000 n=9+9)
Tracing/1node/insert/trace=100%/threshold=1ms-24     2.05k ± 0%     1.76k ± 0%  -14.46%  (p=0.000 n=10+8)
Tracing/3node/scan/trace=100%/threshold=1ms-24       3.10k ± 0%     2.85k ± 0%   -8.21%  (p=0.000 n=10+10)
Tracing/3node/insert/trace=100%/threshold=1ms-24     3.00k ± 0%     2.66k ± 0%  -11.30%  (p=0.000 n=9+10)

Resolves #71694

Release note: None

Resolves #71694

Release note: None

@dhartunian dhartunian requested a review from a team December 2, 2021 22:32
@dhartunian dhartunian requested a review from a team as a code owner December 2, 2021 22:32
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dhartunian dhartunian force-pushed the make-traces-redactable-and-default-to-off--master branch from 7a5fe7f to 51a8c72 Compare December 8, 2021 04:11
@dhartunian dhartunian requested review from andreimatei and removed request for a team December 8, 2021 04:11
@dhartunian
Copy link
Collaborator Author

@andreimatei this addresses most of your concerns from #73117 and is a much smaller PR. I will update the one on release-21.2 tomorrow and respond to the open threads.

You were correct that the use of redact.Sprintf vs fmt.Fprintf in spanInner.Recordf did not significantly affect the runtime. I ran a narrower benchmark (BenchmarkEventf_WithVerboseTraceSpan) and found that the largest performance gain by far is in using the LegacyString method on the logEntry, which is now renamed to FastString.

@dhartunian
Copy link
Collaborator Author

@andreimatei also updated PR msg w/ benchmarks

@dhartunian dhartunian force-pushed the make-traces-redactable-and-default-to-off--master branch from 51a8c72 to ddcb9b0 Compare December 8, 2021 16:15
Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but will defer to Andrei for any stamps. I did have a quick question though re: redactability for system tenants

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @dhartunian)


pkg/server/node.go, line 1059 at r13 (raw file):

tr.SetRedactable(true)

Do we want redactability to be turned off when the request is made for a system tenant? Does this not configure the "node-global" tracer to redactable = true without any way for it to be turned back off for system tenants?

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 22 files at r7, 1 of 1 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @andreimatei, and @dhartunian)


-- commits, line 71 at r10:
something here


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 40 at r10 (raw file):

}

func testTenantTracesAreRedactedImpl(t *testing.T, redactable bool) {

please comment on the redactable arg, explaining that when it's not set, you don't get any messages in the trace


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 49 at r10 (raw file):

	)

	getTrace := func(t *testing.T, db *gosql.DB) [][]string {

I think this test would look a lot better if it used the WithStatementTrace testing knob instead of session tracing. Feel free to address in a different PR if you prefer.


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 75 at r10 (raw file):

	defer tc.Stopper().Stop(ctx)

	t.Run("system-tenant", func(t *testing.T) {

please add a comment stating that queries from the system tenant get unredacted traces, and perhaps hint at the mechanism


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 100 at r10 (raw file):

		results := getTrace(t, tenDB)

		const redactedMarkerString = "verbose trace message redacted"

please elevate this to a constant somewhere shared with the production code. Take the opportunity to rename it away from sRedactedMarker. I don't know who started with these type prefixes.


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 100 at r10 (raw file):

		results := getTrace(t, tenDB)

		const redactedMarkerString = "verbose trace message redacted"

it's not clear to me when this "message redacted" string is expected exactly. redactRecordingForTenant only seems to use it when processing DeprecatedFields, which is supposed to go away. And even then, only when sp.RedactableLogs is set.
I'm not sure how the test is passing; how come DeprecatedFields arebeing used to render the trace, instead of the Message field?


pkg/util/log/clog_test.go, line 780 at r8 (raw file):

	b.ResetTimer()
	defer sp.Finish()
	sp.SetVerbose(true)

nit: move this above ResetTimer


pkg/util/log/log_entry.go, line 133 at r10 (raw file):

}

// FastString is an old implementation of String(). It is still around

well then why don't we just make this be logEntry.String() if it's faster? If you do that, put a comment on it than it doesn't follow the usual redact.StringWithoutMarkers(e) pattern for String() implementations of SafeFromatters because performance.

But also see my comments on eventInternal. I think if you want to do all I describe there, this method does not need to exist at all (although it might be a good idea to make logEntry.String() fast regardless of the current PR).


pkg/util/log/log_entry.go, line 147 at r10 (raw file):

		buf.WriteString(entry.File)
		buf.WriteByte(':')
		// TODO(knz): The "canonical" way to represent a file/line prefix

if this method stays different from logEntry.String, then considerations about SQL don't apply, do they? And I don't understand what this comment is referring to anyway. But on the other hand, I do not want a trailing colon anyway. I'd get rid of this comment.


pkg/util/log/trace.go, line 104 at r10 (raw file):

may or may not be redactable

consider linking to logEntry.redactable.


pkg/util/log/trace.go, line 105 at r10 (raw file):

// eventInternal is the common code for logging an event to trace and/or event
// logs. Entries passed to this method may or may not be redactable. However,
// if sp.Redactable() is true then ideally they are for best results.

these "best results" need more words, I think. Say that a non-redactable logEntry will be completely redacted if sp.Redactable()


pkg/util/log/trace.go, line 106 at r10 (raw file):

// logs. Entries passed to this method may or may not be redactable. However,
// if sp.Redactable() is true then ideally they are for best results.
func eventInternal(sp *tracing.Span, el *ctxEventLog, isErr bool, entry *logEntry) {

Should spanInner.Recordf() use redact.Sprintf() or fmt.Sprintf() based on sp.Redactable()? I think that's a good idea by itself, but in addition, if you do that and also rename logEntry.FastString() as String(), this code wouldn't need this branch. This code in the logging package would then not need to care about sp.Redactable()` twice - once when preparing the entry, once here.

(this second part feel free to leave for later if you don't want to bite)
in fact, I think the code in the logging package should not care about sp.Redactable() at all. I don't think sp.Redactable() should be exported (or, in fact, exist at all as a method). I think we can get there by having the log.Eventf() code and its friends pass the format and the args to sp.Recordf(), not a logEntry.
This code smells:

		entry := makeUnstructuredEntry(ctx,
			severity.INFO, /* unused for trace events */
			channel.DEV,   /* unused for trace events */
			depth+1,
			sp.Redactable(),
			format, args...)
		eventInternal(sp, el, isErr, &entry)

Those "unused for trace events" comments hint that something's wrong with the layering. I think eventInternal should take the format, args, and depth, do whatever for the eventLog case (I don't care), and pass them all to Recordf() (perhaps it can terminate the depth and prepending the file:line to the format).


pkg/util/log/trace.go, line 115 at r10 (raw file):

		} else {
			// FastString is significantly faster than String since the latter
			// round-trips through redaction. See BenchmarkEventf_WithVerboseTraceSpan.

this comment belongs around FastString


pkg/util/tracing/tracer.go, line 175 at r10 (raw file):

	// as redactable. This will make unstructured events logged to those
	// spans redactable.
	// Currently, this is used to mark tenant-bound spans as redactable

this comment suggests that "tenant-bound spans" are somehow different from other spans, but that's not true, is it?


pkg/util/tracing/tracer.go, line 314 at r10 (raw file):

}

// Redactable returns true if the tracer is configured to emit

it also affects spans that have already been opened, hasn't it? If anything, this should talk about future collection of recordings.


pkg/util/tracing/tracingpb/recorded_span.proto, line 75 at r10 (raw file):

  // RedactableLogs determines whether the verbose log messages are redactable.
  // This field was introduced in the 22.1 cycle. It can be removed in the 22.2

please remove the word "cycle" from here; it's very confusing.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @andreimatei, and @dhartunian)


pkg/util/log/trace.go, line 106 at r10 (raw file):

(perhaps it can terminate the depth and prepending the file:line to the format)

If we do all this, and also push the handling of the depth inside sp.Recordf, then the logging code could also stop caring about sp.IsVerbose() in getSpanOrEventLog(), which would be another win in my book. The logging code could blindly delegate all the work to the span, and a non-verbose span could short-circuit cheaply.

@dhartunian dhartunian force-pushed the make-traces-redactable-and-default-to-off--master branch from ddcb9b0 to 8c37e64 Compare December 8, 2021 21:02
Copy link
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @andreimatei, and @dhartunian)


-- commits, line 71 at r10:

Previously, andreimatei (Andrei Matei) wrote…

something here

done.


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 40 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please comment on the redactable arg, explaining that when it's not set, you don't get any messages in the trace

This arg is no longer necessary because we don't test


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 49 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think this test would look a lot better if it used the WithStatementTrace testing knob instead of session tracing. Feel free to address in a different PR if you prefer.

Will do separately.


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 75 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please add a comment stating that queries from the system tenant get unredacted traces, and perhaps hint at the mechanism

Done.


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 100 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please elevate this to a constant somewhere shared with the production code. Take the opportunity to rename it away from sRedactedMarker. I don't know who started with these type prefixes.

Done.


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 100 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

it's not clear to me when this "message redacted" string is expected exactly. redactRecordingForTenant only seems to use it when processing DeprecatedFields, which is supposed to go away. And even then, only when sp.RedactableLogs is set.
I'm not sure how the test is passing; how come DeprecatedFields arebeing used to render the trace, instead of the Message field?

This test was originally written to deal with the case of dropping tenant traces. Now that we redact and keep them, the expectation is that we don't see that redacted marker in the results.


pkg/server/node.go, line 1059 at r13 (raw file):

Previously, abarganier (Alex Barganier) wrote…
tr.SetRedactable(true)

Do we want redactability to be turned off when the request is made for a system tenant? Does this not configure the "node-global" tracer to redactable = true without any way for it to be turned back off for system tenants?

True. Since Serverless is the only deployment that will pay this cost, and is paying it system-wide already we're kepping this as a one-way switch unless you restart the node. I am happy to add further configuration knows as follow-ups.


pkg/util/log/clog_test.go, line 780 at r8 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: move this above ResetTimer

Done.


pkg/util/log/log_entry.go, line 133 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

well then why don't we just make this be logEntry.String() if it's faster? If you do that, put a comment on it than it doesn't follow the usual redact.StringWithoutMarkers(e) pattern for String() implementations of SafeFromatters because performance.

But also see my comments on eventInternal. I think if you want to do all I describe there, this method does not need to exist at all (although it might be a good idea to make logEntry.String() fast regardless of the current PR).

Done.


pkg/util/log/log_entry.go, line 147 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

if this method stays different from logEntry.String, then considerations about SQL don't apply, do they? And I don't understand what this comment is referring to anyway. But on the other hand, I do not want a trailing colon anyway. I'd get rid of this comment.

Done.


pkg/util/log/trace.go, line 104 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…
may or may not be redactable

consider linking to logEntry.redactable.

Done.


pkg/util/log/trace.go, line 105 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

these "best results" need more words, I think. Say that a non-redactable logEntry will be completely redacted if sp.Redactable()

Done.


pkg/util/log/trace.go, line 106 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

(perhaps it can terminate the depth and prepending the file:line to the format)

If we do all this, and also push the handling of the depth inside sp.Recordf, then the logging code could also stop caring about sp.IsVerbose() in getSpanOrEventLog(), which would be another win in my book. The logging code could blindly delegate all the work to the span, and a non-verbose span could short-circuit cheaply.

Gotcha. Made the first change. I like that.


pkg/util/log/trace.go, line 115 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

this comment belongs around FastString

Done.


pkg/util/tracing/tracer.go, line 175 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

this comment suggests that "tenant-bound spans" are somehow different from other spans, but that's not true, is it?

Done.


pkg/util/tracing/tracer.go, line 314 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

it also affects spans that have already been opened, hasn't it? If anything, this should talk about future collection of recordings.

Done.


pkg/util/tracing/tracingpb/recorded_span.proto, line 75 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please remove the word "cycle" from here; it's very confusing.

Done.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

but I kinda insist on making tr.redactable be based on more static configuration, rather than setting it on RPCs.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier, @andreimatei, and @dhartunian)


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 40 at r10 (raw file):

Previously, dhartunian (David Hartunian) wrote…

This arg is no longer necessary because we don't test

did you backtrack this change in the wrong commit?


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 116 at r16 (raw file):

		require.True(t, found, "did not see expected trace message '%q':\n%s",
			visibleString, sqlutils.MatrixToStr(results))
		require.False(t, foundRedactedMarker, "unexpectedly found '%q':\n%s",

I'm still confused about this check. What is its point? That marker has something to do with old versions, right?


pkg/server/node.go, line 1058 at r16 (raw file):

	// resolved via #58610, this code can be removed so that all traces
	// have redactability enabled.
	if !tr.Redactable() && tenID != roachpb.SystemTenantID {

I imagined that redactability would be more static configuration (flag, or cluster setting).
I figure there must be something else telling a kv node that it's part of a multi-tenant cluster, and we'd piggy back on that.


pkg/server/node_tenant_test.go, line 51 at r16 (raw file):

		tracer.SetRedactable(true)
		ctx, sp := tracer.StartSpanCtx(ctx, "foo", tracing.WithRecording(tracing.RecordingVerbose))
		sp.SetVerbose(true)

we shouldn't need SetVerbose(), we started recording already above when we created the span.


pkg/util/tracing/span_inner.go, line 189 at r16 (raw file):

		str = redact.Sprintf(format, args...)
	} else {
		// Format is considered safe but here we want to make sure the

s/Format/format (to refer to the variable)?
I'm not sure what the first part of this comment is telling me, and why there's a "but". I suspect other readers will be confused too.

This comment should spell out that fmt.Sprintf is faster than redact.Sprintf(), and it should particularly mention logEntry.

via `BenchmarkEventf_WithVerboseTraceSpan`.

```
❯ benchstat before.txt after.txt
name                           old time/op    new time/op    delta
Eventf_WithVerboseTraceSpan-8    5.01µs ± 1%    4.37µs ± 2%  -12.72%  (p=0.000 n=8+9)

name                           old alloc/op   new alloc/op   delta
Eventf_WithVerboseTraceSpan-8    1.20kB ± 0%    1.08kB ± 0%  -10.07%  (p=0.000 n=10+10)

name                           old allocs/op  new allocs/op  delta
Eventf_WithVerboseTraceSpan-8      22.0 ± 0%      16.0 ± 0%  -27.27%  (p=0.000 n=10+10)
```

Release note: None
When we discovered that tenants could "trace into" KV, we considered
this a security problem since no mechanism ensured that tenants could
receive data they are allowed to see. The KV team reacted to this with
them at the KV-Tenant boundary.

Serverless has shipped with this redactability in place in order to
prevent data leakage across the boundary with KV and now needs to get
back on the `release-21.2` branch to sync with the rest of development.

Unfortunately, making traces redactable had a [significant] performance
overhead. On simple selects, this overhead is in the 10-20% area (when
verbose tracing is enabled).

This presents a stability risk, since some customers run with verbose
tracing enabled for all queries. Once they switch to 21.2, their system
may be underprovisioned as a result of the newly grown tracing overhead
in the `log.{Ve,E}vent{,f}` family of methods. The peformance hit comes
via the additional costs of redacting log strings via the
`redact.Sprintf` function.

This PR changes makes trace redactability a choice. Because we needed to
support mixed-version clusters in cockroachdb#70562 this is not as big a lift as
might have been expected.

The approach taken is that tracers and the spans they generate can be
marked as "redactable". If the flag is enabled, logging events to the
span will use full redaction via `redact.Sprintf`. If the flag is not
enabled a coarse redation of the entire message is done instead. A
further commit will enable redactability on tenant-bound traces since we
do want to maintain that functionality in multi-tenant deployments.

Original issue investigation that kicked off concerns about performance
impact of redaction:
cockroachdb#70110 (comment)

Touches cockroachdb#70562.

Release note: None
More CPU and a slightly different allocation profile, though the
allocations are largely driven by the arguments passed to `Eventf` and
they are trivial in this benchmark.

```
go test ./pkg/util/log -run - -bench BenchmarkEvent -benchtime 3s
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/cockroach/pkg/util/log
cpu: AMD Ryzen 7 3800X 8-Core Processor
BenchmarkEventf_WithVerboseTraceSpan/redactable=false-16   759444  4903 ns/op  961 B/op   18 allocs/op
BenchmarkEventf_WithVerboseTraceSpan/redactable=true-16    561772  6831 ns/op  1081 B/op  16 allocs/op
```

Release note: None
@dhartunian dhartunian force-pushed the make-traces-redactable-and-default-to-off--master branch from 8c37e64 to 7c768ac Compare December 9, 2021 01:20
Copy link
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @andreimatei, and @dhartunian)


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 40 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

did you backtrack this change in the wrong commit?

yeah I messed something up. Going to restore the full test in the PR since tenant-level redaction is now configurable as well.


pkg/ccl/kvccl/kvtenantccl/tenant_trace_test.go, line 116 at r16 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm still confused about this check. What is its point? That marker has something to do with old versions, right?

I added an explanation of why we would never expect this to show up. Seems like it will be appropriate to remove once we remove the TraceRedactedMarker altogether.


pkg/server/node.go, line 1059 at r13 (raw file):

Previously, dhartunian (David Hartunian) wrote…

True. Since Serverless is the only deployment that will pay this cost, and is paying it system-wide already we're kepping this as a one-way switch unless you restart the node. I am happy to add further configuration knows as follow-ups.

Added a boolean cluster setting for this which should alleviate concerns.


pkg/server/node.go, line 1058 at r16 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I imagined that redactability would be more static configuration (flag, or cluster setting).
I figure there must be something else telling a kv node that it's part of a multi-tenant cluster, and we'd piggy back on that.

Added boolean cluster setting.


pkg/server/node_tenant_test.go, line 51 at r16 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

we shouldn't need SetVerbose(), we started recording already above when we created the span.

Done.


pkg/util/tracing/span_inner.go, line 189 at r16 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/Format/format (to refer to the variable)?
I'm not sure what the first part of this comment is telling me, and why there's a "but". I suspect other readers will be confused too.

This comment should spell out that fmt.Sprintf is faster than redact.Sprintf(), and it should particularly mention logEntry.

Done.

@dhartunian dhartunian force-pushed the make-traces-redactable-and-default-to-off--master branch from 7c768ac to e41a2a8 Compare December 9, 2021 01:56
Prior commits added the ability to set a flag on the Tracer to determine
whether Span's should be marked redactable or not. This commit adds
a cluster setting that will enable fine-grained trace redactability. The
intended use is for multi-tenant deployments where KV traces should be
redacted prior to being returned to tenants. Enabling the flag will enable
fine-grained redaction so that tenant operators can continue to debug
their queries via redacted kv trace information instead of having that
information stripped out.

The desired outcome is to preserve redactability for tenant traces while
preserving a non-redactable code path for requests using the system
tenant. This preserves existing functionality in production while
keeping the current performance hit of fine grained redaction from
propagating to non-tenant deployments.

Benchmarking this against master results in the following improvements.
old.txt is using `master` @ 65920cf
new.txt is using this commit @ 7c768ac
Benchmark was run using `gceworker`.

We gain back a significant portion of the performance hit:
```
david@davidh:~/go/src/github.com/cockroachdb/cockroach$ benchstat old.txt new.txt
name                                              old time/op    new time/op    delta
Tracing/1node/scan/trace=100%/threshold=1ms-24      1.20ms ± 4%    1.15ms ± 2%   -4.83%  (p=0.000 n=10+9)
Tracing/1node/insert/trace=100%/threshold=1ms-24    1.51ms ± 3%    1.40ms ± 2%   -6.71%  (p=0.000 n=10+10)
Tracing/3node/scan/trace=100%/threshold=1ms-24      2.17ms ± 3%    2.10ms ± 4%   -3.26%  (p=0.003 n=10+10)
Tracing/3node/insert/trace=100%/threshold=1ms-24    2.41ms ± 3%    2.30ms ± 3%   -4.45%  (p=0.000 n=8+9)

name                                              old alloc/op   new alloc/op   delta
Tracing/1node/scan/trace=100%/threshold=1ms-24       115kB ± 1%     120kB ± 1%   +4.39%  (p=0.000 n=10+10)
Tracing/1node/insert/trace=100%/threshold=1ms-24     149kB ± 1%     152kB ± 0%   +1.92%  (p=0.000 n=10+9)
Tracing/3node/scan/trace=100%/threshold=1ms-24       276kB ± 3%     282kB ± 3%   +2.18%  (p=0.035 n=10+10)
Tracing/3node/insert/trace=100%/threshold=1ms-24     265kB ± 1%     264kB ± 2%     ~     (p=0.963 n=8+9)

name                                              old allocs/op  new allocs/op  delta
Tracing/1node/scan/trace=100%/threshold=1ms-24       1.68k ± 0%     1.47k ± 0%  -12.16%  (p=0.000 n=9+9)
Tracing/1node/insert/trace=100%/threshold=1ms-24     2.05k ± 0%     1.76k ± 0%  -14.46%  (p=0.000 n=10+8)
Tracing/3node/scan/trace=100%/threshold=1ms-24       3.10k ± 0%     2.85k ± 0%   -8.21%  (p=0.000 n=10+10)
Tracing/3node/insert/trace=100%/threshold=1ms-24     3.00k ± 0%     2.66k ± 0%  -11.30%  (p=0.000 n=9+10)
```

Resolves cockroachdb#71694

Release note: None
@dhartunian dhartunian force-pushed the make-traces-redactable-and-default-to-off--master branch from e41a2a8 to 5cbdec8 Compare December 9, 2021 02:31
@dhartunian
Copy link
Collaborator Author

bors r=andreimatei

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third commit: 💋
(but also 🤦 for the first time around when the code was written)

Reviewed 14 of 14 files at r17, 1 of 1 files at r18, 1 of 1 files at r19, 10 of 10 files at r20, 1 of 1 files at r21, 5 of 6 files at r22, 1 of 1 files at r23, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @andreimatei, @dhartunian, and @knz)

@craig
Copy link
Contributor

craig bot commented Dec 9, 2021

Build succeeded:

@craig craig bot merged commit b4ab627 into cockroachdb:master Dec 9, 2021
@dhartunian dhartunian deleted the make-traces-redactable-and-default-to-off--master branch November 14, 2022 16:37
michae2 added a commit to michae2/cockroach that referenced this pull request Dec 9, 2022
For redaction of statement diagnostics bundles we need the ability to
redact traces of query execution. The general ability to redact traces
was added in cockroachdb#70562 and cockroachdb#73405, but due to performance impact,
fine-grained redaction was turned off by default and redactability was
made configurable per tracer.

When tracing query execution for a redacted statement bundle, we want
fine-grained redaction for a specific span (and its children) regardless
of its tracer's redactability. This change addresses this by plumbing
through a flag to set redactability on individual spans, which is
inherited by child spans.

(I considered making this another recording mode instead of a separate
field, but redactability seems like an orthogonal concern to verbose or
structured recording mode. Plus we may want to remove this flag one day
if we enable fine-grained redaction for all traces, and that will be
easier with it as a separate field.)

Epic: CRDB-19756

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tracing: perf regression due to trace redactability
6 participants