-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracing: make all ways to associate data with a Span redactable #58610
Comments
Hi @tbg, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
I understand this was considered to fall under obs infra. |
This has popped up on my radar again in the context of #70406. I don't know how urgent it is, but at some point I suspect the serverless UX will require redacted traces so that a) we can be sure we don't leak PII to SQL tenants and b) we can continue to rely on tracing to troubleshoot problems. edit: some (internal) discussion in https://cockroachlabs.slack.com/archives/C01CDD4HRC5/p1631905968047900 |
I'm not sure what you mean by "decide on what timeline changes need to be implemented here"? |
Discussed privately on Slack with @aaron-crl:
I'll prioritize getting #70406 merged and backported to release-21.2 branch as well as 21.1. |
This commit changes the `(*Node).Batch` endpoint to redact trace recordings that were collected on behalf of a tenant. This notably does not affect the structured payloads. We need at least one of them - ContentionEvent - to remain intact as the keys within are used by SQL observability. Touches cockroachdb#58610. It does not close it because there may be other endpoints that support tracing, and even if there aren't, there is nothing that keeps us from accidentally re-introducing this problem again in the future. Release note: None
I'm going to go ahead with #70562. After that PR is merged, we still have to develop a story for structured payloads. At the time of writing, |
An additional question that came up in the PR is around redaction when logging to external opentracing tracers. As of #70562, no redaction is performed for these backends, but ultimately we may want some. This seems of lower priority though, lest we start wanting to do that in Serverless. |
This commit changes the `(*Node).Batch` endpoint to redact trace recordings that were collected on behalf of a tenant. This notably does not affect the structured payloads. We need at least one of them - ContentionEvent - to remain intact as the keys within are used by SQL observability. Touches cockroachdb#58610. It does not close it because there may be other endpoints that support tracing, and even if there aren't, there is nothing that keeps us from accidentally re-introducing this problem again in the future. Release note: None
This change removes the string representation of the span being exported from the structured recording written to the tracing span. The change was motivated by discussions around redactabilty and tenants being able to see unredacted traces from the KV host server. More information can be found #58610. These traces are not very critical and so until there is a larger change for structured recordings to support redactability we can do away with them. Informs: #58610 Release note: None
This commit changes the `(*Node).Batch` endpoint to redact trace recordings that were collected on behalf of a tenant. This notably does not affect the structured payloads. We need at least one of them - ContentionEvent - to remain intact as the keys within are used by SQL observability. Touches #58610. It does not close it because there may be other endpoints that support tracing, and even if there aren't, there is nothing that keeps us from accidentally re-introducing this problem again in the future. Release note: None
Touches cockroachdb#58610. The preceding commits reverted redactability of traces, so we need to find another way to prevent sensitive information from leaking to tenants. At this point, the method is crude: just don't return any trace recording to tenants at the KV boundary. This is a loss of observability that could backfire, but we can improve upon it. Note that we have coverage of this through TestTenantTracesAreRedacted. Release note: None
Unfortunately I have to re-add Discussions about the options is here: #71610 (comment) |
Touches cockroachdb#58610. The preceding commits reverted redactability of traces, so we need to find another way to prevent sensitive information from leaking to tenants. At this point, the method is crude: just don't return any trace recording to tenants at the KV boundary. This is a loss of observability that could backfire, but we can improve upon it. Note that we have coverage of this through TestTenantTracesAreRedacted. Release note: None
Now that #71606 is merged, here's where we are with release-21.2:
On master, the redactability work is still present, including the performance problems. I'm going to open a separate issue about that. |
Filed the issue for master: #71694 |
70792: server: expect duplicate session cookies on HTTP requests r=catj-cockroach,knz a=dhartunian Previously, the golang HTTP library was used to retrieve a cookie with the specific name we use to store our authentication token. This API method retrieves the *first* matching cookie in the header and ignores the rest. This can cause issues with requests that contain multiple matching cookies with the same name (if other applications are potentially running on the same domain). This change adjusts the HTTP API we use to retrieve cookies (we now prefer to call `req.Cookies()` and iterate through the entire list ourselves, instead of relying on the API to give us the matching cookie since it only supports returning a single result. Resolves #70376. Release note (security update): Authenticated HTTP requests to nodes can now contain additional cookies with the same name as the one we use ("session"). Duplicates like this are permitted by the HTTP spec and our code will now attempt to parse all cookies with a matching name before giving up. This can resolve issues with customers who run other services on the same domain as their CRDB nodes. 71503: server: Tenant pods fan out to cancel SQL queries r=matthewtodd a=matthewtodd Partially addresses #70832. Previously, the tenant `/_status/cancel_query` endpoint was not available over HTTP and could not cancel a query not running locally. This change opens it up, so that we can call it from the cloud console, and makes sure that that we can cancel queries when a tenant has more than one SQL pod running. Release note (api change): The tenant `/_status/cancel_query` endpoint was made available over HTTP and augmented to fan out requests in a multi-pod world. 71677: row: pass in account instead of monitor for KV fetcher r=yuzefovich a=yuzefovich This commit switches the KV fetcher to expect a memory account rather than a memory monitor. It additionally creates a separate memory account for the KV fetcher when it is used by the cFetcher (previously, we were reusing the monitor from the allocator). Note that we don't need to update multiple places where we call methods on possibly nil memory account because all methods have been taught to be noops when the receiver is `nil`. Fixes: #55481. Release note: None 71698: roachtest: add kv95/enc=false/nodes=3/tracing r=[dhartunian,stevendanna] a=tbg We don't have a good grasp on what it means for performance to turn on tracing globally. A kv95 (read-mostly) workload is a good baseline here as we expect it to be among the most severely impacted. Touches #71694. Touches #58610. Release note: None Co-authored-by: David Hartunian <davidh@cockroachlabs.com> Co-authored-by: Matthew Todd <todd@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
This reverts commit fc95ad8. This PR restores the ability for tenant traces to see unstructured recordings from the KV layer. Please see note in cockroachdb#71694: cockroachdb#71694 (comment) The need to enforce such strict redaction of traces for tenant queries is not necessary. Customer views into the KV layer will be necessary in order to debug query execution in multi-tenant settings. Redaction is certainly needed at this boundary but must be configured more carefully. Further PRs as necessitated by cockroachdb#58610 and cockroachdb#71694 will tackle these issues on master (where trace redaction is still available) and will be backported to the release branch as needed. This PR restores tracing into KV for tenants and unblocks the ability for serverless to use the release-21.2 branch. Release note (ops change): Tracing queries from tenants will now include unstructured log messages from the kv layer.
This reverts commit 22ab4ea. This commit changes the `(*Node).Batch` endpoint to redact trace recordings that were collected on behalf of a tenant. This notably does not affect the structured payloads. We need at least one of them - ContentionEvent - to remain intact as the keys within are used by SQL observability. Touches cockroachdb#58610. It does not close it because there may be other endpoints that support tracing, and even if there aren't, there is nothing that keeps us from accidentally re-introducing this problem again in the future. Release note: None
This reverts commit fc95ad8. Touches cockroachdb#58610. The preceding commits reverted redactability of traces, so we need to find another way to prevent sensitive information from leaking to tenants. At this point, the method is crude: just don't return any trace recording to tenants at the KV boundary. This is a loss of observability that could backfire, but we can improve upon it. Note that we have coverage of this through TestTenantTracesAreRedacted. Release note: None
This reverts commit 22ab4ea. This commit changes the `(*Node).Batch` endpoint to redact trace recordings that were collected on behalf of a tenant. This notably does not affect the structured payloads. We need at least one of them - ContentionEvent - to remain intact as the keys within are used by SQL observability. Touches cockroachdb#58610. It does not close it because there may be other endpoints that support tracing, and even if there aren't, there is nothing that keeps us from accidentally re-introducing this problem again in the future. Release note: None
Is your feature request related to a problem? Please describe.
Tracing information can be sent to the SQL root user, debug.zip, and external tracers. It is desirable to apply our budding redactability best practices here as well.
Describe the solution you'd like
Make all ways to associated data with a Span redactable. This primarily includes free-form text messages (
log.Event
and co) as well as structured payloads attached to a Span. The former should get the treatment our regular logging APIs have already received, and the latter should be required to implementSafeFormatter
.Other data associated with spans are the operation name, tags, log tags, and baggage items. Tags in particular tend to contain PII and thus need redaction.
Describe alternatives you've considered
Additional context
Jira issue: CRDB-3379
The text was updated successfully, but these errors were encountered: