-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URL query string values should be redacted by default #961
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but it would also be great to update examples (or add new REDACTED ones)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Left a nit suggestion about using the word SHOULD for instrumentations to provide option to not redact.
# your pull request title with [chore] or use the "Skip Changelog" label. | ||
|
||
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' | ||
change_type: bug_fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recognize that the specification allows this as a non-breaking change because it is an attribute value, but I am very concerned about the end user experience of this change. This change will break alerts/slos/boards that expect query parameters to be present in the attribute named url.full
or url.query
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially painful for users who are treating the query parameters as non-sensitive data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in practice this can be a pretty substantial breaking change for end users, most of whom likely don't have sensitive information in these query params
I would be more comfortable with keeping the current language and providing an opt-in redaction processor so as not to break existing users. |
I think it's fair to leave the registry docs the same as they were and put the opt-out redaction as an augmented description for http semconv. |
Is there any chance that you provide environmental variable name as an option to disable redaction? Without this I expect couple of different implementations in various languages. |
docs/url/url.md
Outdated
**[3]:** Query string values SHOULD be redacted by default and replaced by the value `REDACTED`, e.g. `q=REDACTED&v=REDACTED` (the query string keys SHOULD be preserved). | ||
Instrumentation MAY provide a configuration option to capture the full query string without any redaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make specific suggestions, I'd prefer this state that query string values MAY be redacted instead of SHOULD and going as far as stating that Instrumentation MUST
provide a way to redact query string.
Based on this point from @Kielek a broader proposal: we will run into questions around sensitivity again and again, and right now it is solved selectively via inline comments in the specification (there are other ones for enduser.* and db.* properties), while this needs a broader solution. I suggest that such a solution could be designed in a way that it is decoupled from the attribute requirements and by that "solves" this problem. Here is a rough outline:
I anticipate that this comes with a set of downsides (SDKs need to implement that filtering,redaction,etc., / those processes may be resource incentive for a busy node / ...), but this is something that in my experience end-user will ask for eventually anyhow (at least from my indirect experience with APM agents). |
It is important for native instrumentations to redact sensitive data or make users opt into the collection. For example, when writing logs, OTel is just one of possible logging providers and there are no guarantees secrets (if any) will be scrubbed by something in the pipeline. I can see the world where:
This allows instrumentations to protect themselves by having safe default (if my Azure SDK instrumentation leaked a credential, I'm responsible, not the OTel). If the distro has user permission or is brave enough, it can enable collection for all instrumentations. |
I disagree: instrumentation libraries and native instrumentations should not carry the responsibility to redact sensitive data or provide users to opt into the collection from an OpenTelemetry perspective. If they use other logging providers or if they provide APIs to other tracing/metric providers, they can (and may be somtimes have to) do their own redaction, but that's not what we (OpenTelemetry) should expect them to do. I argue that filtering and redaction should be a responsibility of the otel SDK and in no way the instrumentation library or the API should be expected to do the filtering:
To double down on this: I think that sampling and redaction share a lot of requirements, and therefore redaction should live in the Tracing/Logging/Metrics/Profiling SDK. There may be different strategies for redaction as well and there is room for innovation (see probabilistic sampling) and with the development of technology (and here: changes of legislation) requirements may change independent of the semantic conventions and specification |
@svrnm are you saying that database queries should also not be redacted by default? also, this PR is about security (leaking credentials) and not PII data, so I believe it applies to both production and non-production environments |
What I am saying is that the approach we take for sensitive data redaction needs to be approached differently. Right now we have places in the spec that say "redact this data" or "do not collect this data", but no guidance on where and how this data should be treated accordingly. Right now it seems that this is a responsibility of the instrumenting library, so each HTTP instrumentation library will need to manage the redaction of
I understand, but at the end it is about both. I also understand that this issue here is urgent and might need to be merged regardless to mitigate the security bug, but it it's still not clear then who is responsible for complying with this default? |
I mean, it could be a regular expression (or even some simple heurestics -- the provided examples of Azure/GCP/AWS calls with tokens in query strings, for example). I would point out that Datadog offers two options - redact query string, and redact paths with digits (https://docs.datadoghq.com/tracing/configure_data_security/?tab=http#trace-obfuscation) but they are both disabled by default. In addition, as other comments have pointed out, we ship a system for redaction today, it's in the collector, and our official guidance on production deploys is to use a collector. I am not arguing that we shouldn't have a redaction option, I am arguing that we should not default it to 'on'. I would point out that this proposed new behavior was opposed by comments from the community on the .NET repo where it originated (open-telemetry/opentelemetry-dotnet#5532 (comment)). edit: another example from the datadog docs, specifically around their library: https://docs.datadoghq.com/tracing/configure_data_security/?tab=http#library. they redact 'suspicious-looking' values through a regex, the default of which is below:
|
My experience in the past ~3 years in my capacity working for a vendor (Honeycomb) has been a mix:
|
I would also add that in five years at Lightstep I don't recall anyone having an issue with this. Any time we did have a customer send PII or other sensitive data, we had ways for them to delete it, and we offered redaction as a part of our collection pipeline for known sensitive keys/strings. |
I'd love to be able to do this. Could we all agree with something like:
? |
I would support this. |
I've sent a PR to motivate further discussion of this proposal: #971 |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Commenting to unstale |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
What is blocking this? |
This PR has lot of objections (judging from the discussions, thumbs-down reactions etc.). A scaled down version is here : #971 which has less opposition from what I can tell! |
Superceded by #971 |
Fixes #860
Changes
Query string values are now redacted by default due to concerns around leaking sensitive data.
This is not considered a breaking change because the OTel semantic conventions definition of stability specifically allows changing attribute values: