Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable HttpServer / HttpClient instrumentation to exclude query strings #2302

Closed
ben-childs-docusign opened this issue Feb 16, 2021 · 13 comments
Labels
enhancement New feature or request

Comments

@ben-childs-docusign
Copy link

Is your feature request related to a problem? Please describe.
We would like to exclude query strings from http client and http server spans from our traces in order to avoid logging sensitive information to our tracing infrastructure.

Describe the solution you'd like
Ideally there would be some ability to configure which parts of the URL should be included in these spans, for our purposes we would just like the scheme, host and path.

Describe alternatives you've considered
We could write custom code to instrument our services or build a custom version of the instrumentation library but I think this would be a generally useful feature.

Additional context
This is the current code in HttpServerTracer for setting the url attribute:

  private void setUrl(Span span, REQUEST request) {
    span.setAttribute(SemanticAttributes.HTTP_URL, url(request));
  }

If it is possible to access configuration here we could parse the url and scrub the sections as defined in configuration.

@ben-childs-docusign ben-childs-docusign added the enhancement New feature or request label Feb 16, 2021
@iNikem
Copy link
Contributor

iNikem commented Feb 17, 2021

I think this is a very sensible request. My only concern is that if we bring in a configuration for removing query string, then why not fragment and/or path as well? :) Just having one configuration option seems random, having all of them seems like explosion. We have to find some balance...

@ben-childs-docusign
Copy link
Author

ben-childs-docusign commented Feb 17, 2021

We could have a config that states which parts of the url to keep. Something like

default: http-url.include-parts=all
other options: scheme, host, port, path, query, fragment (multiple specified separated by comma?)
or: none

@breedx-splk
Copy link
Contributor

We could have a config that states which parts of the url to keep. Something like

default: http-url.include-parts=all
other options: scheme, host, port, path, query, fragment (multiple specified separated by comma?)
or: none

I think the original ask makes a lot of sense (although sensitive data in query strings and paths should probably be sternly discouraged)...but I'm not sure it makes sense to provide a la carte choosing of url components. In fact, it makes it easy to break url syntax this way.

I'll offer up a balance as @iNikem suggested:

  • config item something like otel.traces.http.redact
  • 3 possible values: nothing, path, query
  • default is nothing
  • values can be combined, comma separated
  • combining nothing with any other value is discouraged and results in undefined behavior.
  • combining path and query in any order is fine
  • fields that are to be redacted should be replaced with the literal string REDACTED. For example, if the true url is http://example.com/myapp?sekret=123abc then
    • if query is to be redacted, this becomes http://example.com/myapp?sekret=REDACTED
    • if path is to be redacted, this becomes: http://example.com/REDACTED?seckret=123abc
    • if both query and path are specified, this becomes http://example.com/REDACTED?seckret=REDACTED

What do you think?

@ben-childs-docusign
Copy link
Author

this proposal sounds reasonable, I might extend it to include the ability to redact fragment and avoid parsing the query string and just replace the entire query string with REDACTED

so in your example it would be http://example.com/myapp?REDACTED

@trask
Copy link
Member

trask commented Feb 18, 2021

cc: @johnbley who has been thinking about this open-telemetry/oteps#100

@johnbley
Copy link
Member

I say if we're going to bother with this, let's go all-out and have a SpanFilter interface and allow people to use the full power of a programming language to manipulate all attributes/events/span names (rather than one-offing bits and pieces of individual strings with declarative mechanisms like regexes). SpanProcessor is not ideal for this purpose at the moment because the spans are not supposed to be writeable/editable at onEnd (though that could be relaxed). Something like:

class MyFilter implements SpanFilter {
  public void filter(Span s) {
    String url =s.getAttribute("http.url");
     if (url != null) {
       s.setAttribute(redact(url));
     } } } 
tracerProvider.addFilter(new MyFilter());

We can grow a declarative layer on top of this if desired, akin to ideas proposed above, but my view is that app authors/owners will always come up with something new/unpredicated and there needs to be a safety valve we can point them to.

NB I did something similar with SpanProcessor for our RUM product for a similar concern (full URL contains sensitive data), but its mechanism will probably need to change soon if otel-js enforces read-only at SpanProcessor.onEnd
https://github.com/signalfx/splunk-otel-js-browser#spanprocessor

@ben-childs-docusign
Copy link
Author

I agree that a general purpose mechanism would be even better. If SpanProcessor works for now that could unblock our implementation so I'll take a look at it.

@ben-childs-docusign
Copy link
Author

Is it possible to inject a SpanProcessor into the agent from the application jar? I tried to use SdkTracerProviderConfigurer to inject a span processor but it seems that the service loader in the agent initialization path doesn't see the services registered in the application jar.

@ben-childs-docusign
Copy link
Author

It also appears that the java SpanProcessor enforces that attributes cannot be set after a span has ended. So the SpanProcessor approach does not work at the moment.

@trask
Copy link
Member

trask commented Aug 25, 2021

hey @ben-childs-docusign, this can be accomplished by implementing a delegating SpanExporter which "decorates" the SpanData before passing it on to the underlying SpanExporter, check out https://github.com/open-telemetry/opentelemetry-java/blob/4a945e63e3bd1d309ba189dcd3c6c7a0b5db535c/sdk-extensions/tracing-incubator/src/main/java/io/opentelemetry/sdk/extension/incubator/trace/data/DelegatingSpanData.java#L22-L48

and check out the new extension mechanism that makes extending the javaagent with custom SpanExporter much easier now: https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/examples/extension#introduction

@gustaff-weldon
Copy link

I have a similar request (#5680), but I would like to drop the certain urls from being tracked entirely. @trask what would be your suggestion here?

@trask
Copy link
Member

trask commented Mar 24, 2022

linking to @mateuszrzeszutek's response #5680 (comment)

@trask
Copy link
Member

trask commented Dec 22, 2024

Closing based on #2302 (comment)

@trask trask closed this as completed Dec 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants