Skip to content

Conversation

@jamescrosswell
Copy link
Collaborator

@jamescrosswell jamescrosswell commented May 12, 2023

Fixes #2078

Description

Any URLs that are captured by the SentrySDK are sanitized per the spec (see refs below) before being sent to Sentry.

When we make HTTP requests to third party services we create a breadcrumb with the URL and also create a span that has the URL as a description. Additionally, our failed outbound HTTP requests containing UserInfo can be captured and sent to Sentry - PII in these also needs to be redacted.

Make sure that all integrations that record outgoing or incoming HTTP request structure the data like described in the
spec linked above.

References

Approach

The SentryHttpMessageHandler does things like this to add a breadcrumb:

        var breadcrumbData = new Dictionary<string, string>
        {
            {"url", url},
            {"method", method},
            {"status_code", ((int) response.StatusCode).ToString()}
        };
        _hub.AddBreadcrumb(string.Empty, "http", "http", breadcrumbData);

SentryHttpMessageHandler also creates a span as follows:

       // Start a span that tracks this request
        // (may be null if transaction is not set on the scope)
        // e.g. "GET https://example.com"
        var span = _hub.GetSpan()?.StartChild("http.client", $"{method} {url}");

Breadcrumbs and various other bits of data reside on the Scope most of the time. This scope data get's applied
to IEventLike objects (which includes Transaction and SentryEvent) before they get sent to Sentry by the
SentryClient.

Spans are created directly on the Transaction.

To ensure Personally Identifiable Information (PII) does not get with SentryEvents, if SendDefaultPii == false we
redact any PII in the SentryClent.DoSendEvent method, just before events are sent to Sentry.

Similarly, we redact PII from Transactions in the SentryClient.CaptureTransaction method just before the transaction
is stuffed in an Envelope.

Both SentryEvent and Transaction now have an internal void Redact() method that redacts any PII data in their
member fields or recursively calls Redact() on any complex members (e.g. SentryEvent.Breadcrumbs and
Transaction.Breadcrumbs).


Java implementation

The Java SDK implementation has a urlWithAuthRemoved
method they use to filter out the Auth portion of URLs:

  private static final @NotNull Pattern AUTH_REGEX = Pattern.compile("(.+://)(.*@)(.*)");

  private static @NotNull String urlWithAuthRemoved(final @NotNull String url) {
    final @NotNull Matcher userInfoMatcher = AUTH_REGEX.matcher(url);
    if (userInfoMatcher.matches() && userInfoMatcher.groupCount() == 3) {
      final @NotNull String userInfoString = userInfoMatcher.group(2);
      final @NotNull String replacementString =
          userInfoString.contains(":") ? "[Filtered]:[Filtered]@" : "[Filtered]@";
      return userInfoMatcher.group(1) + replacementString + userInfoMatcher.group(3);
    } else {
      return url;
    }
  }

And here are the unit tests.

@github-actions
Copy link
Contributor

github-actions bot commented May 12, 2023

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 97f104a

@SeanFeldman
Copy link
Contributor

It feels like this would be a cleaner and more robust approach. It'll take a bit more effort up front but I suspect it will be more scalable and save us a bunch of effort further down the road.

I understand that the first option may not be ideal, but I have concerns that using a serializer approach would result in dual responsibility, sanitization and serialization, which could be difficult to manage.

@jamescrosswell
Copy link
Collaborator Author

I have concerns that using a serializer approach would result in dual responsibility, sanitization and serialization

Yeah, that would be a fair call if the JsonSerializer was actually a performing serialization. In fact all it's doing is creating a UtfJsonWriter and passing that into the WriteTo method of whatever IJsonSerializable instance it's dealing with (see here). The WriteTo method gets the json writer to serialize each of it's members. So the real serializer is the UtfJsonWriter.

I think I'll probably just add another method to the IJsonSerializable interface with an overload of the WriteTo method that accepts a bool sanitizeData parameter or something.

@jamescrosswell
Copy link
Collaborator Author

I understand that the first option may not be ideal, but I have concerns that using a serializer approach would result in dual responsibility, sanitization and serialization, which could be difficult to manage.

@SeanFeldman finally I think you were right about this. When I dug into it futher, most of the serialization happens in a bunch of extension methods we've written in JsonSerializableExtensions... so I've reverted to the original approach.

@jamescrosswell jamescrosswell marked this pull request as ready for review May 15, 2023 10:18
@mattjohnsonpint mattjohnsonpint marked this pull request as draft May 15, 2023 21:01
- Included some comments on why we don't redact User and Request on the Transaction and SentryEvent.
- Moved FluentAssertions.Execution to global usings in Directory.Build.props
- Added end to end test to ensure events captured on the SentryClient get redacted, if necessary
@jamescrosswell jamescrosswell marked this pull request as ready for review May 16, 2023 01:54
@jamescrosswell jamescrosswell changed the title Sanitize urls Remove authority from URLs sent to Sentry May 18, 2023
@jamescrosswell jamescrosswell enabled auto-merge (squash) May 18, 2023 00:23
@jamescrosswell jamescrosswell merged commit 476f1c9 into main May 18, 2023
@jamescrosswell jamescrosswell deleted the feat/sanitize-urls-2078 branch May 18, 2023 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sanitize sensitive data from URLs sent to Sentry

3 participants