Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip the user's email address out of the location used by GA #466

Merged
merged 1 commit into from
Jul 17, 2018

Conversation

brucebolt
Copy link
Member

We have seen some paths in Google Analytics that include the user's email address, normally as part of the query string. These URLs always have zero pageviews, but a number of hits (GA measures hit as any activity, e.g. event, timing measurement, etc).

Although there is existing code in this repo to strip personal information (PII) from the path, there are some cases where GA generates a new request (e.g. page timings) that we have not covered. In this case, it reports the actual URL rather than our custom path. This aligns with the observation of these hits having no pageviews, since pageviews are already stripped of this information.

This PR addresses that problem by stripping email addresses out of all URLs and setting this as a global location for GA to use on all requests from that page.

https://trello.com/c/jKitCyZI/311-remove-emails-from-ga-events-on-email-alert-frontend-pages

@NickColley
Copy link
Contributor

Since this keeps coming up is there an opportunity to avoid this in the server side applications?

For example if we know certain routes allow user input:

/regular/route/:user-input

output this route to the frontend as a sort of 'mask' which then can take the current URL and remove this section from the URL.

/regular/route/email@address.com -> /regular/route/[redacted]

This means we're not relying on matching arbitrary patterns, which as we've seen seems prone to letting things slip through..

Copy link

@thomasleese thomasleese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few comments, but happy to approve.

@@ -20,6 +22,10 @@
sendToGa('set', 'displayFeaturesTask', null)
}

function stripLocationPII () {
sendToGa('set', 'location', stripPIIFromString(window.location.href.toString()))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think window.location.href is already a string so we don't need to call toString.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course. I was thinking of window.location, which is an Object.

@@ -160,6 +167,11 @@
}
}

function stripPIIFromString (string) {
var stripped = string.replace(EMAIL_PATTERN, '[email]')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the EMAIL_PATTERN variable should be in this function so it's closer to the code where it gets used and then maybe rename this to stripEmailAddressesFromString?

@bevanloon
Copy link
Contributor

@NickColley - thanks that's a great suggestion. We've had a chat about it with some of the email-alert alumni and others on the platform-health team. We do want to do the work to obfuscate things server side, but we want to take the time to do that work correctly, both from a product point of view and from a user-journey point of view and with proper oversight about what is and isn't PII.

I wonder if we could push this through as a temporary band-aid since it does stop the immediate and time-sensitive issue we have.

@NickColley
Copy link
Contributor

@bevanloon sure, I won't block this pull request just wanted to highlight that previous to this we've already had two 'temporary band-aids', so we should probably try a more robust approach in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants