Prevent tracking based on link decoration via query string or fragment #4239

Closed
fmarier opened this issue Apr 25, 2019 · 25 comments · Fixed by brave/brave-core#3239
Labels
feature/shields The overall Shields feature in Brave. priority/P3 The next thing for us to work on. It'll ride the trains. privacy/query-filter privacy/tracking Preventing sites from tracking users across the web QA Pass-Linux QA Pass-macOS QA Pass-Win64 QA/Yes release-notes/include

Comments

@fmarier
Member

fmarier commented Apr 25, 2019

ITP 2.2 is reducing the lifetime of cookies set via document.cookie when the navigation came from a tracking-enabled page and the destination URL includes query string parameters or a fragment: https://webkit.org/blog/8828/intelligent-tracking-prevention-2-2/

We already block the third-party scripts that would be extracting these IDs and setting a first-party tracking cookie, but we could in theory go further by:

  • emulating the cookie lifetime restriction, or
  • stripping out tracking query string parameters (e.g. gclid, fbclid, msclkid and mc_eid).
@fmarier
Member Author

fmarier commented Jun 10, 2019

@snyderp found a comprehensive list of tracking parameters in https://greasyfork.org/en/scripts/10096-general-url-cleaner.

@lukemulks

A couple of questions / comments, only focusing on link decoration:

  1. Would this only apply for the sites listed in the greasyfork link above, or other domains?

  2. Re: YT in the greasyfork link, we should test to make sure blocking the prefetch doesn't break consecutive video playback.

  3. If we are already blocking 3rd parties that would profile data in conjunction with URL decoration, I am not clear on what the harm is in preventing the 1st party from using their own server logs to determine what their audience interests are, using link decorations. If the links aren't passing personal or identifiable information (given the scope/context of protection we have in place), it seems like we are removing a feature that they might leverage in the 1p context in a way that doesn't necessarily violate our privacy promises with our users.

I could be missing something, but here are the reasons why I am asking:

  1. With Brave Ads, we have some advertisers including query string params to help determine which traffic they receive via Brave Ads. Given that we hide behind the Chrome UA, there are few ways in which advertisers and publishers can determine whether our reporting aligns with theirs, until we have an Apollo-phase source of truth.

  2. If a publisher has a 1p relationship with an auth'd user, and uses link decorations as a means of optimizing or customizing content that is presented for the user, or other services used in the website, removing the decorations may break intended 1p:1p engagement behavior.

Of course, not trying to talk anyone into not providing better tracking protection, but the above items came to mind and I want to check in here to see if they were being factored in for potential impact.

@pes10k
Contributor

pes10k commented Jun 17, 2019

@lukemulks the suggestion is not to remove all query string params, just those used specifically for tracking purposes. The ones in the link above would be a good starting point, but the list could grow or shrink depending on our boldness, measurement results, etc. So the worry is less about ?likes=shoes and more about facebook_id=<something>, that sort of thing.

FWIW, the Safari ITP approach is to block all query params set by known / labeled tracking domains. So in some senses more aggressive, some senses less.

So I think the suggestion would steer clear of the concerns you mentioned, and that if we interfered with the use cases you mentioned, that'd be in most (if not all) cases a bug. WDYT?
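
To make the distinction concrete, here is a minimal Python sketch of name-based filtering. It assumes the four parameter names mentioned at the top of this issue as the blocklist; the function name `strip_tracking_params` and the example URLs are hypothetical, and this is not Brave's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed blocklist: the four names mentioned earlier in this issue.
TRACKING_PARAMS = {"fbclid", "gclid", "msclkid", "mc_eid"}


def strip_tracking_params(url: str) -> str:
    """Drop blocklisted query parameters; leave everything else untouched."""
    parts = urlsplit(url)
    # Split the raw query on "&" and compare only the name before "=",
    # so the kept parameters are not re-encoded.
    kept = [
        pair
        for pair in parts.query.split("&")
        if pair.split("=", 1)[0] not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


# An ordinary first-party parameter survives; the click ID does not.
print(strip_tracking_params("https://example.com/shop?likes=shoes&fbclid=abc123"))
# https://example.com/shop?likes=shoes
```

Filtering by parameter name, rather than by the domain that added the parameter, is what keeps something like ?likes=shoes intact in this sketch.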

@maximbaz

Relevant: https://github.com/jparise/chrome-utm-stripper

@tildelowengrimm tildelowengrimm added the priority/P4 Planned work. We expect to get to it "soon". label Jun 28, 2019
@tildelowengrimm tildelowengrimm added the feature/shields The overall Shields feature in Brave. label Jul 20, 2019
@tildelowengrimm tildelowengrimm added priority/P3 The next thing for us to work on. It'll ride the trains. and removed priority/P4 Planned work. We expect to get to it "soon". labels Jul 31, 2019
@fmarier fmarier self-assigned this Aug 15, 2019
@lukemulks

lukemulks commented Aug 15, 2019

I'm so late in the game on this thread @snyderp, apologies; to answer your question, it sounds good to me. Thank you for addressing the concerns, and explaining the context clearly in your response.

fmarier added a commit to brave/brave-core that referenced this issue Aug 24, 2019
…browser#4239)

If a URL's query string includes one of the parameter names known
to track individual users, we remove them.

We essentially apply the following to the query string:

    s/&(fbclid|gclid|msclkid|mc_eid)=[^&]+//g
    s/^(fbclid|gclid|msclkid|mc_eid)=[^&]+&//g
    s/^(fbclid|gclid|msclkid|mc_eid)=[^&]+$//g

https://support.google.com/analytics/answer/7519794
https://stackoverflow.com/questions/52847475/what-is-fbclid-the-new-facebook-parameter
https://about.ads.microsoft.com/en-us/blog/post/january-2018/conversion-tracking-update-on-bing-ads
https://developer.mailchimp.com/documentation/mailchimp/guides/getting-started-with-ecommerce/#e-commerce-tracking-and-reports
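
For illustration, the three substitutions in the commit message could be expressed in Python roughly as follows. This is a sketch of what the commit message describes, not the brave-core code; `strip_tracking_query` is a hypothetical name, and the input is assumed to be the query string without the leading "?".

```python
import re

# The four parameter names listed in the commit message.
TRACKERS = "fbclid|gclid|msclkid|mc_eid"

# One compiled pattern per substitution, applied in the same order.
_PATTERNS = [
    re.compile(rf"&({TRACKERS})=[^&]+"),   # tracker after another parameter
    re.compile(rf"^({TRACKERS})=[^&]+&"),  # tracker first, more parameters follow
    re.compile(rf"^({TRACKERS})=[^&]+$"),  # tracker is the only parameter
]


def strip_tracking_query(query: str) -> str:
    """Apply the three substitutions to a query string (no leading '?')."""
    for pattern in _PATTERNS:
        query = pattern.sub("", query)
    return query


print(strip_tracking_query("a=1&gclid=abc&b=2"))  # a=1&b=2
print(strip_tracking_query("gclid=abc&a=1"))      # a=1
print(strip_tracking_query("likes=shoes"))        # likes=shoes
```

Because each pattern requires at least one character after "=", a tracker with an empty value (e.g. `gclid=`) is left in place by these expressions.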
fmarier added a commit to brave/brave-core that referenced this issue Sep 10, 2019
fmarier added a commit to brave/brave-core that referenced this issue Sep 18, 2019
fmarier added a commit to brave/brave-core that referenced this issue Sep 23, 2019
@Vagmer

Vagmer commented Dec 17, 2019

This is an often-overlooked form of tracking, so good job deciding to add this to the browser!
Though, from what I can tell (please correct me if I'm wrong), the implementation you've gone with is currently extremely narrow in scope, whereas this issue appears to have been intended to be general in purpose (yet it was closed once the narrow implementation landed). The short description of this feature in the release notes also suggests a general, even potentially comprehensive, solution. An accurate description would mention that only a select few query parameters (gclid, fbclid, msclkid and mc_eid) are handled, out of the many others known to be used for tracking across the web.

In any case, if you wish to implement a comprehensive solution for the type of tracking in this issue's title, as was alluded to in this thread, many such solutions already exist (for example, the ClearURLs extension for Chrome/Firefox), and their code and lists of parameter filters are publicly viewable.

@pes10k
Contributor

pes10k commented Dec 17, 2019

@Vagmer gotta crawl before you walk ;) We're addressing what seem to be the heaviest hitters now, and can scale up as we gain confidence we're not busting things for users.

That additional set of tracking-related query parameters looks very interesting, thank you for linking! From eyeballing though, it looks like at least some may be used for purely 1p purposes, which we don't target. More generally though, this list seems to address a site tracking a user once the user lands on that site (e.g. how a user got to amazon.com), when the bigger concern (from our end) is people using query parameters to track users across a large portion of the web (e.g. social embeds and similar getting known query params across all sites). Do you know if there is a similar, expanded list that targets that second problem?

@Vagmer

Vagmer commented Dec 17, 2019

@snyderp:

gotta crawl before you walk ;) We're addressing what seem to be the heaviest hitters now, and can scale up as we gain confidence we're not busting things for users.

Oh, definitely makes sense. I can understand and agree with that approach, it just struck me that both the immediate closure of this issue and the (inaccurate) inclusion of this as a general feature in the release notes seem to signal that this was considered done with.

That additional set of tracking-related query parameters looks very interesting, thank you for linking! From eyeballing though, it looks like at least some may be used for purely 1p purposes, which we don't target. [...]

That extension and its rules are expansive and they fulfill more than a singular purpose that fits under cleaning URLs, so that wouldn't be surprising... It strips various tracking parameters, other "junk" or extraneous parameters, even skips intermediate redirection URLs/pages, etc... It also endeavors to include exclusions or otherwise shape rules to avoid the rare associated breakage. Personally, I've faced no issues with it, though occasionally such breakages are fixed after user reports.

Do you know if there is a similar, expanded list that targets that second problem?

That list includes the ubiquitous ones as well (such as utm_* parameters). Unfortunately, I don't know of a specialized or more descriptive list. Maybe the dev of that extension or its repo hold one. I know that there are many many more extensions (or userscripts) with the exact same purpose (there's an incomplete listing on ClearURLs's wiki, and elsewhere), though. The one I mentioned just seems to be the most extensive and advanced one that I'd come across.

@Madis0

Madis0 commented Jan 6, 2020

Is this configurable by Shields or enabled for everyone?

@bsclifton
Member

bsclifton commented Jan 6, 2020

@Madis0 should be fixed for everyone 👍 No shields configuration needed
cc: @fmarier

@Bonemeijer

FWIW I think this behaviour should be disabled when shields are down for a site.

@fmarier
Member Author

fmarier commented Apr 23, 2020

FWIW I think this behaviour should be disabled when shields are down for a site.

@Bonemeijer Have you found any breakage related to this?

"Shields down" is a webcompat-related toggle and I'm not aware of any compatibility problems with this protection.

@Madis0

Madis0 commented Apr 23, 2020

If nothing else, it could confuse web developers using Brave.

@Bonemeijer

Bonemeijer commented Apr 23, 2020

@fmarier I noticed that for sites I'm working on, Brave removes the gclid parameter from query strings, which I think is good. However, this behaviour persists even when shields are down for that site. It also happens when the URL is entered manually, so the request does not originate from a shields-up location.

You can try it yourself:

  • open any website, e.g. google.com
  • click the Brave icon and choose "shields down"
  • append the query string ?gclid=1 to the URL
  • notice how the gclid parameter disappears

Now, this might be expected behaviour given how it is programmed. But as an end user, I would expect the "shields down" setting for a location to halt any blocking that might be done for that specific location. As an end user who is also a web developer, I might even expect Brave to send its own user-agent string.

@fmarier
Member Author

fmarier commented Apr 23, 2020

Not all of Brave's protections can be disabled via Shields. If we determine that a protection doesn't have any negative impact on our users, we don't necessarily provide a toggle. I can see how it can be surprising for developers who aren't expecting this behavior.

Tying this feature to the toggle is certainly something we would consider if we discovered problems affecting our users.

@Bonemeijer

Sounds fair enough.

Without knowing the full philosophy and background of the Brave project, as an end user I would expect that "shields down" means "I trust this site, allow them to show ads and gather statistics". And I would expect any alterations to the URL or query string to be covered by that.

Now that I know of the behaviour, I know I have to work around it by using another browser. But it did have me chasing my own tail for a minute.

@jdahdah

jdahdah commented Dec 16, 2020

Is there no way to disable this? I'm trying out Brave as a replacement for Chrome for my web development work, but just ran into the issue of gclid disappearing despite turning off all protections. Unfortunately this is a blocker for my use case.

@fmarier
Member Author

fmarier commented Dec 16, 2020

@jdahdah Can you describe your use-case?

@jdahdah

jdahdah commented Dec 17, 2020

@fmarier Being able to read the gclid from the URL via JavaScript as long as Shields are down, and being able to fail gracefully when they're back up. As a web developer, I need to know Brave can be a full replacement for Chrome, as I would like to get away from Google products entirely. I'm 100% for blocking all of this stuff when Shields are up, and I think this feature is fantastic, but I also need to be able to trust that the browser behaves like a base Chromium browser when I ask it to. Otherwise that leaves me no choice but to reinstall Chrome, as I have had to do for this instance.

@fmarier
Member Author

fmarier commented Dec 17, 2020

Thanks for expanding on your use case. I've filed #13242 to track this. The fact that it's been requested more than once suggests that many more developers are likely to want this too.

@dmilin1

dmilin1 commented Feb 4, 2021

@fmarier I'd like to second @jdahdah's comment. Just spent 2 hours trying to figure out where the gclid was going. I assumed it wasn't Brave's fault because the shield was disabled. It's a very unintuitive interaction as I'd expect Brave to function like Chrome when shields are off.

@Ritik262

Is it possible to find the msclkid in Bing Ads?

@fmarier
Member Author

fmarier commented May 19, 2022

Is it possible to find the msclkid in Bing Ads?

You'll still see a msclkid parameter in links because we don't rewrite links inside of pages, but if you click on the link, the msclkid parameter will be removed before the connection to the other server is made.

@Ritik262

Is there any Python API to find the msclkid in Bing Ads?

lyubomyr-shaydariv added a commit to lyubomyr-shaydariv/uu-webext that referenced this issue Jun 15, 2024