Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic link-tracking protections to Google sites, update Facebook content script, and factor out common code. #2016

Merged
merged 24 commits into from
Oct 3, 2018

Conversation

bcyphers
Copy link
Contributor

@bcyphers bcyphers commented May 11, 2018

This pull request will add basic anti-link-tracking measures for Google searches (www.google.com) as well as to links pasted into Google Hangouts (hangouts.google.com).

The search script removes click and mousedown event listeners from vanilla search links on www.google.com. It doesn't apply to localized domains, or to any google sites other than web search. For some reason, link tracking happens differently in Firefox and Chrome. In Firefox, there is an event listener for mousedown that sneaks in and replaces the actual url with a www.google.com/url? redirect link. In Chrome, Google uses the new HTML5 ping attribute, which I just learned about. As an aside, it might be worth blocking this attribute everywhere -- I can't think of a valid non-tracking use case.

The hangouts script is simpler because the tracking mechanism is simpler. It uses setInterval to rewrite all hrefs that start with google.com/url? in hangouts.google.com iframes.

I also used this as an opportunity to abstract out the common functions that we use for link unwrapping into a lib/ file, to DRY out the code and make it easier to add anti-link-tracking scripts for other sites in the future.

Follows up on #1993 and #1392. Closes #85.

Closes #2012, closes #2014, closes #2018.

@bcyphers bcyphers changed the title Add basic anti-tracking to google.com Add basic link-tracking protections to Google sites May 17, 2018
@bcyphers bcyphers requested a review from ghostwords May 18, 2018 21:07
Copy link
Member

@ghostwords ghostwords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Got some feedback.

Could do YouTube next (href=https://www.youtube.com/redirect?...).

@@ -10,8 +10,7 @@
"incognito": "spanning",
"permissions": [
"tabs",
"http://*/*",
"https://*/*",
"*://*/*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds WebSocket URLs (#1010). Let's hold off making this change here.

}
}

// unwrap wrapped links in a Hangouts iframe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should get updated now that it's not in Hangouts-specific code.

return;
}
for (let node of mutation.addedNodes) {
node.querySelectorAll(selector).forEach((element) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should fix exceptions with non-Element nodes (#2018).

return;
}

// remove all attributes from a link except for href
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for "target"?

a.removeAttribute(attr.name);
}
}
a.rel = "noreferrer";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do "noreferrer noopener" here since there is no target=_blank?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My takeaway from https://mathiasbynens.github.io/rel-noopener/ was that noreferrer makes noopener unnecessary, but on second glace I'm not sure. I'll add both for now.

a.rel = "noreferrer";

// block event listeners on the link
a.addEventListener("click", function (e) { e.stopImmediatePropagation(); }, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also call e.stopPropagation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think stopImmediatePropagation is a strict superset: https://dom.spec.whatwg.org/#dom-event-stopimmediatepropagation

@@ -35,8 +19,7 @@ function cleanLink(a) {
let href = new URL(a.href).searchParams.get('u');

// from https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move the comment too.

@@ -51,29 +34,11 @@ function cleanLink(a) {
a.addEventListener("mouseover", function (e) { e.stopPropagation(); }, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also call e.stopImmediatePropagation?

Copy link
Contributor Author

@bcyphers bcyphers May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely. See #2033

@ghostwords
Copy link
Member

Re blocking pings everywhere: related to but not the same as #2024.

Copy link
Member

@ghostwords ghostwords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the shared content script to something like src/js/firstparties/lib/utils.js?

@@ -0,0 +1,26 @@
/* globals findInAllFrames:false */
(function() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to wrap our content scripts in self-executing functions (since content scripts operate in their own JavaScript context), right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, didn't realize they had their own contexts. Thanks for the heads up.


// block event listeners on the link
a.addEventListener("click", function (e) { e.stopImmediatePropagation(); }, true);
a.addEventListener("mousedown", function (e) { e.stopImmediatePropagation(); }, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see a gen_204 request go out with a plain click on a search result link, at least in Chrome.

@bcyphers
Copy link
Contributor Author

bcyphers commented May 23, 2018

Re: google.com in Chrome still sending out gen_204 requests (I can't find the comment anymore)

I can reproduce this, but I think it triggers whenever the user navigates away from the page, not just when they click on a link. I have a dev tools network tab open for google.com in Chrome, and every time I switch from Google to a different tab, a gen_204 POST request fires off, even if I haven't clicked any links on the page. I think it's outside the scope of this PR to stop those.

@bcyphers
Copy link
Contributor Author

bcyphers commented Jun 5, 2018

I replaced the last "address PR comments" commit with a few more descriptive ones. Also, I believe this closes #85, at least for everything described in the original issue.

@bcyphers
Copy link
Contributor Author

bcyphers commented Jun 5, 2018

@ghostwords I moved the first party util file like you suggested, and incorporated the changes from #2033 into this branch. Can you give this another look when you get a chance? Thanks!

@bcyphers bcyphers changed the title Add basic link-tracking protections to Google sites Add basic link-tracking protections to Google sites, update Facebook content script, and factor out common code. Jun 5, 2018
@bcyphers bcyphers requested a review from ghostwords June 5, 2018 19:02
Copy link
Member

@ghostwords ghostwords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please merge or rebase on master.

bcyphers added 8 commits July 23, 2018 17:01
Remove click and mousedown event listeners from vanilla search links on
google.com. Does not apply to localized domains, or to any google sites
other than web search.
Add lib/firstparty.js for commonly-used first-party script methods.
Update the google.com first-party script to use lib functions.
Create firstparties/hangouts.js, which unwraps shim links in google
hangouts chat windows at hangouts.google.com and mail.google.com.
Use lib/firstparty.js common functions in the facebook link-unwrapping
script. Update manifest.json to include the hangouts script, and change
"(http|https)://" url matches in the manifest to "*://" matches.
Google Hangouts content script now works correctly. Make some changes to
lib/firstparty.js, and change manifest to include hangouts script in all
hangouts.google.com iframes.
Since www.google.com uses the `ping` attribute for tracking instead of
the instrumentation in firefox, add that to the content script's
querySelector. Tested to work in both Firefox and Chrome.
Replace '*://*/*' permission with 'http://' and 'https://'
permissions again to avoid including WebSocket URLs for now.
bcyphers added 6 commits July 23, 2018 17:03
Check whether a node is an element node before trying to call
querySelectorAll on it. This avoids throwing confusing errors.
Add noopener tag to unwrapped google links since there is no
target=_blank. Move content script code out of anonymous self-executing
functions because they do not run in the global scope.
Move lib/firstparty.js to js/firstparties/lib/utils.js and change
references in manifest.json.
Add support for facebookcorewwwi.onion to facbook content script. Remove
unnecessary anonymous self-calling function from the script.
Remove URL_REGEX checks that were vulnerable to javascript injection and
replace with calls to isURL, which checks that a string starts with
http:// or https://.
@bcyphers bcyphers force-pushed the google.com-link-unwrapping branch from c6a7e55 to e03f5eb Compare July 24, 2018 00:17
Conflicts:
	src/js/firstparties/facebook.js
Copy link
Member

@ghostwords ghostwords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some manifest comments for now.

Is either test failure related to changes in this PR? I might have seen the cookie tracking failure before, but the QUnit failure seems new.

"https://*/*",
"http://*/*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid making any unnecessary manifest changes. This seems trivial, but who knows what buggy logic this array reordering may trigger.

@@ -49,22 +49,42 @@
"js/firstparties/twitter.js"
],
"matches": [
"https://twitter.com/*",
"http://twitter.com/*"
"*://twitter.com/*"
Copy link
Member

@ghostwords ghostwords Sep 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revert and go back to explicitly listing out https and http. "*://" is confusing as it includes ws(s) in some browsers but not others.

"js/firstparties/facebook.js"
],
"matches": [
"https://*.facebook.com/*",
"http://*.facebook.com/*",
"*://*.facebook.com/*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above; revert please.

"js/firstparties/google.js"
],
"matches": [
"*://www.google.com/*"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http/https instead

"js/firstparties/hangouts.js"
],
"matches": [
"*://hangouts.google.com/*"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http/https instead

@ghostwords
Copy link
Member

Also, I merged in master, resolving conflicts with/applying the changes from #2163.

Rename the hangouts.js first party script to google-static.js and use it
to unwrap links on docs.google.com. Rename the web search unwrap script
from google.js to google-search.js.
Add tests for static google shim unwrapping and "trap link"
de-instrumentation. Update variable names in google scripts to avoid
conflicts.
Sometimes a content script would load before the utility script had
finished executing, which led to it trying to access undefined
functions. Now the tests load content scripts after utility scripts are
finished.
Copy link
Member

@ghostwords ghostwords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possibly unintended change this PR introduces is that now Google search result destinations will no longer receive any referrer information. When Privacy Badger is disabled, destination pages receive the origin (https://www.google.com/) as the referrer (in document.referrer and the referer header).

@ghostwords ghostwords merged commit c059a55 into master Oct 3, 2018
ghostwords added a commit that referenced this pull request Oct 3, 2018
Add protection against outgoing link click tracking on Google Search,
Google Docs and Google Hangouts.
@ghostwords ghostwords deleted the google.com-link-unwrapping branch October 3, 2018 18:24
@ghostwords ghostwords added the first-party relating to first-party scripts label Nov 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
first-party relating to first-party scripts
Projects
None yet
2 participants