Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop recording non-tracking domains #1795

Merged
merged 7 commits into from
Jan 21, 2018

Conversation

ghostwords
Copy link
Member

@ghostwords ghostwords commented Dec 6, 2017

Fixes #1446. Fixes #1405 by virtue of no longer checking DNT for non-tracking domains.

  • Stop checking DNT for non-tracking domains.
  • Add test to verify that non-tracking domains do not go into action_map.
  • Stop checking DNT for user-blocked domains.
  • Start checking DNT for domains seen tracking via localStorage or canvas fingerprinting.
  • Upon reverting control of user-controlled domains, check if the domain's action_map entry is set to all defaults (meaning no tracking recorded), and if it is, remove it.
  • Add migration to remove non-tracking domains on upgrade and on data import. (Preserve user-controlled non-tracking domains.)
  • See whether we can remove the unlimitedStorage permission now, or we have to do another release first (to get non-tracking domains removed before we remove unlimitedStorage).
  • Review performance impact.

Follows up on #1642 in terms of continuing to evolve our DNT checking approach.

@ghostwords ghostwords added DNT policy EFF's Do Not Track policy: www.eff.org/dnt-policy migrations Badger user data modifications performance labels Dec 7, 2017
@ghostwords ghostwords force-pushed the stop-recording-nontracking-domains branch 3 times, most recently from 3491575 to 762959a Compare December 8, 2017 22:32
@ghostwords ghostwords force-pushed the stop-recording-nontracking-domains branch from 762959a to 12fd302 Compare January 4, 2018 00:00
Copy link
Contributor

@alexristich alexristich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. To clarify, the addition of the skip_dnt_check is what is preventing non-tracking domains from remaining in action_map should someone import from an older version where non-tracking domains are present, correct?

@@ -121,10 +121,6 @@ HeuristicBlocker.prototype = {
return {};
}

window.setTimeout(function () {
badger.checkForDNTPolicy(fqdn);
}, 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the refactoring of the checkForDNTPolicy block, the function signature here should be updated accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish GitHub allowed me to add a comment to a line of code that wasn't directly modified by a pull request :P

The comment describing this function mentions that it sets a timeout to check for DNT policies; given this change, that aspect of the comment is no longer directly relevant for this function so it should be removed. Hopefully that makes sense!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, got it now, thanks!

*/
_recordPrevalence: function (tracker_fqdn, tracker_origin, page_origin) {
_recordPrevalence: function (tracker_fqdn, tracker_origin, page_origin, skip_dnt_check) {
var snitch_map = this.storage.getBadgerStorageObject('snitch_map');
var firstParties = [];
if (snitch_map.hasItem(tracker_origin)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change snitch_map to snitchMap to match naming convention used in newly added migration, etc.

@@ -323,10 +323,18 @@ BadgerPen.prototype = {

/**
* Remove user set action from a domain
* @param domain FQDN string
**/
* @param domain FQDN string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to @param {String} domain format.

@ghostwords
Copy link
Member Author

ghostwords commented Jan 10, 2018

skip_dnt_check (added in 12fd302) is there to avoid rechecking DNT policy for every tracking domain when importing user data. We moved where DNT checking is triggered to _recordPrevalence() in 2455de0, and _recordPrevalence() gets called when importing user data. It doesn't seem desirable to recheck all tracking domains at that time.

@ghostwords
Copy link
Member Author

Non-tracking domains are explicitly removed from user data imports by re-running the migration: 611ad70#diff-46ee54a6a841ec40759bc995200e52d2R183.

Instead of checking most* third-party domains.

This avoids creating action_map entries for non-tracking domains.

[*] This also checks DNT for tracking domains found through
localStorage or canvas fingerprinting. Previous logic checked
cookie-based third-party domains only.
Saves extra work, as DNT compliance does not override user preference.
@ghostwords ghostwords force-pushed the stop-recording-nontracking-domains branch from 12fd302 to 1c0a3d2 Compare January 18, 2018 21:13
@ghostwords ghostwords dismissed alexristich’s stale review January 18, 2018 21:17

Rebased with master, implemented suggestions.

Copy link
Contributor

@alexristich alexristich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor clarification comment on return type, otherwise looks good!

@@ -395,7 +403,7 @@ BadgerStorage.prototype = {
* Get an item
*
* @param {String} key - the key for the item
* @return the value for that key or null
* @return {?*} the value for that key or null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Clarification] Just as an aside, this return type is unfamiliar to me; can you clarify what this means?

Copy link
Member Author

@ghostwords ghostwords Jan 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mixing the ? prefix, which signifies something could be null (see the JSDoc doc. page for @type), and this SO answer for how to specify a type that could be anything. This is probably not actually supported ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, fair enough! I guess right now we only return {?string}, but absolutely true there's no reason why this couldn't change at some point. Thanks for the explanation :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BadgerStorage.prototype.getItem can return any type:

//"object"
typeof badger.storage.getBadgerStorageObject('action_map').getItem('outbrain.com')
//"boolean"
typeof badger.getSettings().getItem("seenComic")

@ghostwords
Copy link
Member Author

ghostwords commented Jan 19, 2018

I might have written or said at some point that this PR should have no effect on the list of tracking domains on the options page. I was thinking since the list is of tracking domains, and we stop recording (& remove existing) non-tracking domains, there will be no change to the list.

In reality, no, the list is (almost certainly) going to get smaller. We currently display non-tracking subdomains of tracking domains in the list (because subdomains inherit tracking status from parent domains). After this PR, we will no longer display those subdomains (since we will no longer record them).

There shouldn't be any change to what Privacy Badger blocks or doesn't block on pages you visit.

@ghostwords ghostwords merged commit 1c0a3d2 into master Jan 21, 2018
ghostwords added a commit that referenced this pull request Jan 21, 2018
Stop recording non-tracking domains.
@ghostwords ghostwords deleted the stop-recording-nontracking-domains branch January 21, 2018 16:32
@ghostwords
Copy link
Member Author

@terrorist96 Once your Privacy Badger updates to 2018.1.25, could you run a couple of debugging queries to see where your Badger is at post upgrade? Let's see how much space your Badger uses and what your most populous domains are at this point.

@terrorist96
Copy link
Contributor

chrome.storage.local.get(null, r=>console.log(
  parseFloat(JSON.stringify(r).length / 1024 / 1024).toFixed(2), "MB"));

0.99 MB
omitting list of domains:

    "base": "disqus.com",
    "count": 309,
    "base": "googlevideo.com",
    "count": 150,
    "base": "google.com",
    "count": 52,
    "base": "akamaihd.net",
    "count": 38,
    "base": "wordpress.com",
    "count": 28,
    "base": "openx.net",
    "count": 15,
    "base": "btrll.com",
    "count": 14,
    "base": "googleusercontent.com",
    "count": 13,
    "base": "adnxs.com",
    "count": 13,
    "base": "doubleclick.net",
    "count": 13,

@ghostwords
Copy link
Member Author

ghostwords commented Jan 25, 2018

This looks good to me. Your storage use decreased at least fivefold, plus the 27K+ pubnub.com subdomains (#1318 (comment), #1717 (comment)) are gone. We should be good to remove the "unlimitedStorage" permission sometime next week.

@terrorist96
Copy link
Contributor

But didn't you say the limit is 2MB without the unlimitedStorage permission? Being at the precipice of 1MB is not that reassuring, since it will only grow larger over time and eventually hit 2MB again.

@ghostwords
Copy link
Member Author

ghostwords commented Jan 25, 2018

Extension storage (chrome.storage.local) is limited to ~5 MB by default.

My intuition is that Privacy Badger learning is front-loaded with a long tail. I expect it would take a long time for your Badger to get to 2 MB, but I could be wrong! I wonder what's a good (simple) early-warning system we could put in place to help prevent Badger users from running out of space in the future.

@terrorist96
Copy link
Contributor

terrorist96 commented Jan 25, 2018

5 is better than 2 and gives me more peace of mind. Though I wish there was some option in between 5 and unlimited.

@ghostwords
Copy link
Member Author

ghostwords commented Feb 12, 2018

Picking up from #1795 (comment), it looks like we are still recording non-tracking subdomains of tracking domains. This should be fewer non-tracking domains than before, but still, we shouldn't store any.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DNT policy EFF's Do Not Track policy: www.eff.org/dnt-policy migrations Badger user data modifications performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants